100 Python Interview Questions Guide
Interview Questions
And Answers
APOORVA IYER
Python Basics And Core Concepts
6. What is the purpose of the with statement in Python?
The with statement is used for resource management. It ensures that files or resources are properly
closed or released after use, even if an error occurs.
Example:
with open("[Link]", "r") as f:
    data = f.read()
8. What is the difference between shallow copy and deep copy in Python?
Shallow Copy: Creates a new outer object, but nested objects are shared with the original.
Deep Copy: Recursively copies the outer object and every nested object.
Use the copy module:
import copy
copy.copy(obj)      # Shallow
copy.deepcopy(obj)  # Deep
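The difference is easiest to see when a nested object is mutated after copying. The following is a minimal sketch; the dictionary and its keys are made up for illustration.

```python
import copy

original = {"scores": [1, 2, 3], "name": "demo"}

shallow = copy.copy(original)      # new outer dict, same inner list
deep = copy.deepcopy(original)     # new outer dict and a new inner list

original["scores"].append(4)       # mutate the nested list in place

print(shallow["scores"])  # [1, 2, 3, 4]  (shares the nested list)
print(deep["scores"])     # [1, 2, 3]     (independent copy)
```

The shallow copy sees the appended value because both dictionaries point at the same list object.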
x = "Hello"
print(str(x)) # Hello
print(repr(x)) # 'Hello'
Data Analysis With Pandas And NumPy
12. How do you read a CSV file into a DataFrame using Pandas?
Use the read_csv() function from Pandas to load CSV data:
import pandas as pd
df = pd.read_csv('[Link]')
16. How do you group data in Pandas and perform aggregation?
Use groupby() to split the data and apply aggregation functions:
df.groupby('category')['sales'].sum()
19. How do you handle categorical variables in Python for data analysis?
Label Encoding: Use LabelEncoder to convert categories into integers
One-Hot Encoding: Use pd.get_dummies() to create binary columns
Frequency or Target Encoding: Replace with aggregated numerical values
df[(df['age'] > 30) & (df['income'] > 50000)]
24. What is the difference between .values, .to_numpy(), .tolist(), and .array in Pandas?
.values: Older way to get a NumPy array
.to_numpy(): Preferred way to convert to a NumPy array
.tolist(): Converts to Python list
.array: Returns internal ExtensionArray
30. What is the difference between merge(), join(), and concat() in Pandas?
merge(): SQL-style joins on columns
join(): Join on index
concat(): Stack DataFrames vertically or horizontally
NumPy, Statistics And Math Operations
a = np.array([[1, 2, 3],
              [4, 5, 6]])
b = np.array([10, 20, 30])
result = a + b
print(result)
Output:
[[11 22 33]
 [14 25 36]]
In this example, array b is automatically broadcast to match the shape of array a. Each row of a
has b added to it element-wise, without b being explicitly repeated.
How broadcasting works:
1. Compare the shapes of both arrays from right to left.
2. Dimensions are compatible when:
   - They are equal, or
   - One of them is 1.
3. NumPy then "stretches" the smaller shape to match the larger one.
Benefits of broadcasting:
Avoids memory duplication.
Increases code readability and conciseness.
Boosts computation speed for large datasets.
Common use cases:
Adding a scalar or 1D array to a matrix.
Normalizing rows or columns in datasets.
Element-wise operations on arrays with different but compatible shapes.
Understanding broadcasting is critical in numerical computing because many high-performance
computations rely on this concept to work efficiently.
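The right-to-left compatibility rule described above can be sketched in plain Python. This is an illustration of the rule only, not NumPy's actual implementation, and the function name broadcast_shape is made up for this example.

```python
from itertools import zip_longest

def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or raise ValueError if incompatible."""
    result = []
    # Walk both shapes from the trailing dimension; a missing dimension counts as 1.
    pairs = zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1)
    for dim_a, dim_b in pairs:
        if dim_a == dim_b or dim_b == 1:
            result.append(dim_a)
        elif dim_a == 1:
            result.append(dim_b)
        else:
            raise ValueError(f"shapes {shape_a} and {shape_b} are not compatible")
    return tuple(reversed(result))

print(broadcast_shape((2, 3), (3,)))    # (2, 3)
print(broadcast_shape((4, 1), (1, 5)))  # (4, 5)
```

A pair such as (2, 3) and (2,) raises ValueError, because the trailing dimensions 3 and 2 are neither equal nor 1.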
33. How do you calculate the correlation between two variables in pandas?
In data analysis, correlation measures the degree to which two variables move in relation to each
other. The most commonly used type is Pearson correlation, which captures linear relationships.
Pandas provides a built-in method called .corr() to calculate correlation coefficients between
columns of a DataFrame or between two Series.
Example:
import pandas as pd
df = pd.DataFrame({
    'height': [160, 165, 170, 175, 180],
    'weight': [55, 60, 65, 70, 75]
})
correlation = df['height'].corr(df['weight'])
print(correlation)
Output:
1.0
This output indicates a perfect positive linear correlation.
Types of correlation supported by pandas:
method='pearson' (default): measures linear correlation.
method='spearman': measures rank-based correlation (monotonic).
method='kendall': measures ordinal association.
Why this is useful:
Identifying strong or weak relationships between variables.
Detecting multicollinearity in regression analysis.
Selecting features based on correlation with the target variable.
Important Note: Correlation does not imply causation. Two variables can be correlated due to
confounding factors, so additional analysis is often needed.
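To make the Pearson coefficient less of a black box, here is a minimal pure-Python sketch of the formula (covariance divided by the product of the standard deviations). The helper name pearson is hypothetical; in practice the pandas .corr() method does this for you.

```python
from math import sqrt

def pearson(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (numerator) and sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

height = [160, 165, 170, 175, 180]
weight = [55, 60, 65, 70, 75]
print(pearson(height, weight))  # 1.0
```

The result is 1.0 because every 5 cm increase in height corresponds to exactly 5 kg more weight in this toy data.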
Example:
import numpy as np
x = np.array([1, 2, 3])
y = [Link](x)
Use embeddings for deep learning or NLP problems.
📌 Pro Tip: Always validate encodings with cross-validation to ensure they improve model
performance.
sns.heatmap(df.corr(), annot=True)
Variance Inflation Factor (VIF)
VIF quantifies how much the variance of a coefficient is inflated due to multicollinearity.
from statsmodels.stats.outliers_influence import variance_inflation_factor
vif_data = pd.DataFrame()
# X is assumed to be the DataFrame of predictor columns
vif_data['VIF'] = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]
How to handle it:
Remove one of the correlated variables: Keep the one with higher relevance.
Combine features: Use PCA or feature engineering to combine highly correlated
variables.
Regularization: Use Ridge or Lasso regression to penalize large coefficients and reduce
multicollinearity.
📌 Multicollinearity usually doesn’t hurt the model's predictive power much, but it makes the
individual coefficients unstable and their interpretation unreliable.
38. What is the difference between any() and all() functions in Python?
Both any() and all() are built-in functions that work with iterables (lists, tuples, sets, etc.), typically
used in conditions and filters.
Function | Description | Example | Result
any() | Returns True if at least one element is truthy | any([False, True, False]) | True
all() | Returns True if every element is truthy | all([True, True, True]) | True
Use cases:
any() is often used when checking if a list contains at least one True, non-zero, or valid
value.
all() is used for validating that all inputs or conditions meet a requirement.
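Two details often come up as follow-ups: how both functions behave on an empty iterable, and how they combine with generator expressions. A short sketch, with made-up sample data:

```python
values = [0, "", None]
print(any(values))  # False: every element is falsy
print(all([]))      # True: no element fails the test (vacuous truth)
print(any([]))      # False: no element passes the test

# With a generator expression, both functions stop at the first decisive element.
ages = [25, 17, 40]
print(all(age >= 18 for age in ages))  # False
print(any(age >= 18 for age in ages))  # True
```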
Example in data filtering:
df = pd.DataFrame({'A': [1, 0, 3], 'B': [4, 5, 0]})
df['has_zero'] = df[['A', 'B']].apply(lambda x: not all(x), axis=1)
This adds a new column that flags rows containing at least one zero.
matching rows from the right DataFrame. If no match is found in the right DataFrame, the result
will contain NaN.
Syntax:
pd.merge(left_df, right_df, how='left', on='key_column')
Example:
df1 = pd.DataFrame({'id': [1, 2, 3], 'name': ['A', 'B', 'C']})
df2 = pd.DataFrame({'id': [2, 3], 'score': [85, 90]})
📌 Left joins are commonly used in real-world datasets where one table has master records and
another has related transactional or lookup information.
Iterating over multiple lists simultaneously.
Unzipping data using zip(*zipped_data)
Unzipping Example:
zipped = [('a', 1), ('b', 2)]
letters, numbers = zip(*zipped)
Output:
letters = ('a', 'b')
numbers = (1, 2)
📌 zip() is memory efficient as it returns an iterator, and it’s widely used in data transformation,
pairing, and looping patterns.
41. How do you calculate the correlation between two variables in pandas?
Correlation measures the linear relationship between two variables. In pandas, you can calculate
it using the .corr() method, which by default uses the Pearson correlation coefficient.
Syntax:
df['column1'].corr(df['column2'])
Example:
import pandas as pd
df = pd.DataFrame({
    'height': [150, 160, 170, 180, 190],
    'weight': [50, 55, 65, 70, 80]
})
correlation = df['height'].corr(df['weight'])
print(correlation)  # Output: approximately 0.993 (strong positive correlation)
Types of correlation:
Pearson: Measures linear correlation (default).
Kendall and Spearman: Use .corr(method='kendall') or .corr(method='spearman') for rank-based
(non-parametric) correlation.
When to use:
Use Pearson when the relationship is linear and the data is roughly normally distributed.
Use Spearman/Kendall for ranked data or monotonic but non-linear relationships.
📌 A value near 1 indicates strong positive correlation; near -1 indicates strong negative
correlation; near 0 indicates little or no linear correlation.
Example:
import numpy as np
a = np.array([1, 2, 3])
b = [Link](a)
]
Read into pandas:
import pandas as pd
df = pd.read_json('[Link]')
Output:
name age
0 Alice 25
1 Bob 30
Advanced Options:
If your JSON file is nested, you can flatten it with pd.json_normalize().
If reading from an API, use pd.read_json(url).
📌 Make sure the JSON file is properly formatted—pandas expects an array of records or a dict of
dicts.
avg_salary = df.groupby('department')['salary'].mean()
Output:
department
HR 35000.0
IT 55000.0
Functions used with groupby:
.sum(), .mean(), .count(), .max(), .min()
.agg({'col1': 'sum', 'col2': 'mean'}) for multiple aggregations
.transform() to apply a function and keep the original DataFrame shape
Use cases:
Calculating KPIs per region or category
Customer segmentation
Time series analysis by day/month/year
📌 Groupby follows a split-apply-combine pattern: it splits the data into groups, applies a
function, and combines the results.
df_clean = df.drop_duplicates()
Output:
id name
0 1 Alice
1 2 Bob
3 3 Charlie
📌 Always check for and handle duplicates before performing aggregations or model training.
46. What is the difference between isnull() and notnull() in pandas?
These are pandas functions used to identify missing data in a DataFrame or Series.
Function | Description
isnull() | Returns True for missing values (NaN or None)
notnull() | Returns True for non-missing values
Example:
df = pd.DataFrame({'A': [1, None, 3]})
df['A'].isnull()
Output:
0 False
1 True
2 False
Typical uses:
Filtering missing values: df[df['A'].isnull()]
Counting nulls: df.isnull().sum()
Filling nulls: df.fillna(0)
📌 Proper handling of missing data is a critical part of data cleaning and preprocessing.
1 2 4 6
Use cases:
Row-wise calculations (e.g., total price, net income)
Applying complex logic that depends on multiple columns
📌 .apply() is flexible but slower for large datasets; vectorized operations are faster when
available.
@decorator
def greet():
    print("Hello")
greet()
Output:
Before function call
Hello
After function call
Use cases:
Logging
Timing
Caching
Access control and authentication
Built-in and standard-library decorators:
@staticmethod
@classmethod
@property
@functools.lru_cache (for memoization)
📌 Decorators follow the DRY (Don't Repeat Yourself) principle and are heavily used in Flask,
Django, and FastAPI.
Core Python Programming
class Child(Parent):
    def greet(self):
        super().greet()  # Calls the parent method
        print("Hello from Child")

child = Child()
child.greet()
# Output:
# Hello from Parent
# Hello from Child
# With condition
evens = [x for x in range(10) if x % 2 == 0] # [0, 2, 4, 6, 8]
Custom context manager using class:
class MyContext:
    def __enter__(self):
        print("Entering context")
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        print("Exiting context")

with MyContext():
    print("Inside context")
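A generator-based alternative is often shorter than a class. The sketch below uses contextlib.contextmanager from the standard library; the names my_context and events are made up for illustration, and a list is used instead of print so the order of steps is easy to inspect.

```python
from contextlib import contextmanager

events = []

@contextmanager
def my_context():
    events.append("enter")          # runs like __enter__
    try:
        yield "resource"            # value bound by "as"
    finally:
        events.append("exit")       # runs like __exit__, even on error

with my_context() as res:
    events.append(f"inside with {res}")

print(events)  # ['enter', 'inside with resource', 'exit']
```

The try/finally around the yield is what guarantees cleanup when the body of the with block raises.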
Example:
def countdown(n):
    while n > 0:
        yield n
        n -= 1
56. What are Python data classes and when should you use them?
Data classes, introduced in Python 3.7 via the dataclasses module, provide a simple way to define
classes that are mainly used to store data.
With the @dataclass decorator, Python automatically generates special methods like __init__(),
__repr__(), __eq__() based on the class attributes. This saves time and reduces boilerplate code.
Use Cases:
When you need lightweight classes to hold attributes (e.g., records, configuration objects)
When you want default values, type hints, and immutability features
Example:
from dataclasses import dataclass

@dataclass
class Employee:
    name: str
    age: int
    department: str = "Data"
    def __str__(self):
        return f'Book: {self.title}'
book = Book("Python Basics")
print(book)
# Output: Book: Python Basics
Why interviewers ask this:
To assess your knowledge of Python’s object model and how well you can customize class
behavior.
Manages memory by discarding old results
Example:
from functools import lru_cache

@lru_cache(maxsize=128)
def fib(n):
    if n < 2:
        return n
    return fib(n-1) + fib(n-2)
60. What are the most important data types in Python for analysts?
Python offers several built-in and library-supported data types that are crucial for data analysts:
int, float: For numerical calculations
str: For handling text data
list: Mutable ordered sequences
tuple: Immutable ordered sequences
set: Unordered collection of unique elements
dict: Key-value mappings
bool: True/False logic
DataFrame (from pandas): 2D tabular data structure, most commonly used in data analysis
Series (from pandas): 1D labeled array
Why this matters in interviews:
Demonstrating fluency with data types shows you're prepared to clean, transform, and model real-
world data using Python.
Syntax:
df.to_dict(orient='records')
Explanation of orient parameter options:
'dict': {column -> {index -> value}}
'list': {column -> [values]}
'records': [{column -> value}, …]
'series': {column -> Series}
'split': {'index' -> [...], 'columns' -> [...], 'data' -> [...]}
'index': {index -> {column -> value}}
Example:
import pandas as pd
df = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Age': [25, 30]
})
print(df.to_dict(orient='records'))
# Output: [{'Name': 'Alice', 'Age': 25}, {'Name': 'Bob', 'Age': 30}]
Why this matters:
This operation is often used when integrating pandas with web applications, databases, or APIs,
where JSON or dictionary structures are required.
import pandas as pd
df = pd.DataFrame({
    'Sales': [100, 150, 200, 250, 300]
})
print(df.describe())
Output:
Sales
count 5.000000
mean 200.000000
std 79.056942
min 100.000000
25% 150.000000
50% 200.000000
75% 250.000000
max 300.000000
Other useful functions:
[Link](), [Link](), [Link](), [Link](), [Link](), [Link]()
Interview Tip:
Mention that you also use .value_counts() for categorical features and .corr() to examine
relationships between variables.
# Sort by age in descending order
df_sorted = df.sort_values(by='Age', ascending=False)
print(df_sorted)
Output:
Name Age
1 Bob 30
0 Alice 25
2 Charlie 22
Interview Insight:
Sorting is often used during reporting, ranking, or preparing data for visual dashboards. Mention
how you combine sorting with filtering and aggregation in real-world scenarios.
text = "Python is great"
# Min-Max
minmax = MinMaxScaler()
print(minmax.fit_transform(data))
# Standardization
standard = StandardScaler()
print(standard.fit_transform(data))
When to use what:
Use MinMaxScaler when you need features in a fixed range such as [0, 1] and the data has no extreme outliers.
Use StandardScaler when your model works best with zero-mean, unit-variance features (e.g., regularized
logistic regression).
Use RobustScaler if your data contains outliers.
class Dog(Animal):
    def speak(self):
        print("Dog barks")

d = Dog()
d.speak()  # Output: Dog barks
Interview Tip:
Clarify that overloading can be mimicked using flexible parameters, but overriding is fully
supported in Python and used for polymorphism.
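As a rough sketch of what "mimicking overloading" can look like, the example below shows two common approaches: an optional parameter, and functools.singledispatch, which selects an implementation by the type of the first argument. The function names area and describe are made up for illustration.

```python
from functools import singledispatch

def area(width, height=None):
    # One name, two call shapes: area(side) or area(width, height)
    if height is None:
        height = width
    return width * height

@singledispatch
def describe(value):
    return f"object: {value!r}"      # fallback for unregistered types

@describe.register
def _(value: int):
    return f"integer: {value}"

@describe.register
def _(value: list):
    return f"list of {len(value)} items"

print(area(3))           # 9
print(area(3, 4))        # 12
print(describe(7))       # integer: 7
print(describe([1, 2]))  # list of 2 items
print(describe("hi"))    # object: 'hi'
```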
try:
    result = 10 / 0
except ZeroDivisionError:
    print("Cannot divide by zero")
finally:
    print("Always runs")
Other common exceptions:
ValueError
TypeError
FileNotFoundError
KeyError
Best practices:
Always catch specific exceptions.
Use finally to release resources.
Avoid catching all exceptions unless necessary.
asyncio.run(greet())
When to use:
Use coroutines when handling I/O-bound tasks like web scraping, APIs, or file operations to
improve performance without multi-threading.
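A minimal sketch of running two coroutines concurrently with asyncio.gather follows. The fetch coroutine and its delays are hypothetical; asyncio.sleep stands in for a real network or disk wait.

```python
import asyncio

async def fetch(name, delay):
    # asyncio.sleep stands in for a real I/O wait (network, disk, ...)
    await asyncio.sleep(delay)
    return f"{name} done"

async def main():
    # Both coroutines wait concurrently; results come back in argument order
    return await asyncio.gather(fetch("a", 0.02), fetch("b", 0.01))

results = asyncio.run(main())
print(results)  # ['a done', 'b done']
```

Even though "b" finishes first, gather returns results in the order the coroutines were passed in.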
Example:
class Employee:
    __slots__ = ['name', 'age']

    def __init__(self, name, age):
        self.name = name
        self.age = age
Benefits:
Reduces memory usage.
Makes attribute access faster.
Prevents accidental creation of new attributes.
Limitation:
You can’t add attributes not listed in __slots__. Also, it doesn’t work well with inheritance unless
carefully managed.
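The limitation can be demonstrated directly. In this sketch (the Point class is made up for illustration), assigning an attribute that is not listed in __slots__ raises AttributeError, and the instance has no __dict__.

```python
class Point:
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
try:
    p.z = 3                     # 'z' is not listed in __slots__
except AttributeError as exc:
    error = str(exc)
else:
    error = None

print(hasattr(p, "__dict__"))  # False: no per-instance dict is allocated
print(error is not None)       # True: the assignment was rejected
```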
71. How does Python handle memory management and garbage collection?
Python uses automatic memory management and garbage collection to efficiently allocate and
reclaim memory during program execution. This process ensures that memory no longer in use is
freed up, preventing memory leaks and optimizing performance.
Key components of memory management:
Reference Counting:
Every object in Python has an internal counter that tracks how many references point to it.
When this count reaches zero, the memory is immediately reclaimed.
Example:
a = [1, 2, 3]  # reference count = 1
b = a          # reference count = 2
del a          # reference count = 1
del b          # reference count = 0 → memory freed
Garbage Collector (GC):
Python’s gc module handles cyclic references, which reference counting alone cannot
resolve.
Cyclic garbage collection occurs periodically and identifies unreachable objects with
reference cycles.
import gc
gc.collect()  # Manually triggers garbage collection
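To see why the cycle collector is needed, the sketch below builds a two-object reference cycle and observes it through a weak reference. The Node class is made up for illustration, and the behavior shown is that of CPython.

```python
import gc
import weakref

class Node:
    def __init__(self):
        self.other = None

a, b = Node(), Node()
a.other, b.other = b, a        # reference cycle: a -> b -> a
probe = weakref.ref(a)         # lets us observe a without keeping it alive

del a, b                       # counts never reach zero because of the cycle
gc.collect()                   # the cycle detector finds and frees both nodes

print(probe() is None)  # True on CPython
```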
Memory Pools (pymalloc):
CPython uses an internal allocator for small memory blocks to speed up performance.
Best Practices for Interview:
Use del to explicitly remove large objects when no longer needed.
Avoid circular references when possible.
Use weak references (weakref) for caching or observer patterns where you don’t want to
increase the reference count.
📌 Summary: Python uses a combination of reference counting and cyclic garbage collection,
backed by memory pooling, to handle memory efficiently.
72. What is the Global Interpreter Lock (GIL), and how does it affect multithreading in
Python?
The Global Interpreter Lock (GIL) is a mutex (mutual exclusion lock) in CPython, the standard
implementation of Python. It ensures that only one thread executes Python bytecode at a time,
even on multi-core processors.
Why GIL exists:
Simplifies memory management by preventing race conditions in object access.
Ensures thread safety for core Python internals.
Implications:
For I/O-bound tasks (file operations, API requests), multithreading works fine because
threads release the GIL during I/O.
For CPU-bound tasks, GIL becomes a bottleneck as threads cannot run truly in parallel.
Alternatives to overcome GIL:
Use the multiprocessing module instead of threading for CPU-bound tasks.
from multiprocessing import Pool
Use libraries like NumPy or C-extensions that internally release the GIL during
computation.
Use asyncio for I/O-bound concurrency.
📌 Interview Tip: Explain that GIL limits multi-threaded parallelism for CPU-heavy tasks, and
Python offers workarounds like multiprocessing for true parallelism.
Module:
A module is simply a Python .py file containing functions, classes, or variables that can be
imported and used in other programs.
Example:
# file: math_utils.py
def add(a, b):
    return a + b
Package:
A package is a directory containing multiple modules and an __init__.py file (can be
empty or contain initialization code). It allows for a hierarchical structure.
analytics/
├── __init__.py
├── [Link]
└── [Link]
Importing:
from [Link] import train_model
📌 Key Difference: A module is a single file, while a package is a folder of modules with
optional sub-packages.
import importlib
module = importlib.import_module('module_b')
📌 Interview Tip: Demonstrate your awareness of software design and how circular imports
often indicate coupling that needs refactoring.
class MyClass(metaclass=Meta):
    pass
When MyClass is defined, the Meta.__new__ method runs first.
Best Practice:
Use metaclasses only when absolutely necessary, as they add complexity. More common
alternatives include class decorators and factory functions.
📌 Interview Edge: Knowing metaclasses shows advanced Python expertise, but also mention
when not to use them.
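For reference, a small self-contained sketch of a metaclass is shown below. It assumes one common use, registering every class built with the metaclass; the registry attribute is hypothetical and not part of any standard API.

```python
class Meta(type):
    registry = []

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        mcls.registry.append(name)   # record every class built by this metaclass
        return cls

class MyClass(metaclass=Meta):
    pass

print(Meta.registry)          # ['MyClass']
print(type(MyClass) is Meta)  # True
```

Meta.__new__ runs once, at the moment the class statement for MyClass is executed, not when instances are created.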
Mocking behavior during unit testing
Example:
import math
math.sqrt = lambda x: "patched!"
print(math.sqrt(4)) # Output: patched!
Here, math.sqrt() from the standard library is replaced at runtime with a custom lambda function.
Risks:
Can lead to unpredictable behavior
Makes debugging difficult
Can break code silently if the patch fails
Safer alternatives:
Use inheritance or composition
Use mocking frameworks like unittest.mock for temporary patches
📌 Interview Tip: While monkey patching shows flexibility, always mention that it should be
avoided in production unless absolutely necessary.
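As a sketch of the safer alternative, unittest.mock.patch applies a patch only inside a with block and restores the original afterwards. The return value "patched!" is just an illustrative placeholder.

```python
import math
from unittest import mock

with mock.patch("math.sqrt", return_value="patched!"):
    inside = math.sqrt(4)      # the mock is active only inside this block

outside = math.sqrt(4)         # original function restored automatically

print(inside)   # patched!
print(outside)  # 2.0
```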
Avoid holding on to large objects unnecessarily.
📌 Summary: Python’s memory model is abstracted from the developer, but understanding it
helps write memory-efficient and high-performance code.
class MyClass:
    pass
obj = MyClass()
weak_obj = weakref.ref(obj)
print(weak_obj()) # <__main__.MyClass object at ...>
del obj
print(weak_obj()) # None (object has been garbage collected)
When to use:
Large graph-like structures (e.g., trees)
Applications with caching requirements
Avoiding reference cycles in design patterns
📌 Interview Edge: Understanding weak references shows you can manage memory in large or
long-running systems efficiently.
1. Memoization:
   Store intermediate results to avoid redundant computation.
   Use functools.lru_cache:
   from functools import lru_cache

   @lru_cache(maxsize=None)
   def fib(n):
       if n < 2:
           return n
       return fib(n-1) + fib(n-2)
2. Convert to Iterative Approach:
   Tail recursion is not optimized in Python, so rewriting as a loop avoids deep stacks.
3. Manual Memoization:
   Use a dictionary to cache results.
4. Adjust Recursion Limit:
   You can increase the recursion limit, but it's risky, because very deep recursion can
   still overflow the C stack and crash the interpreter:
   import sys
   sys.setrecursionlimit(2000)
📌 Best Practice: Use memoization when recursion is necessary and prefer iteration if
performance is critical.
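The manual-memoization and iterative options mentioned above can be sketched as follows. The function names fib_memo and fib_iter are made up for this example.

```python
def fib_memo(n, cache=None):
    # Manual memoization: the dictionary stores results already computed
    if cache is None:
        cache = {}
    if n < 2:
        return n
    if n not in cache:
        cache[n] = fib_memo(n - 1, cache) + fib_memo(n - 2, cache)
    return cache[n]

def fib_iter(n):
    # Iterative version: constant stack depth, so no recursion limit applies
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print(fib_memo(30))  # 832040
print(fib_iter(30))  # 832040
```

fib_iter can handle inputs such as n = 5000 that would exceed the default recursion limit in a recursive version.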
__slots__ = ['name', 'age']
Use Tuples instead of Lists:
Tuples are immutable and require less memory.
Release Memory Manually:
Use del to remove references, especially for large objects.
Use Efficient Libraries:
NumPy arrays are more memory-efficient than native Python lists for numeric data.
Monitor Memory:
Use gc, tracemalloc, and memory_profiler to detect memory leaks.
📌 Interview Tip: Memory management is not just about saving space—it directly affects
performance and scalability.
Data Structures And Algorithms In Python
    # Assumes the elements are stored in a list attribute named self.items
    def pop(self):
        if not self.is_empty():
            return self.items.pop()
        return None

    def peek(self):
        if not self.is_empty():
            return self.items[-1]
        return None

    def is_empty(self):
        return len(self.items) == 0
This implementation allows you to:
push() to add an item
pop() to remove and return the top item
peek() to view the top item without removing it
is_empty() to check if the stack is empty
This structure is commonly asked about in interviews, especially when evaluating your
understanding of basic data structures.
82. How do you implement a queue using two stacks in Python?
A queue is a linear data structure that follows the First-In-First-Out (FIFO) principle. It can be
implemented using two stacks in such a way that enqueuing is done in one stack and dequeuing is
managed via another. This method is useful for interview problems testing your understanding of
how data structures can simulate others.
Example:
class QueueUsingStacks:
    def __init__(self):
        self.in_stack = []
        self.out_stack = []

    def enqueue(self, item):
        self.in_stack.append(item)

    def dequeue(self):
        if not self.out_stack:
            while self.in_stack:
                self.out_stack.append(self.in_stack.pop())
        if self.out_stack:
            return self.out_stack.pop()
        return None
Each element is pushed and popped at most twice (once on each stack), so every operation runs in
amortized O(1) time. It's an elegant way to build a queue when only stack operations are
available.
83. How do you reverse a list without using slicing or built-in functions?
Reversing a list manually is a good way to test your understanding of loops and index
manipulation. You can iterate from the end of the list to the start and construct a new list:
Example:
def reverse_list(lst):
    reversed_lst = []
    for i in range(len(lst) - 1, -1, -1):
        reversed_lst.append(lst[i])
    return reversed_lst
This method avoids Python's slicing ([::-1]) or the reversed() function and is often used in
interviews to see if candidates understand control flow and memory allocation.
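A common follow-up is to reverse the list in place, without allocating a second list. A minimal two-pointer sketch (the function name reverse_in_place is made up for illustration):

```python
def reverse_in_place(lst):
    left, right = 0, len(lst) - 1
    while left < right:
        lst[left], lst[right] = lst[right], lst[left]   # swap the two ends
        left += 1
        right -= 1
    return lst

print(reverse_in_place([1, 2, 3, 4, 5]))  # [5, 4, 3, 2, 1]
```

This version uses O(1) extra memory, whereas building a new list uses O(n).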
test()
These functions are very useful in debugging, dynamic code execution (like with eval()), and in
writing meta-programming logic.
Here’s a simple implementation using string comparison:
def is_palindrome(s):
    return s == s[::-1]
Alternative method using a loop:
def is_palindrome(s):
    left, right = 0, len(s) - 1
    while left < right:
        if s[left] != s[right]:
            return False
        left += 1
        right -= 1
    return True
This helps evaluate string handling and basic loop control knowledge.
# Unzipping
a_unzip, b_unzip = zip(*zipped)
print(list(a_unzip)) # [1, 2, 3]
print(list(b_unzip)) # ['x', 'y', 'z']
zip() is widely used in iteration, dictionary construction, and parallel processing of lists.
88. What is the difference between break and continue in Python loops?
break is used to exit a loop prematurely when a condition is met.
continue skips the current iteration and moves to the next one.
Example:
for i in range(5):
    if i == 3:
        break
    print(i)  # Output: 0, 1, 2

for i in range(5):
    if i == 3:
        continue
    print(i)  # Output: 0, 1, 2, 4
Understanding these control flow keywords is crucial in handling loop-based logic efficiently.
@lru_cache(maxsize=None)
def add(a, b):
    return a + b
lru_cache automatically manages the cache and is very helpful in recursive functions or frequently
repeated operations.
Example:
def flatten(lst):
    flat = []
    for item in lst:
        if isinstance(item, list):
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat
Functions, Decorators, And Design Patterns
@decorator
def greet():
    print("Hello!")
greet()
Output:
Before function call
Hello!
After function call
In the above, @decorator is shorthand for greet = decorator(greet). The decorator function takes
another function as input and returns a new function that adds behavior before and after the
original function call.
📌 Decorators are powerful tools in Python for adding reusable logic around functions.
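Here is a self-contained sketch of such a decorator. It assumes one refinement not shown above: functools.wraps, which preserves the wrapped function's name and docstring. The calls list is a hypothetical stand-in for the "before" and "after" print statements.

```python
from functools import wraps

calls = []

def decorator(func):
    @wraps(func)                      # keep func's __name__ and docstring
    def wrapper(*args, **kwargs):
        calls.append("before")
        result = func(*args, **kwargs)
        calls.append("after")
        return result
    return wrapper

@decorator
def greet(name):
    """Return a greeting."""
    return f"Hello, {name}!"

print(greet("Ada"))     # Hello, Ada!
print(calls)            # ['before', 'after']
print(greet.__name__)   # greet
```

Accepting *args and **kwargs in the wrapper lets the same decorator wrap functions with any signature.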
reduce() – Applies a rolling computation to sequential pairs of values (imported from
functools).
Example:
from functools import reduce
nums = [1, 2, 3]
class Singleton:
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

a = Singleton()
b = Singleton()
print(a is b)  # Output: True
Here, __new__ is overridden to ensure that only one instance is created.
📌 Singleton patterns are commonly used in systems where one global shared resource is
required.
94. How do you sort a dictionary by its values in Python?
To sort a dictionary by its values, you can use the sorted() function with a lambda function as the
key. This is useful when prioritizing elements based on associated values.
Example:
my_dict = {'a': 3, 'b': 1, 'c': 2}
sorted_dict = dict(sorted(my_dict.items(), key=lambda item: item[1]))
print(sorted_dict) # {'b': 1, 'c': 2, 'a': 3}
Here, sorted() sorts the items (key-value pairs), and dict() reconstructs them back into a dictionary.
📌 This method is often used in data processing tasks like ranking or summarization.
content = [Link]()
print(content)
Modes available:
'r' – read
'w' – write (overwrite)
'a' – append
'b' – binary mode
'x' – exclusive creation
📌 Always use the with statement for safer and cleaner file handling.
97. How do you find the most frequent element in a list using [Link]?
Python’s [Link] is an efficient way to count occurrences of items in an iterable and
retrieve the most common ones.
Example:
from collections import Counter
lst = [1, 2, 2, 3, 3, 3]
counter = Counter(lst)
most_common = counter.most_common(1)[0][0]
print(most_common) # Output: 3
Here, most_common(1) returns a one-element list containing an (item, count) tuple, and [0][0] extracts the item itself.
📌 This is highly useful in NLP, frequency analysis, and classification problems.
    def __init__(self, title):
        self.title = title

    def __str__(self):
        return f"Book title: {self.title}"
Runtime Modification, Memory Management, And Code Optimization
# Original sqrt
print(math.sqrt(4)) # Output: 2.0
# Monkey patching
math.sqrt = lambda x: "Patched!"
print(math.sqrt(4)) # Output: Patched!
This can be helpful during:
Testing (to mock certain behaviors)
Adding quick fixes to third-party libraries
Dynamic behavior insertion for prototyping
However, monkey patching can make code difficult to debug and maintain. It should be used
cautiously as it can introduce hard-to-track bugs.
📌 Use monkey patching only when absolutely necessary and document it clearly.
100. How do you check if an element exists in a list without using the in keyword?
While in is the most Pythonic way to check for existence, an alternative method is using the any()
function with a generator expression.
Example:
def exists(lst, element):
    return any(x == element for x in lst)
This approach can also be used to apply custom logic during the check (like case-insensitive
comparison, partial match, etc.).
📌 This technique is often tested in interviews to assess familiarity with functional constructs and
logical reasoning.
101. How do you swap values of two variables in Python without using a temporary
variable?
Python allows variable swapping using tuple unpacking, which is concise and efficient.
Example:
a, b = 5, 10
a, b = b, a
print(a, b) # Output: 10 5
This works by packing the values into a temporary tuple and then unpacking them in reversed
order.
📌 Python’s ability to swap values this way is often highlighted for its elegance and is a common
interview question.
if __name__ == "__main__":
    main()
Why it matters:
Prevents certain code from being executed during import
Useful in scripts that include tests or demo runs
Encourages modular, reusable code design
Example:
class Parent:
    def greet(self):
        print("Hello from Parent")

class Child(Parent):
    def greet(self):
        super().greet()
        print("Hello from Child")

Child().greet()
Output:
Hello from Parent
Hello from Child
Benefits:
Supports cooperative multiple inheritance
Makes code more maintainable and scalable
Ensures that the parent class initialization is executed
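To illustrate the "cooperative multiple inheritance" point, the sketch below uses a diamond-shaped hierarchy. The class names are made up; each greet returns a list so the call order along the method resolution order (MRO) is visible.

```python
class Base:
    def greet(self):
        return ["Base"]

class Left(Base):
    def greet(self):
        return ["Left"] + super().greet()

class Right(Base):
    def greet(self):
        return ["Right"] + super().greet()

class Child(Left, Right):
    def greet(self):
        return ["Child"] + super().greet()

print(Child().greet())                      # ['Child', 'Left', 'Right', 'Base']
print([c.__name__ for c in Child.__mro__])  # ['Child', 'Left', 'Right', 'Base', 'object']
```

Inside Left, super() resolves to Right rather than Base, because super() follows the MRO of the instance's class, and Base.greet runs exactly once.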
a = np.array([1, 2, 3])
b = np.asarray(a)
Use np.array() when you want to ensure independence from the original object.
Use np.asarray() for performance optimization when copying is unnecessary.
Large Datasets, Aggregation, And Pivot Operations In Pandas
📌 Always profile your memory usage and tweak these parameters accordingly for optimal
performance.
df = pd.DataFrame({
    'Region': ['North', 'South', 'North', 'South'],
    'Product': ['A', 'A', 'B', 'B'],
    'Sales': [100, 150, 200, 250]
})
📌 Pivot tables are essential for data summarization and exploratory data analysis.
📌 Think of transform() as the go-to function when you need per-group calculations with full
DataFrame shape preservation.
108. What is the difference between merge(), join(), and concat() in pandas?
Method | Purpose | Key Features
merge() | SQL-style joins using keys | Highly flexible, supports left/right/outer/inner joins
join() | Joins based on the index (or a key column) | Simpler syntax for index-based joins
concat() | Concatenates DataFrames along a particular axis | Used to stack vertically or horizontally
Example of merge (on key column):
pd.merge(df1, df2, on='id', how='inner')
Example of join (on index):
df1.join(df2, how='left')
Example of concat:
pd.concat([df1, df2], axis=0) # Vertical stacking
pd.concat([df1, df2], axis=1) # Horizontal stacking
📌 Use merge() when dealing with key-based relational data. Use concat() when appending or
stacking similar datasets.
print(p.x) # Output: 10
print(p.y) # Output: 20
Benefits:
Field names make code more readable.
Uses less memory than instances of regular classes, since there is no per-instance __dict__.
Instances are immutable and hashable (can be used as dictionary keys).
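A short sketch of these properties follows. The Point type mirrors the p.x and p.y usage above but is defined here only for illustration.

```python
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])
p = Point(10, 20)

print(p.x, p.y)           # 10 20
moved = p._replace(x=99)  # returns a new tuple; p itself is unchanged
print(moved)              # Point(x=99, y=20)

lookup = {p: "first point"}   # hashable, so usable as a dictionary key

try:
    p.x = 5                   # immutable: attribute assignment is rejected
    modified = True
except AttributeError:
    modified = False
print(modified)  # False
```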
Example:
df_cleaned = (
[Link]()
.assign(sales_tax=lambda x: x['Sales'] * 0.18)
.query('sales_tax > 50')
.sort_values(by='sales_tax', ascending=False)
)
Advantages:
Improves code readability
Encourages functional programming style
Reduces clutter by avoiding temporary variables
📌 It's particularly useful in data cleaning and wrangling pipelines where many transformations
are applied sequentially.
Built-In Functions And Iteration Utilities In Python
📌 These functions are concise and efficient alternatives to writing loops for condition checks.
📌 enumerate() increases clarity and avoids the need to manage counters manually during
iteration.
113. How do you create a virtual environment in Python?
A virtual environment is a self-contained directory where you can install project-specific
dependencies without affecting global installations.
Steps to Create:
1. Install virtualenv (if not already installed):
   pip install virtualenv
2. Create a virtual environment:
   virtualenv env_name
3. Activate the environment:
   o On Windows:
     env_name\Scripts\activate
   o On macOS/Linux:
     source env_name/bin/activate
4. Install packages locally:
   pip install pandas numpy
5. Deactivate the environment:
   deactivate
📌 Virtual environments are essential for avoiding dependency conflicts and ensuring
reproducibility.
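Since Python 3.3 the standard library also ships a venv module (python -m venv env_name), so installing virtualenv is optional. As a minimal sketch, the same environment can be created from Python code. The temporary directory is an assumption made to keep the example self-contained, and env_name is the placeholder used in the steps above.

```python
import tempfile
import venv
from pathlib import Path

# Create the environment in a throwaway directory; env_name is a placeholder.
base = Path(tempfile.mkdtemp())
env_dir = base / "env_name"

# with_pip=False keeps this sketch fast and offline; pass with_pip=True
# to bootstrap pip into the environment, as `python -m venv` does by default.
venv.create(env_dir, with_pip=False)

# Every environment gets a pyvenv.cfg file at its root.
print((env_dir / "pyvenv.cfg").exists())
```

Activation still happens in the shell, exactly as in step 3.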
7. Q3 = df['column'].quantile(0.75)
8. IQR = Q3 - Q1
9. df = df[~((df['column'] < (Q1 - 1.5 * IQR)) | (df['column'] > (Q3 + 1.5 * IQR)))]
10. Boxplots and Visual Inspection:
11. import seaborn as sns
12. sns.boxplot(data=df, x='column')
Handling Methods:
Remove them
Cap or transform them
Use robust algorithms that are insensitive to outliers (e.g., tree-based models)
📌 Outlier detection should align with the business logic and not be purely statistical.
115. What are weak references in Python and where are they used?
A weak reference allows one object to refer to another object without increasing its reference
count. When the original object is no longer needed elsewhere, it can be garbage collected—even
if a weak reference to it still exists.
Use Case:
Helpful in memory-sensitive applications like caching, where you don’t want your cache to
prevent the original object from being collected.
Example:
import weakref

class MyClass:
    pass

obj = MyClass()
weak_obj = weakref.ref(obj)
Caches in large-scale systems
Managing object lifecycles in GUI or simulation applications
📌 Use weak references when you want to keep a lightweight reference to an object without
preventing it from being freed.
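To see the behavior described above, the sketch below drops the last strong reference and checks what the weak reference returns. The WeakValueDictionary cache is an illustrative addition, not something the original example uses.

```python
import gc
import weakref

class MyClass:
    pass

obj = MyClass()
weak_obj = weakref.ref(obj)

# Calling the weak reference returns the object while it is still alive.
alive_before = weak_obj() is obj

# Drop the only strong reference; gc.collect() is called so the sketch
# does not rely on CPython's immediate reference-counting cleanup.
del obj
gc.collect()

# Once the object is gone, the weak reference returns None.
alive_after = weak_obj() is not None

# WeakValueDictionary drops entries whose values are no longer referenced.
cache = weakref.WeakValueDictionary()
item = MyClass()
cache["key"] = item
size_with_item = len(cache)
del item
gc.collect()
size_after = len(cache)

print(alive_before, alive_after, size_with_item, size_after)
```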
Machine Learning, Data Sampling, And Performance
Optimization
📌 Always validate with stratified sampling to ensure fair evaluation across classes.
117. What are hashable objects in Python and why do they matter?
A hashable object has a hash value that remains constant throughout its lifetime and supports
comparison via __eq__.
Why It Matters:
Only hashable objects can be used as dictionary keys or added to sets
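Built-in immutable types are hashable, while built-in mutable containers are not; instances of user-defined classes are hashable by default (by identity) unless the class defines __eq__ without __hash__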
Examples:
hash("hello") # Valid
hash((1, 2, 3)) # Valid
hash([1, 2, 3]) # Error: list is unhashable
Hashable Types:
int, str, float, bool
tuple (if all its elements are hashable)
Unhashable Types:
list, set, dict (mutable types)
📌 Understanding hashability helps avoid common runtime errors and design efficient data
structures.
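A few edge cases are worth trying out. The sketch below is illustrative only; Coordinate is a hypothetical class, and a frozen dataclass is just one way to get a hashable record type.

```python
from dataclasses import dataclass

# A tuple is hashable only if everything inside it is hashable.
try:
    hash((1, [2, 3]))
    tuple_with_list_hashable = True
except TypeError:
    tuple_with_list_hashable = False

# frozenset is the hashable counterpart of set.
groups = {frozenset({1, 2}): "pair"}

# Hypothetical example class: frozen=True makes instances immutable and
# pairs the generated __eq__ with a generated __hash__.
@dataclass(frozen=True)
class Coordinate:
    lat: float
    lon: float

visited = {Coordinate(12.97, 77.59)}
found = Coordinate(12.97, 77.59) in visited

print(tuple_with_list_hashable, groups[frozenset({2, 1})], found)
```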
118. What is the difference between a generator expression and a list comprehension?
Both are used to create sequences from iterables, but differ in how they evaluate and store data.
Feature | Generator Expression | List Comprehension
Syntax | (x for x in iterable) | [x for x in iterable]
Evaluation | Lazy (on-the-fly) | Eager (all at once)
Memory Usage | Efficient | High for large data
Output Type | Generator object | List
Example:
gen = (x**2 for x in range(1000000)) # Generator
lst = [x**2 for x in range(1000000)] # List
When to Use:
Use generators when dealing with large data and memory optimization is a priority
Use list comprehensions when you need a materialized sequence
📌 Prefer generators in loops, pipelines, or when the full result is not immediately needed.
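The memory difference can be observed directly. This is a small sketch; the size n is an arbitrary value chosen for illustration, and sys.getsizeof reports only the container itself, not the integers it refers to.

```python
import sys

n = 100_000  # arbitrary size chosen for illustration

gen = (x**2 for x in range(n))
lst = [x**2 for x in range(n)]

# The generator object stays small no matter how large n is;
# the list stores a reference to every element.
gen_size = sys.getsizeof(gen)
lst_size = sys.getsizeof(lst)

# Both produce the same values.
total_from_gen = sum(gen)
total_from_lst = sum(lst)

# A generator is exhausted after one pass; a list can be reused.
second_pass = sum(gen)

print(gen_size, lst_size, second_pass)
```

The last line is the usual trade-off: a generator can only be consumed once.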
119. How do you evaluate a machine learning model in Python?
Evaluation ensures the model performs well on unseen data and helps fine-tune it for production.
Classification Metrics:
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)
Accuracy: Overall correctness
Precision: True Positives / (True Positives + False Positives)
Recall: True Positives / (True Positives + False Negatives)
F1 Score: Harmonic mean of precision and recall
ROC AUC: Area under ROC curve (for binary classification)
Regression Metrics:
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
MAE: Average absolute magnitude of the errors
MSE: Average of the squared errors; penalizes large errors more heavily
R² Score: Proportion of the target's variance explained by the model
Cross-Validation:
from sklearn.model_selection import cross_val_score
cross_val_score(model, X, y, cv=5)
📌 Always evaluate with multiple metrics, and use cross-validation to get a more reliable estimate of performance on unseen data and to spot overfitting.
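To make the classification formulas above concrete, the sketch below computes them by hand in plain Python rather than through scikit-learn. The labels and predictions are made-up values for illustration.

```python
# Hypothetical labels and predictions for a binary classifier.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 1, 0, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

accuracy = (tp + tn) / len(pairs)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(tp, fp, fn, tn)
print(accuracy, precision, recall, f1)
```

In practice the scikit-learn functions imported above are preferable, since they also handle edge cases such as a zero denominator.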
8. pip install memory_profiler
9. Visualization Tools:
o SnakeViz: Interactive visualization for profiling results
o Py-Spy: Lightweight sampling profiler
10. Timeit Module – Benchmarking small code snippets:
11. import timeit
12. timeit.timeit('sum(range(100))', number=1000)
📌 Profiling is critical for optimizing algorithms in data pipelines and reducing runtime on large
datasets.
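As a sketch of how timeit can compare two implementations, the example below times a manual loop against the built-in sum(). Both helper functions are hypothetical, and the actual timings depend on the machine, so no expected numbers are shown.

```python
import timeit

def with_loop(n=100):
    # Manual accumulation, for comparison.
    total = 0
    for i in range(n):
        total += i
    return total

def with_builtin(n=100):
    # Same result using the built-in sum().
    return sum(range(n))

# repeat() returns one timing per run; the minimum is the least noisy figure.
loop_time = min(timeit.repeat(with_loop, number=1000, repeat=3))
builtin_time = min(timeit.repeat(with_builtin, number=1000, repeat=3))

print(f"loop: {loop_time:.6f}s, builtin: {builtin_time:.6f}s")
```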
Object-Oriented Programming And Design Patterns
m = Math()
print([Link](5)) # 5 + 0 = 5
print([Link](5, 10)) # 5 + 10 = 15
Method Overriding
Occurs when a subclass provides its own version of a method from the parent class.
Allows for custom behavior in derived classes.
Example of Overriding:
class Parent:
    def greet(self):
        print("Hello from Parent")

class Child(Parent):
    def greet(self):
        print("Hello from Child")

c = Child()
c.greet()  # Output: Hello from Child
📌 Overriding is widely used in polymorphism and abstract class implementations, whereas
overloading is mimicked via flexible function definitions.
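Besides default arguments, the standard library offers functools.singledispatch, which selects an implementation based on the type of the first argument. This is a different technique from the one the text describes, shown here only as a sketch; describe is a hypothetical function name.

```python
from functools import singledispatch

@singledispatch
def describe(value):
    # Fallback for types without a registered implementation.
    return f"object: {value!r}"

@describe.register
def _(value: int):
    # Chosen when the first argument is an int.
    return f"integer: {value}"

@describe.register
def _(value: list):
    # Chosen when the first argument is a list.
    return f"list of {len(value)} items"

print(describe(5))
print(describe([1, 2]))
print(describe("hi"))
```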
122. What is the use of the super() function in Python?
super() is a built-in function used to call methods from a parent or sibling class in the hierarchy.
Use Cases:
Avoids explicitly referring to the base class
Helps maintain the method resolution order (MRO) in multiple inheritance
Promotes DRY (Don't Repeat Yourself) by reusing parent logic
Example:
class Parent:
    def greet(self):
        super_called = True
        print("Hello from Parent")

class Child(Parent):
    def greet(self):
        super().greet()
        print("Hello from Child")

Child().greet()
Output:
Hello from Parent
Hello from Child
📌 super() is vital for extending or modifying base class methods in a clean and scalable manner.
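The MRO point is easiest to see with multiple inheritance. In the sketch below (all class names are hypothetical), each class calls super(), and Python visits every class exactly once in MRO order, so Left hands over to its sibling Right before Base is reached.

```python
class Base:
    def greet(self):
        return ["Base"]

class Left(Base):
    def greet(self):
        return ["Left"] + super().greet()

class Right(Base):
    def greet(self):
        return ["Right"] + super().greet()

class Child(Left, Right):
    def greet(self):
        return ["Child"] + super().greet()

order = Child().greet()
mro_names = [cls.__name__ for cls in Child.__mro__]

print(order)
print(mro_names)
```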
        self.title = title

    def __str__(self):
        return f"Book: {self.title}"

b = Book("Python Basics")
print(b)  # Output: Book: Python Basics
📌 Magic methods enable operator overloading, custom iteration, and richer object behavior.
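As a sketch of the operator overloading and custom iteration mentioned above, the hypothetical Vector class below defines __add__, __eq__, __iter__, and __repr__.

```python
class Vector:
    """Hypothetical 2D vector used only for illustration."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return f"Vector({self.x}, {self.y})"

    def __eq__(self, other):
        if not isinstance(other, Vector):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __add__(self, other):
        # Enables the + operator between two vectors.
        if not isinstance(other, Vector):
            return NotImplemented
        return Vector(self.x + other.x, self.y + other.y)

    def __iter__(self):
        # Enables unpacking and list(v).
        yield self.x
        yield self.y

v = Vector(1, 2) + Vector(3, 4)
print(v, list(v))
```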
124. What are Python data classes and when should you use them?
Data classes were introduced in Python 3.7 via the dataclasses module. They reduce boilerplate
when defining classes meant to store data.
Features:
Auto-generates __init__, __repr__, __eq__, etc.
Supports default values and type hints
Example:
from dataclasses import dataclass

@dataclass
class Employee:
    name: str
    age: int
    role: str = "Analyst"
Benefits:
Clean syntax
Easier comparison and representation
Good for simple data containers
📌 Use data classes for modeling data records or DTOs (Data Transfer Objects).
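The generated methods can be seen in action with a short sketch. The skills field and the sample values are hypothetical additions to the Employee example, included to show how mutable defaults are declared.

```python
from dataclasses import dataclass, field

@dataclass
class Employee:
    name: str
    age: int
    role: str = "Analyst"
    # Hypothetical extra field: mutable defaults must go through default_factory.
    skills: list = field(default_factory=list)

a = Employee("Asha", 30)
b = Employee("Asha", 30)

a_repr = repr(a)   # generated __repr__
same = a == b      # generated __eq__ compares field by field

# Each instance gets its own list, so this does not affect a.
b.skills.append("SQL")

print(a_repr, same, a.skills, b.skills)
```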
Enforcing coding standards
Automatically adding methods or attributes
Logging and validation during class creation
Example:
class Meta(type):
    def __new__(cls, name, bases, dct):
        print(f"Creating class {name}")
        return super().__new__(cls, name, bases, dct)

class MyClass(metaclass=Meta):
    pass
Output:
Creating class MyClass
📌 Metaclasses are an advanced feature best used when building frameworks or highly dynamic
systems.
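As a sketch of the "enforcing coding standards" use case, the hypothetical metaclass below rejects any class defined without a docstring. The rule itself is an assumption chosen for illustration.

```python
class RequireDocstring(type):
    """Hypothetical metaclass that rejects classes without a docstring."""

    def __new__(cls, name, bases, dct):
        # dct holds the class body; a docstring shows up under "__doc__".
        if not dct.get("__doc__"):
            raise TypeError(f"Class {name} must have a docstring")
        return super().__new__(cls, name, bases, dct)

class Documented(metaclass=RequireDocstring):
    """A class that follows the rule."""

# The check runs at class-definition time, not at instantiation.
try:
    class Undocumented(metaclass=RequireDocstring):
        pass
    rejected = False
except TypeError:
    rejected = True

print(rejected)
```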
📌 Use monkey patching sparingly and only in controlled scenarios like testing.
I hope you found this
helpful!
APOORVA IYER
NumPy is foundational to scientific computing in Python because it provides efficient, reliable multi-dimensional array structures and a suite of mathematical operations optimized for performance. Its vectorized operations and compact memory layout outperform typical Python lists and loops. Moreover, NumPy underpins libraries such as Pandas and scikit-learn and interoperates with frameworks such as TensorFlow, making it central to most data pipelines. This interoperability allows complex data analysis tasks and mathematical computations across large datasets to be combined with little friction.
The zip() function in Python combines multiple iterables, such as lists or tuples, into a single iterator of tuples. It's particularly useful in parallel iteration over several sequences, creating tuples from elements at the same position in each input iterable. Practical applications include creating dictionaries by pairing keys with values using dict(zip(keys, values)) and simplifying loops where simultaneous iteration over multiple lists is required. Unzipping data can be accomplished using zip(*zipped_data), which is useful for separating components of combined data sets. Its memory-efficient nature, generating results as needed, makes it valuable for large-scale data transformation tasks.
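A minimal sketch of the patterns just described; the sample keys, values, and pairs are made up for illustration.

```python
keys = ["name", "age", "city"]      # hypothetical sample data
values = ["Asha", 30, "Pune"]

# Pair keys with values to build a dictionary.
record = dict(zip(keys, values))

# Unzip a list of pairs back into separate tuples.
pairs = [("a", 1), ("b", 2), ("c", 3)]
letters, numbers = zip(*pairs)

# zip stops at the shortest input.
short = list(zip([1, 2, 3], "xy"))

print(record)
print(letters, numbers)
print(short)
```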
Broadcasting in NumPy refers to the implicit technique of extending smaller arrays so that arithmetic operations can be performed across arrays of different shapes. This happens without physically copying the data to the shape of the larger array, thereby optimizing memory usage and computation speed. Broadcasting follows specific rules to match dimensions: shapes are compared from the trailing dimension backwards, missing leading dimensions are treated as length 1, and two dimensions are compatible when they are equal or one of them is 1. For example, when adding a 1D array to each row of a 2D array, the 1D array is virtually 'stretched' to match each row, facilitating efficient element-wise operations without explicit looping or reshaping. This feature enhances the simplicity and speed of implementing mathematical operations in NumPy.
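The row-wise example can be sketched as follows, assuming NumPy is installed. The column array col and the mismatched array in the last step are illustrative additions.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])         # shape (2, 3)
b = np.array([10, 20, 30])        # shape (3,)

# b is applied to every row of a.
row_sum = a + b

# A (2, 1) column is stretched across the three columns of a.
col = np.array([[100], [200]])
col_sum = a + col

# Trailing dimensions that differ and are not 1 cannot be broadcast.
try:
    a + np.array([1, 2])
    compatible = True
except ValueError:
    compatible = False

print(row_sum)
print(col_sum)
print(compatible)
```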
Pandas offers several methods for handling missing values, including isnull() for detecting missing values, dropna() to remove rows or columns that contain missing values, fillna() to replace missing values with a specified value, and interpolate() to fill missing values using interpolation. Choosing the right method depends on the dataset and analysis goals. Dropping data (dropna()) might lead to a loss of potentially valuable information, especially if missing values are not random. Filling or interpolating missing data (fillna() and interpolate()) can introduce biases or incorrect data trends if not executed based on domain-specific knowledge. Thus, careful consideration and exploratory analysis are vital to ensure that the imputation or omission does not distort analytical insights.
np.array() and np.asarray() both convert data into NumPy arrays, but they differ in their handling of existing arrays. np.array() copies the data by default, even when the input is already a NumPy array. In contrast, np.asarray() returns the input itself if it is already a NumPy array of a compatible dtype, avoiding unnecessary data duplication and making it more efficient for potentially large datasets. np.asarray() should be preferred where copies should be minimized for performance reasons, such as when working with functions that may already return NumPy arrays, while np.array() should be used when an explicit copy is required because modifying the original data would introduce side effects.
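The difference shows up as soon as the result is modified. A minimal sketch, assuming NumPy is installed and using made-up values:

```python
import numpy as np

original = np.array([1, 2, 3])

copied = np.array(original)      # copies by default
viewed = np.asarray(original)    # no copy: the same object comes back

same_object = viewed is original
shares = np.shares_memory(copied, original)

# Writing through the asarray() result also changes the original;
# the np.array() copy is unaffected.
viewed[0] = 99

print(same_object, shares)
print(original, copied)
```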
Pandas simplifies data wrangling tasks by providing robust methods for cleaning, merging, filtering, and transforming tabular data. It has two core data structures that facilitate these operations: DataFrames and Series. DataFrames are 2D labeled data structures that support heterogeneous data types, resembling a SQL table or Excel spreadsheet, whereas Series are 1D labeled arrays allowing computations on indexed data. These structures underpin Pandas' ability to process large datasets efficiently and flexibly.
The function str() in Python is used to return a readable and user-friendly string representation of an object. It aims at being easily readable and is intended for displaying to end-users. On the other hand, repr() provides an unambiguous string representation of an object that can be used for debugging. It often includes more technical details about the object, to help developers during debugging. The usage of str() is more suited for user interface design where readability is key, while repr() is more appropriate for logging and debugging where precision and the ability to reproduce the object are important.
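The difference is easy to see with a standard-library type. The sketch below uses datetime.date with an arbitrary sample date.

```python
import datetime

d = datetime.date(2024, 1, 15)   # arbitrary sample date

s = str(d)    # readable form for end users
r = repr(d)   # unambiguous form for developers

# For many built-in types, eval(repr(obj)) rebuilds an equal object.
rebuilt = eval(r)

# Containers use repr() for their elements, even under str().
listed = str(["Hello"])

print(s)
print(r)
print(listed)
```

Calling eval() is shown only to illustrate the round trip; it should not be used on untrusted input.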
Method chaining in Pandas refers to the practice of calling multiple methods in a single, cohesive expression. This technique contributes to cleaner code by reducing clutter and improving readability, allowing for a more streamlined data handling process. By chaining operations such as filtering, assigning new columns, and querying data, it enables a workflow that is easier to follow and maintain. Composing operations succinctly also minimizes intermediate variables, which improves logical clarity in complex data transformations; it does not by itself make the code run faster.
In Pandas, merge() performs SQL-style joins on DataFrames, supporting inner, left, right, and outer joins defined by common keys. join() is used primarily to combine DataFrames on their index, while concat() stacks DataFrames along an axis, either vertically or horizontally, without the need to specify a key. merge() is ideal for relational datasets where records are linked via shared attributes. join() is convenient when you want to combine DataFrames with aligned indices. concat() is used for appending datasets such as logs, or when a simple union is needed without specific relational logic. Each function corresponds to a different integration scenario: key alignment, index alignment, or simple appending.
Python's data classes, introduced in Python 3.7, enhance class-based programming by reducing the boilerplate code typically needed to define classes for storing structured data. They automatically generate special methods such as __init__(), __repr__(), and __eq__() based on the annotated class attributes, and they support default values, type hints, and optional immutability via frozen=True. Unlike traditional classes, where these methods need to be explicitly defined, data classes offer out-of-the-box functionality that simplifies code writing and maintenance. This is particularly beneficial in data-heavy applications where writing class boilerplate slows development, allowing developers to focus on data manipulation rather than class management. Because the decorator handles comparison and representation automatically, data classes are well suited to record-like data, configuration objects, and prototyping complex data models.