Python Backend
Complete Interview Preparation Guide
Core Python · Django · DRF · FastAPI · MySQL · System Design · DevOps · AI/ML Basics
1. Core Python
2. Django
3. Django REST Framework
4. FastAPI
5. MySQL & SQL
6. System Design
7. DevOps & Docker
8. Git
9. AI/ML Basics
10. Coding Practice
11. 60+ Interview Q&A
Chapter 1 · Core Python
1.1 Python Data Types
Python has several built-in data types. Understanding them deeply is crucial for interviews because questions
about mutability, performance, and internals are very common.
Type | Characteristics & Key Points
list | Mutable, ordered, allows duplicates. Dynamic array. Amortized O(1) append, O(n) insert/search.
tuple | Immutable, ordered, allows duplicates. Slightly smaller and cheaper to create than a list. Hashable (usable as a dict key) only if every element is hashable.
set | Mutable, unordered, NO duplicates. Hash table. O(1) average add/remove/lookup.
dict | Mutable, insertion-ordered (guaranteed since Python 3.7), key-value. Hash table. O(1) average get/set/delete.
str | Immutable sequence of Unicode code points. CPython interns some strings (for example identifier-like literals); this is an implementation detail.
frozenset | Immutable set. Hashable. Can be used as a dict key.
bytes / bytearray | Immutable / mutable raw binary sequences.
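Interview Tip list vs tuple: use tuple for fixed data (coordinates, DB records), list for dynamic collections. In CPython a tuple of constants is usually cheaper to create than the equivalent list, but the margin varies, so measure before quoting a figure.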
Mutable vs Immutable
Immutable objects: int, float, bool, str, tuple, frozenset, bytes — cannot be changed after creation; any
'modification' creates a new object.
Mutable objects: list, dict, set, bytearray — can be changed in place; multiple variables can alias the same object
(aliasing bug risk).
# Immutable: "modifying" creates a new object
a = 'hello'
b = a
a += ' world'
print(b)   # 'hello' (b still refers to the original string)

# Mutable: aliasing danger!
x = [1, 2, 3]
y = x            # y points to the SAME list
y.append(4)
print(x)   # [1, 2, 3, 4] (x also changed!)

# Fix: use a copy
import copy
y = x.copy()           # shallow copy
y = copy.deepcopy(x)   # deep copy for nested structures
Common Bug Default mutable arguments in functions are shared across all calls. Always use None as default and
create inside function.
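The default-argument bug described above can be reproduced in a few lines. This is a minimal sketch; the function names `append_bad` and `append_good` are made up for illustration.

```python
def append_bad(item, bucket=[]):
    # The default list is created once, when the function is defined,
    # so every call that omits `bucket` shares the same object.
    bucket.append(item)
    return bucket


def append_good(item, bucket=None):
    # None is a sentinel; a fresh list is created on each call.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


first_bad = append_bad(1)
second_bad = append_bad(2)
first_good = append_good(1)
second_good = append_good(2)

print(second_bad)    # [1, 2]  (state leaked from the first call)
print(second_good)   # [2]
```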
Pass by Reference vs Value
Python uses 'pass by object reference'. Immutable objects behave like pass-by-value (rebinding doesn't affect
caller). Mutable objects behave like pass-by-reference (mutation affects caller).
def add_item(lst, item):
    lst.append(item)      # mutates the caller's list

def reset(x):
    x = []                # rebinding: the caller's variable is unchanged

my_list = [1, 2]
add_item(my_list, 3)      # my_list is now [1, 2, 3]
reset(my_list)            # my_list is still [1, 2, 3]
1.2 Python Internals & Memory
Memory Management & Garbage Collection
CPython uses reference counting as the primary memory management mechanism. Every object has a reference
count. When it drops to 0, memory is freed immediately.
The gc module adds a cycle detector on top to handle circular references (A→B→A). It uses three generations —
short-lived objects are collected most frequently.
import sys, gc
x = [1, 2, 3]
print(sys.getrefcount(x))   # 2 (x + the temporary argument to getrefcount)
y = x                       # refcount becomes 3
del y                       # refcount back to 2
# Force a garbage collection pass
gc.collect()
print(gc.get_count())       # (gen0, gen1, gen2) counts
Memory Tip CPython caches small integers (-5 to 256) and interns some strings, so 'a is b' may be True for equal values of these types. This is an implementation detail: never use 'is' for value comparison, always use '=='.
How Dictionary Works Internally
Python dict is implemented as a hash table. Keys are hashed using hash(), the hash is used to find a bucket.
Collision resolution uses open addressing (probe sequences).
Operation | Time Complexity (Average / Worst Case)
dict[key] (get) | O(1) average / O(n) worst (many hash collisions)
dict[key] = val (set) | O(1) average / O(n) worst
key in dict | O(1) average
del dict[key] | O(1) average
Iteration | O(n)
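To make open addressing concrete, here is a toy hash table that resolves collisions by linear probing. It is a teaching sketch only: `ToyDict` is a hypothetical class with fixed capacity and no resizing or deletion, and CPython's real dict uses a more elaborate probe sequence.

```python
class ToyDict:
    """Teaching sketch of an open-addressing hash table (fixed size, no delete)."""

    def __init__(self, capacity=8):
        self._slots = [None] * capacity   # each slot is None or a (key, value) pair

    def _probe(self, key):
        # Start at hash(key) % capacity and step forward until we find
        # either the key itself or an empty slot (linear probing).
        n = len(self._slots)
        start = hash(key) % n
        for step in range(n):
            idx = (start + step) % n
            slot = self._slots[idx]
            if slot is None or slot[0] == key:
                return idx
        raise RuntimeError('table is full')

    def __setitem__(self, key, value):
        self._slots[self._probe(key)] = (key, value)

    def __getitem__(self, key):
        slot = self._slots[self._probe(key)]
        if slot is None:
            raise KeyError(key)
        return slot[1]


d = ToyDict()
d[1] = 'one'
d[9] = 'nine'   # 9 % 8 == 1 % 8, so this collides with key 1 and probes onward
d[1] = 'uno'    # same key: overwritten in place
print(d[1], d[9])   # uno nine
```

The worst case in the table above corresponds to long probe runs: if many keys land on the same starting slot, each lookup has to walk past all of them.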
GIL — Global Interpreter Lock
The GIL is a mutex in CPython that ensures only one thread executes Python bytecode at a time. It simplifies
memory management but limits CPU-bound parallelism.
Scenario | Behavior with GIL
I/O-bound threads (network, disk) | BENEFIT: the GIL is released during blocking I/O
CPU-bound threads | NO BENEFIT: use multiprocessing instead
asyncio | NOT AFFECTED: single-threaded cooperative concurrency
Python 3.13+ | Experimental free-threaded (no-GIL) build (PEP 703)
Interview Tip If asked 'how to use all CPU cores in Python' — answer: use multiprocessing (not threading)
because multiprocessing bypasses the GIL by creating separate processes.
__slots__ — Memory Optimization
By default every instance stores its attributes in a per-object __dict__ (hash map overhead). Declaring __slots__
replaces it with fixed C-level descriptors, which usually cuts per-instance memory noticeably. Figures of roughly
40-50% are often quoted, but the actual saving depends on the number of attributes and the Python version.
class WithDict:
    def __init__(self, x, y):
        self.x = x
        self.y = y

class WithSlots:
    __slots__ = ('x', 'y')   # no __dict__ created
    def __init__(self, x, y):
        self.x = x
        self.y = y

# WithSlots instances are smaller than WithDict instances
# Downside: cannot add new attributes dynamically
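The trade-off can be checked directly. The sketch below uses two illustrative classes (`PointDict` and `PointSlots`, names chosen here) and shows that a slotted instance has no `__dict__` and rejects attributes that were not declared.

```python
class PointDict:
    def __init__(self, x, y):
        self.x = x
        self.y = y


class PointSlots:
    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        self.x = x
        self.y = y


plain = PointDict(1, 2)
slotted = PointSlots(1, 2)

plain.z = 3                                   # fine: stored in plain.__dict__
slotted_has_dict = hasattr(slotted, '__dict__')

try:
    slotted.z = 3                             # 'z' is not listed in __slots__
    slotted_accepts_new_attr = True
except AttributeError:
    slotted_accepts_new_attr = False

print(slotted_has_dict, slotted_accepts_new_attr)   # False False
```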
1.3 Functions: *args, **kwargs, Lambda, HOF
*args and **kwargs
*args collects extra positional arguments into a tuple. **kwargs collects extra keyword arguments into a dict.
def greet(*args, **kwargs):
    for name in args:
        print(f'Hello, {name}!')
    for key, val in kwargs.items():
        print(f'{key}: {val}')

greet('Alice', 'Bob', city='NYC', lang='Python')
# Hello, Alice!
# Hello, Bob!
# city: NYC
# lang: Python

# Unpacking when calling
def add(a, b, c):
    return a + b + c

nums = [1, 2, 3]
print(add(*nums))    # 6
opts = {'a': 1, 'b': 2, 'c': 3}
print(add(**opts))   # 6
Lambda, Higher-Order Functions
Lambda is an anonymous one-liner function. Higher-order functions take functions as arguments or return them.
# Lambda
square = lambda x: x ** 2
print(square(5)) # 25
# map, filter, reduce — higher-order functions
nums = [1, 2, 3, 4, 5]
squares = list(map(lambda x: x**2, nums)) # [1,4,9,16,25]
evens = list(filter(lambda x: x%2==0, nums)) # [2,4]
from functools import reduce
total = reduce(lambda acc, x: acc + x, nums) # 15
# sorted with key function
words = ['banana', 'apple', 'cherry']
sorted_words = sorted(words, key=lambda w: len(w))
print(sorted_words) # ['apple', 'banana', 'cherry']
1.4 Advanced: Closures, Decorators, Generators, Context Managers
Closures
A closure is an inner function that remembers variables from its enclosing scope, even after the outer function has
finished executing.
def make_counter(start=0):
    count = start
    def increment(by=1):
        nonlocal count   # must declare to modify the enclosing variable
        count += by
        return count
    return increment

counter = make_counter(10)
print(counter())    # 11
print(counter(5))   # 16

# Common gotcha: loop variable capture
fns = [lambda: i for i in range(3)]
print([f() for f in fns])   # [2, 2, 2]: all see the last i

# Fix: bind by value using a default argument
fns = [lambda i=i: i for i in range(3)]
print([f() for f in fns])   # [0, 1, 2]: correct
Decorators
A decorator wraps a function to add behavior without modifying the original code. Always use
@functools.wraps to preserve the original function's metadata.
import functools, time

def timer(func):
    @functools.wraps(func)   # preserves __name__, __doc__
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        print(f'{func.__name__} took {time.perf_counter()-start:.4f}s')
        return result
    return wrapper

# Decorator with arguments
def retry(times=3):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(times):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == times - 1:
                        raise   # out of attempts: re-raise the last error
        return wrapper
    return decorator

@timer
@retry(times=3)
def fetch_data(url):
    pass   # up to 3 attempts on exception; timer reports the total time
Interview Tip Decorators are applied bottom-up. @timer @retry means: first retry wraps fetch_data, then timer
wraps the result.
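The stacking order is easy to verify by recording when each wrapper runs. This is a small sketch; `tag`, `work`, and the `calls` list are illustrative names, not part of any library.

```python
import functools

calls = []


def tag(name):
    """Decorator factory that logs entry and exit under the given name."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            calls.append(f'enter {name}')
            result = func(*args, **kwargs)
            calls.append(f'exit {name}')
            return result
        return wrapper
    return decorator


@tag('outer')   # applied second, so it runs first at call time
@tag('inner')   # applied first, closest to the function
def work():
    calls.append('work')
    return 42


value = work()
print(calls)
# ['enter outer', 'enter inner', 'work', 'exit inner', 'exit outer']
```

Decoration happens bottom-up, but at call time the outermost wrapper is entered first and exited last.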
Generators & yield
Generators produce values lazily — they don't create all values in memory at once. Perfect for large datasets or
infinite sequences.
def fibonacci():
    a, b = 0, 1
    while True:
        yield a   # suspends here, returns value
        a, b = b, a + b

gen = fibonacci()
print([next(gen) for _ in range(8)])   # [0, 1, 1, 2, 3, 5, 8, 13]

# Generator expression: like a list comprehension but lazy
total = sum(x*x for x in range(1_000_000))   # O(1) memory!

# yield from: delegate to a sub-generator
def chain(*iterables):
    for it in iterables:
        yield from it

print(list(chain([1, 2], [3, 4], [5])))   # [1, 2, 3, 4, 5]

# send(): two-way communication
def accumulator():
    total = 0
    while True:
        value = yield total
        if value is None:
            break
        total += value
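The accumulator generator is defined above but never driven. The sketch below repeats the definition so it runs on its own and shows the usual protocol: prime with `next()`, feed values with `send()`, and expect `StopIteration` when the generator returns.

```python
def accumulator():
    total = 0
    while True:
        value = yield total
        if value is None:
            break
        total += value


acc = accumulator()
initial = next(acc)        # prime the generator: runs to the first yield
after_ten = acc.send(10)   # resumes with value=10, yields the new total
after_five = acc.send(5)

try:
    acc.send(None)         # None triggers the break, so the generator finishes
    finished = False
except StopIteration:
    finished = True

print(initial, after_ten, after_five, finished)   # 0 10 15 True
```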
Context Managers (with statement)
Context managers handle setup/teardown automatically (files, DB connections, locks). Implement __enter__ and
__exit__, or use @contextmanager.
# Class-based context manager
class DatabaseConnection:
    def __enter__(self):
        self.conn = connect_to_db()
        return self.conn

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.conn.close()   # always runs, even on exception
        return False        # False = don't suppress exceptions

with DatabaseConnection() as conn:
    conn.execute('SELECT 1')

# Generator-based using contextlib
from contextlib import contextmanager

@contextmanager
def timer_context(label):
    import time
    start = time.perf_counter()
    try:
        yield   # body of 'with' runs here
    finally:
        print(f'{label}: {time.perf_counter()-start:.4f}s')

with timer_context('DB query'):
    result = db.execute('SELECT * FROM orders')   # db: an open connection (assumed)
1.5 OOP: Classes, Inheritance, MRO, Magic Methods
Class vs Object, __init__ vs __new__
__new__ creates and returns the new instance. __init__ initializes it. __new__ runs first; the object it returns is
passed as self to __init__. Override __new__ only for immutable types or Singleton pattern.
class Singleton:
    _instance = None

    def __new__(cls, *args, **kwargs):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __init__(self, value):
        self.value = value

s1 = Singleton(10)
s2 = Singleton(20)
print(s1 is s2)   # True: same object
print(s1.value)   # 20: __init__ runs again on every call
Multiple Inheritance & MRO
Python uses the C3 Linearization algorithm to determine Method Resolution Order in multiple inheritance. Use
ClassName.__mro__ to inspect it. Always use super() — it follows MRO dynamically.
class A:
    def greet(self): return 'A'

class B(A):
    def greet(self): return 'B -> ' + super().greet()

class C(A):
    def greet(self): return 'C -> ' + super().greet()

class D(B, C):   # MRO: D -> B -> C -> A -> object
    pass

print(D().greet())                        # B -> C -> A
print([c.__name__ for c in D.__mro__])    # ['D', 'B', 'C', 'A', 'object']
Important Magic / Dunder Methods
Method | Purpose & When Used
__str__ | Human-readable string; used by str() and print()
__repr__ | Developer-friendly representation; used by repr() and the REPL. Should be unambiguous.
__len__ | Called by len(obj); must return a non-negative int
__eq__ / __hash__ | __eq__ implements ==. Defining __eq__ without __hash__ sets __hash__ to None (instances become unhashable), so define __hash__ as well if instances must work as dict keys or set members.
__lt__, __gt__ | Comparison operators. Use @functools.total_ordering to auto-fill the rest.
__add__ / __mul__ | Operator overloading: obj + other, obj * other
__getitem__ | obj[key]; enables indexing and slicing
__contains__ | item in obj; membership test
__call__ | obj(); makes instances callable
__enter__ / __exit__ | Context manager protocol; used with the 'with' statement
__iter__ / __next__ | Iterator protocol; enables use in for loops
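Several of these methods usually appear together. The sketch below uses a hypothetical `Version` class to show `__repr__`, `__str__`, a matching `__eq__`/`__hash__` pair, and `functools.total_ordering` filling in the remaining comparisons from `__lt__`.

```python
import functools


@functools.total_ordering
class Version:
    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def _key(self):
        return (self.major, self.minor)

    def __repr__(self):
        return f'Version({self.major}, {self.minor})'

    def __str__(self):
        return f'{self.major}.{self.minor}'

    def __eq__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        # Must agree with __eq__: equal objects need equal hashes.
        return hash(self._key())

    def __lt__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() < other._key()


a, b = Version(1, 2), Version(1, 10)
print(str(a), repr(b))            # 1.2 Version(1, 10)
print(a < b, a >= b)              # True False
print(len({a, Version(1, 2)}))    # 1 (equal versions hash alike)
```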
1.6 Concurrency: Threading, Multiprocessing, Async
Python offers three concurrency models. Choosing the right one based on the task type is a very common
interview question.
Model | Best For | GIL Impact
threading | I/O-bound (network, file, DB calls) | GIL released during I/O, so threads benefit
multiprocessing | CPU-bound (math, image processing, ML) | Each process has its own GIL: true parallelism
asyncio | High-concurrency I/O (thousands of connections) | Single-threaded, so the GIL is irrelevant
# asyncio example: handle many connections without threads
import asyncio
import aiohttp   # third-party HTTP client (assumed; session.get matches its API)

async def fetch(session, url):
    async with session.get(url) as response:
        return await response.text()

async def main():
    urls = ['[Link]', '[Link]']
    async with aiohttp.ClientSession() as session:
        # gather runs all coroutines concurrently
        results = await asyncio.gather(*[fetch(session, u) for u in urls])
    return results

asyncio.run(main())
Interview Tip Django is synchronous (WSGI) by default, but it can run under ASGI (since 3.0) and supports async
views (since 3.1). FastAPI is ASGI and natively async. For WebSockets in Django, use Django Channels.
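A dependency-free way to see what `asyncio.gather` does is to replace the network call with `asyncio.sleep`. In this sketch `fake_io` is a stand-in coroutine; the delays are arbitrary.

```python
import asyncio
import time


async def fake_io(name, delay):
    # Stands in for a network or database call.
    await asyncio.sleep(delay)
    return name


async def main():
    start = time.perf_counter()
    results = await asyncio.gather(
        fake_io('a', 0.2),
        fake_io('b', 0.1),
        fake_io('c', 0.2),
    )
    elapsed = time.perf_counter() - start
    return results, elapsed


results, elapsed = asyncio.run(main())
print(results)   # ['a', 'b', 'c'] (argument order, not completion order)
print(f'{elapsed:.2f}s')
```

The three waits overlap on a single thread, so the total time is close to the longest delay rather than the sum of all three.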
1.7 Collections Module & Data Structures
from collections import defaultdict, Counter, OrderedDict, deque, namedtuple

# defaultdict: auto-creates missing keys
freq = defaultdict(int)
for ch in 'banana':
    freq[ch] += 1
print(dict(freq))   # {'b': 1, 'a': 3, 'n': 2}

# Counter: frequency counting + arithmetic
c = Counter('banana')
print(c.most_common(2))   # [('a', 3), ('n', 2)]

# deque: O(1) append/pop at BOTH ends
dq = deque([1, 2, 3], maxlen=5)
dq.appendleft(0)   # O(1); list.insert(0, ...) would be O(n)
dq.rotate(1)       # rotate right by 1

# namedtuple: lightweight immutable struct
Point = namedtuple('Point', ['x', 'y'])
p = Point(3, 4)
print(p.x, p._asdict())   # 3 {'x': 3, 'y': 4}
heapq — Priority Queue
import heapq

# Min-heap
nums = [5, 1, 8, 3, 2]
heapq.heapify(nums)            # O(n) in-place
print(heapq.heappop(nums))     # 1, O(log n)
heapq.heappush(nums, 0)        # O(log n)

# k smallest / largest
print(heapq.nsmallest(2, nums))   # O(n log k)
print(heapq.nlargest(2, nums))

# Max-heap: negate values
max_heap = [-x for x in [5, 1, 8, 3]]
heapq.heapify(max_heap)
print(-heapq.heappop(max_heap))   # 8

# Priority queue with (priority, item) tuples
tasks = []
heapq.heappush(tasks, (2, 'low priority'))
heapq.heappush(tasks, (0, 'urgent'))
print(heapq.heappop(tasks))   # (0, 'urgent')
1.8 Caching with functools.lru_cache
LRU (Least Recently Used) cache memoizes function results. Repeated calls with the same arguments return the
cached result instantly.
from functools import lru_cache

@lru_cache(maxsize=128)   # cache the last 128 unique argument combinations
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n-1) + fibonacci(n-2)

print(fibonacci(50))            # instant: memoized
print(fibonacci.cache_info())   # CacheInfo(hits=48, misses=51, ...)
fibonacci.cache_clear()         # clear the cache

# Python 3.9+: unbounded cache, equivalent to lru_cache(maxsize=None)
from functools import cache

@cache
def expensive(n):
    return n ** n
Chapter 2 · Django
2.1 MVT Architecture & Request Lifecycle
Django follows the Model-View-Template (MVT) pattern. Unlike MVC, the framework itself acts as the
Controller.
Component | Responsibility
Model | Defines data structure + all database interaction via the ORM
View | Contains business logic: receives the request, queries models, returns a response
Template | Presentation layer: HTML + Django template language (or DRF JSON)
URL Conf | Routes URL patterns to the correct view function/class
Request → Response Lifecycle (Very Common Interview Question)
1. WSGI/ASGI server (Gunicorn/Uvicorn) accepts the connection and passes the request to Django, which wraps it in an HttpRequest
2. Request Middleware stack processes top-to-bottom (Security, Session, Auth, CSRF, ...)
3. URLResolver iterates urlpatterns, finds match, extracts path parameters
4. View executes — queries DB via ORM, calls services, processes business logic
5. View returns HttpResponse (render template / DRF serializer / JsonResponse)
6. Response Middleware processes bottom-to-top
7. WSGI/ASGI server sends response bytes to client
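The middleware ordering in steps 2 and 6 follows from how the layers wrap each other. The sketch below imitates that wrapping in plain Python with no Django involved; `make_middleware`, `build_chain`, and the string request are simplified stand-ins.

```python
def make_middleware(name, log):
    """Return a middleware factory that records when it sees request and response."""
    def middleware(get_response):
        def handler(request):
            log.append(f'{name}: request')
            response = get_response(request)   # call the next layer (or the view)
            log.append(f'{name}: response')
            return response
        return handler
    return middleware


def build_chain(view, middlewares):
    # Wrap from the last entry inward, so the first entry ends up outermost.
    handler = view
    for mw in reversed(middlewares):
        handler = mw(handler)
    return handler


log = []


def view(request):
    log.append('view')
    return f'response for {request}'


chain = build_chain(view, [
    make_middleware('Security', log),
    make_middleware('Session', log),
    make_middleware('Auth', log),
])
response = chain('/home')
print(log)
```

The log shows Security, Session, Auth on the way in, then the view, then Auth, Session, Security on the way out.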
2.2 Models & ORM
Field Types & Relationships
from django.db import models

class Author(models.Model):
    name = models.CharField(max_length=200)
    email = models.EmailField(unique=True)
    bio = models.TextField(blank=True)

class Book(models.Model):
    # OneToOne: each book has ONE publisher contract
    contract = models.OneToOneField('Contract', on_delete=models.CASCADE)
    # ForeignKey (Many-to-One): many books, one author
    author = models.ForeignKey(Author, on_delete=models.CASCADE,
                               related_name='books')
    # ManyToMany: a book can be in many stores, a store has many books
    stores = models.ManyToManyField('Store', blank=True)
    title = models.CharField(max_length=300)
    price = models.DecimalField(max_digits=8, decimal_places=2)
    pub_date = models.DateField(auto_now_add=True)

    class Meta:
        ordering = ['-pub_date']
        indexes = [models.Index(fields=['author', 'pub_date'])]
        constraints = [models.UniqueConstraint(
            fields=['author', 'title'], name='unique_book_per_author')]
select_related vs prefetch_related (N+1 Problem)
N+1 Problem Fetching 100 orders then accessing order.customer for each = 1 + 100 = 101 queries. This is one of
the most common Django performance bugs and a very common interview topic.
Method | How It Works & When to Use
select_related | SQL JOIN: fetches related objects in a single query. Use for ForeignKey and OneToOneField.
prefetch_related | One extra query per relation + joining in Python. Use for ManyToManyField and reverse ForeignKey.
Prefetch() object | Custom queryset for prefetch_related: lets you filter/sort the prefetched data.
# N+1 Problem: BAD
orders = Order.objects.all()
for order in orders:
    print(order.customer.name)   # 1 extra query per order!

# Fix with select_related (FK -> SQL JOIN)
orders = Order.objects.select_related('customer', 'customer__address').all()
# Now: 1 total query

# Fix with prefetch_related (M2M -> one extra query per relation)
products = Product.objects.prefetch_related('tags', 'reviews').all()

# Custom prefetch with filtering
from django.db.models import Prefetch
recent = Review.objects.filter(rating__gte=4).order_by('-created_at')
products = Product.objects.prefetch_related(
    Prefetch('reviews', queryset=recent, to_attr='top_reviews')
)
annotate, aggregate, F(), Q()
from django.db.models import Count, Avg, Sum, Max, F, Q

# aggregate: returns a single dict
stats = Order.objects.aggregate(
    total=Sum('amount'), avg=Avg('amount'), count=Count('id')
)

# annotate: adds a computed column to each row
categories = Category.objects.annotate(
    product_count=Count('products'),
    avg_price=Avg('products__price'),
)

# F(): reference a DB column value (atomic, no Python-side race)
Product.objects.filter(id=42).update(stock=F('stock') - 1)

# Q(): complex OR/NOT queries
active_staff = User.objects.filter(
    Q(is_staff=True) | Q(is_superuser=True)
)
non_admin = User.objects.filter(~Q(groups__name='admin'))
Transactions & Pessimistic Locking
from django.db import transaction

@transaction.atomic
def transfer_funds(from_id, to_id, amount):
    # select_for_update locks the rows until the transaction ends
    from_acc = Account.objects.select_for_update().get(id=from_id)
    to_acc = Account.objects.select_for_update().get(id=to_id)
    if from_acc.balance < amount:
        raise ValueError('Insufficient funds')
    from_acc.balance -= amount
    to_acc.balance += amount
    from_acc.save()
    to_acc.save()   # if this fails, BOTH saves are rolled back

# Nested savepoints
with transaction.atomic():
    create_order()
    try:
        # Catch the error OUTSIDE the inner atomic block so Django can
        # roll back to the savepoint before the handler runs
        with transaction.atomic():   # creates a savepoint
            charge_payment()
    except PaymentError:
        pass   # only the inner block is rolled back, not the outer one
2.3 Custom Managers & Signals
Custom Managers
from datetime import timedelta
from django.utils import timezone

class ActiveManager(models.Manager):
    def get_queryset(self):
        return super().get_queryset().filter(is_active=True)

    def recently_joined(self, days=30):
        cutoff = timezone.now() - timedelta(days=days)
        return self.get_queryset().filter(date_joined__gte=cutoff)

class User(models.Model):
    objects = models.Manager()   # default manager
    active = ActiveManager()     # custom manager

# Usage
User.active.all()                 # only active users
User.active.recently_joined(7)    # active + joined in the last 7 days
Signals
Signals allow decoupled components to get notified when certain actions occur. They're commonly used for
automatic profile creation, cache invalidation, and audit logging.
from django.db.models.signals import post_save, pre_delete
from django.dispatch import receiver

@receiver(post_save, sender=User)
def on_user_created(sender, instance, created, **kwargs):
    if created:
        Profile.objects.create(user=instance)    # auto-create profile
        send_welcome_email.delay(instance.pk)    # async email (Celery)

@receiver(pre_delete, sender=Order)
def on_order_deleted(sender, instance, **kwargs):
    # Field names below (items, product, stock, quantity) are assumed
    for item in instance.items.all():
        item.product.stock += item.quantity      # restore inventory
        item.product.save()
Signal Warning Signals create hidden coupling. For cross-app communication they're fine. For same-app logic,
prefer explicit service-layer calls to keep code readable and testable.
2.4 Middleware
import logging
import time
import uuid

logger = logging.getLogger(__name__)

class RequestTimingMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response   # called once at startup

    def __call__(self, request):
        # Before view
        start = time.perf_counter()
        request_id = uuid.uuid4().hex[:8]
        request.META['REQUEST_ID'] = request_id
        response = self.get_response(request)   # calls next layer/view
        # After view
        duration = time.perf_counter() - start
        response['X-Response-Time'] = f'{duration * 1000:.1f}ms'
        response['X-Request-ID'] = request_id
        return response

    def process_exception(self, request, exception):
        # Optional hook: called when a view raises. Returning None lets
        # Django's default exception handling continue.
        logger.error('Unhandled: %s', exception, exc_info=True)
        return None
2.5 Performance Optimization
Caching with Redis
# settings.py
CACHES = {
    'default': {
        'BACKEND': 'django_redis.cache.RedisCache',
        'LOCATION': 'redis://[Link]:6379/1',
        'KEY_PREFIX': 'myapp',
    }
}

# views.py: cache-aside pattern
from django.core.cache import cache
from rest_framework.decorators import api_view
from rest_framework.response import Response

@api_view(['GET'])
def product_list(request):
    cache_key = f'products:page:{request.GET.get("page", 1)}'
    data = cache.get(cache_key)
    if data is None:
        qs = Product.objects.select_related('category').filter(active=True)
        data = ProductSerializer(qs, many=True).data
        cache.set(cache_key, data, timeout=300)   # 5 minutes
    return Response(data)
Bulk Operations
# bulk_create: one INSERT per batch, far faster than looping .save()
products = [Product(name=f'Item {i}', price=i*9.99) for i in range(1000)]
Product.objects.bulk_create(products, batch_size=500)

# bulk_update: one UPDATE per batch
products = list(Product.objects.filter(category='old'))
for p in products:
    p.category = 'new'
Product.objects.bulk_update(products, ['category'], batch_size=500)

# Mass update / delete without loading objects
Product.objects.filter(stock=0).update(status='out_of_stock')
Order.objects.filter(created_at__lt=cutoff).delete()   # model name assumed
Chapter 3 · Django REST Framework (DRF)
3.1 Serializers
from rest_framework import serializers

class UserSerializer(serializers.ModelSerializer):
    full_name = serializers.SerializerMethodField()   # computed field
    password = serializers.CharField(write_only=True, min_length=8)
    confirm = serializers.CharField(write_only=True)

    class Meta:
        model = User
        fields = ['id', 'username', 'email', 'full_name', 'password', 'confirm']
        read_only_fields = ['id']

    def get_full_name(self, obj):
        return f'{obj.first_name} {obj.last_name}'.strip()

    def validate_email(self, value):   # field-level validation
        if User.objects.filter(email__iexact=value).exists():
            raise serializers.ValidationError('Email already in use.')
        return value.lower()

    def validate(self, data):          # cross-field validation
        # pop 'confirm' so it is not passed on to create_user()
        if data['password'] != data.pop('confirm'):
            raise serializers.ValidationError('Passwords do not match.')
        return data

    def create(self, validated_data):
        return User.objects.create_user(**validated_data)
3.2 ViewSets & Routers
from django_filters.rest_framework import DjangoFilterBackend
from rest_framework import viewsets, permissions, status
from rest_framework.decorators import action
from rest_framework.filters import SearchFilter, OrderingFilter
from rest_framework.response import Response

class ProductViewSet(viewsets.ModelViewSet):
    serializer_class = ProductSerializer
    permission_classes = [permissions.IsAuthenticated]
    filter_backends = [DjangoFilterBackend, SearchFilter, OrderingFilter]
    filterset_fields = ['category', 'in_stock']
    search_fields = ['name', 'description']
    ordering_fields = ['price', 'name', 'created_at']

    def get_queryset(self):
        return (Product.objects
                .select_related('category')
                .prefetch_related('tags')
                .filter(active=True))

    @action(detail=True, methods=['post'], url_path='restock',
            permission_classes=[permissions.IsAdminUser])
    def restock(self, request, pk=None):
        product = self.get_object()
        qty = int(request.data.get('quantity', 0))
        if qty <= 0:
            return Response({'error': 'Quantity must be positive'},
                            status=status.HTTP_400_BAD_REQUEST)
        product.stock += qty
        product.save(update_fields=['stock'])
        return Response({'stock': product.stock})
3.3 Authentication — JWT
# settings.py
from datetime import timedelta

REST_FRAMEWORK = {
    'DEFAULT_AUTHENTICATION_CLASSES': [
        'rest_framework_simplejwt.authentication.JWTAuthentication',
    ],
    'DEFAULT_PERMISSION_CLASSES': ['rest_framework.permissions.IsAuthenticated'],
    'DEFAULT_THROTTLE_CLASSES': [
        'rest_framework.throttling.AnonRateThrottle',
        'rest_framework.throttling.UserRateThrottle',
    ],
    'DEFAULT_THROTTLE_RATES': {'anon': '100/day', 'user': '1000/day'},
}

SIMPLE_JWT = {
    'ACCESS_TOKEN_LIFETIME': timedelta(minutes=15),
    'REFRESH_TOKEN_LIFETIME': timedelta(days=7),
    'ROTATE_REFRESH_TOKENS': True,
    'ALGORITHM': 'HS256',
    'AUTH_HEADER_TYPES': ('Bearer',),
}
3.4 Custom Permissions
from rest_framework.permissions import BasePermission, SAFE_METHODS

class IsOwnerOrReadOnly(BasePermission):
    def has_object_permission(self, request, view, obj):
        if request.method in SAFE_METHODS:   # GET, HEAD, OPTIONS
            return True
        return obj.owner == request.user     # 'owner' field name assumed

class HasRole(BasePermission):
    required_role = None

    def has_permission(self, request, view):
        if not request.user or not request.user.is_authenticated:
            return False
        return request.user.groups.filter(name=self.required_role).exists()
Chapter 4 · FastAPI
4.1 ASGI vs WSGI
Protocol | Key Characteristics
WSGI (Flask, Django sync) | Synchronous. Each request occupies a worker (thread or process) until it completes, so the worker blocks on I/O. Cannot handle WebSockets natively.
ASGI (FastAPI, Django 3+) | Async. A single event loop can handle thousands of concurrent connections. Supports WebSockets.
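The difference is visible in the shape of the application callable itself. The sketch below writes a minimal app for each protocol and drives both by hand, without a server; `wsgi_app`, `asgi_app`, and the fake `receive`/`send` helpers are illustrative names, and the request dictionaries contain only the few keys this example needs.

```python
import asyncio


def wsgi_app(environ, start_response):
    # WSGI: a plain synchronous callable that returns an iterable of bytes.
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'hello from WSGI']


async def asgi_app(scope, receive, send):
    # ASGI: a coroutine that exchanges event dictionaries with the server.
    if scope['type'] != 'http':
        return
    await send({'type': 'http.response.start', 'status': 200,
                'headers': [(b'content-type', b'text/plain')]})
    await send({'type': 'http.response.body', 'body': b'hello from ASGI'})


# Drive the WSGI app the way a server would, with a fake start_response.
wsgi_status = []
wsgi_body = b''.join(
    wsgi_app({'REQUEST_METHOD': 'GET', 'PATH_INFO': '/'},
             lambda status, headers: wsgi_status.append(status))
)

# Drive the ASGI app with fake receive/send coroutines.
asgi_messages = []


async def fake_receive():
    return {'type': 'http.request', 'body': b'', 'more_body': False}


async def fake_send(message):
    asgi_messages.append(message)


asyncio.run(asgi_app({'type': 'http', 'method': 'GET', 'path': '/'},
                     fake_receive, fake_send))

print(wsgi_status, wsgi_body)
print([m['type'] for m in asgi_messages])
```

Because the ASGI callable is a coroutine that sends events, the same interface extends naturally to long-lived connections such as WebSockets.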
4.2 Pydantic Models
# Pydantic v2 API (pattern=, field_validator, model_validator)
from pydantic import BaseModel, EmailStr, Field, field_validator, model_validator

class UserCreate(BaseModel):
    username: str = Field(..., min_length=3, max_length=50,
                          pattern=r'^[a-zA-Z0-9_]+$')
    email: EmailStr
    age: int = Field(..., ge=18, le=120)
    password: str = Field(..., min_length=8)
    confirm: str

    @field_validator('username')
    @classmethod
    def username_not_reserved(cls, v):
        if v.lower() in ('admin', 'root', 'system'):
            raise ValueError('Username is reserved')
        return v

    @model_validator(mode='after')
    def passwords_match(self):
        if self.password != self.confirm:
            raise ValueError('Passwords do not match')
        return self
4.3 Dependency Injection
from fastapi import Depends, HTTPException
from fastapi.security import OAuth2PasswordBearer
from jose import JWTError, jwt   # python-jose (assumed from the JWTError name)

# app, SessionLocal, User, SECRET_KEY and ALGORITHM are assumed to be defined elsewhere
oauth2_scheme = OAuth2PasswordBearer(tokenUrl='/auth/token')

# DB session: yields the resource, cleans up after the response
def get_db():
    db = SessionLocal()
    try:
        yield db
    finally:
        db.close()

# Auth dependency: chains other dependencies
def get_current_user(
    token: str = Depends(oauth2_scheme),
    db = Depends(get_db),
) -> User:
    try:
        payload = jwt.decode(token, SECRET_KEY, algorithms=[ALGORITHM])
        user_id = payload.get('sub')
    except JWTError:
        raise HTTPException(status_code=401, detail='Invalid token')
    user = db.query(User).filter(User.id == user_id).first()
    if not user:
        raise HTTPException(status_code=401, detail='User not found')
    return user

@app.get('/me')
async def read_me(current_user: User = Depends(get_current_user)):
    return current_user
4.4 Async Patterns & Background Tasks
import asyncio
from fastapi import BackgroundTasks

# Concurrent I/O with asyncio.gather
@app.get('/dashboard/{user_id}')
async def dashboard(user_id: int, db=Depends(get_db)):
    user, orders, notifs = await asyncio.gather(
        get_user_async(user_id, db),
        get_orders_async(user_id, db),
        get_notifications_async(user_id, db),
    )
    return {'user': user, 'orders': orders, 'notifications': notifs}

# BackgroundTasks: fire and forget after the response is sent
def send_email(to: str, subject: str):
    email_client.send(to=to, subject=subject)   # sync task: runs in a thread pool

@app.post('/register')
async def register(data: UserCreate, bg: BackgroundTasks, db=Depends(get_db)):
    user = create_user(db, data)
    bg.add_task(send_email, user.email, 'Welcome!')
    return {'id': user.id, 'message': 'Registered'}
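Interview Tip FastAPI sync vs async: use 'async def' for I/O-bound endpoints that await async libraries. Use 'def'
when calling sync (blocking) libraries: FastAPI runs sync endpoints in a thread pool automatically, so they do not
block the event loop. Heavy CPU-bound work is still limited by the GIL there and is better moved to a process pool
or a task queue.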
Chapter 5 · MySQL & SQL
5.1 SQL Fundamentals
-- Basic SELECT with filtering
SELECT u.id, u.name, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE u.is_active = 1
GROUP BY u.id, u.name
HAVING COUNT(o.id) > 5
ORDER BY order_count DESC
LIMIT 10;

-- Subquery
SELECT * FROM products
WHERE price > (SELECT AVG(price) FROM products);

-- Window function (advanced, MySQL 8.0+)
SELECT name, salary,
       RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank
FROM employees;
5.2 Types of JOINs
JOIN Type | Returns
INNER JOIN | Only rows with matching keys in BOTH tables
LEFT JOIN | All rows from the left table + matching rows from the right (NULL if no match)
RIGHT JOIN | All rows from the right table + matching rows from the left (NULL if no match)
FULL OUTER JOIN | All rows from both tables (NULL where no match). Not supported by MySQL; emulate it with LEFT JOIN UNION RIGHT JOIN.
SELF JOIN | Join a table with itself; useful for hierarchical data
CROSS JOIN | Cartesian product: every row from the left × every row from the right
5.3 Indexes
Indexes allow MySQL to find rows without scanning the entire table. They work like a book index: a B-tree lookup
is O(log n) instead of an O(n) full scan.
Index Type | Use Case
PRIMARY KEY | Unique row identifier. Automatically indexed. InnoDB clusters data around it.
UNIQUE INDEX | Prevents duplicate values. Also used for lookups.
Regular INDEX | Speeds up queries on columns used in WHERE, JOIN ON, ORDER BY, GROUP BY.
COMPOSITE INDEX | Multi-column index. Column ORDER matters: the leftmost-prefix rule applies.
FULL-TEXT INDEX | For text search (MATCH...AGAINST). Better than LIKE '%word%'.
PREFIX / FUNCTIONAL INDEX | MySQL has no partial (filtered) indexes of the PostgreSQL kind. It offers prefix indexes on the leading characters of a string column and, since 8.0.13, functional indexes on expressions.
-- Check if query uses index
EXPLAIN SELECT * FROM orders WHERE user_id = 42;
-- Composite index — useful for (user_id, status) queries
CREATE INDEX idx_order_user_status ON orders(user_id, status);
-- Covering index — query fetches all columns from index (no table access)
CREATE INDEX idx_covering ON orders(user_id, status, created_at);
Interview Tip Leftmost prefix rule: index on (a, b, c) helps queries on (a), (a, b), or (a, b, c) — but NOT on (b) or
(c) alone.
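The leftmost-prefix rule can be observed from Python with the standard-library sqlite3 module. SQLite is used here only as a convenient stand-in: its planner is not MySQL's, but it applies the same rule for a plain composite B-tree index. The table `t` and index `idx_abc` are made-up names.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
# Column d is not in the index, so SELECT * can never be answered
# from the index alone.
conn.execute('CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER, d TEXT)')
conn.execute('CREATE INDEX idx_abc ON t (a, b, c)')


def plan(sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute('EXPLAIN QUERY PLAN ' + sql).fetchall()
    return ' | '.join(row[-1] for row in rows)


plan_a = plan('SELECT * FROM t WHERE a = 1')
plan_ab = plan('SELECT * FROM t WHERE a = 1 AND b = 2')
plan_b = plan('SELECT * FROM t WHERE b = 2')

print(plan_a)    # mentions idx_abc
print(plan_ab)   # mentions idx_abc
print(plan_b)    # full scan: idx_abc is not used
```

In MySQL the equivalent check is EXPLAIN on each query, comparing the key column of the output.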
5.4 ACID Transactions & Isolation Levels
ACID Property | Meaning
Atomicity | All operations succeed or all are rolled back; no partial commits
Consistency | Database moves from one valid state to another; constraints always hold
Isolation | Concurrent transactions don't interfere with each other
Durability | Committed data survives crashes (written to disk/WAL)
Isolation Level | Dirty Read | Non-Repeatable Read | Phantom Read | Notes
READ UNCOMMITTED | Possible | Possible | Possible | fastest, least safe
READ COMMITTED | Protected | Possible | Possible | PostgreSQL default
REPEATABLE READ | Protected | Protected | Possible per the SQL standard (InnoDB largely prevents it with snapshot reads and next-key locks) | MySQL InnoDB default
SERIALIZABLE | Protected | Protected | Protected | slowest, safest
5.5 Query Optimization
1. Use EXPLAIN (or EXPLAIN ANALYZE in MySQL 8.0.18+): look for type = ALL (full table scan) versus ref/range (index access)
2. Add indexes on columns in WHERE, JOIN ON, ORDER BY, GROUP BY
3. Fetch only needed columns; avoid SELECT *
4. Avoid functions on indexed columns in WHERE (prevents index use)
5. Use LIMIT for pagination rather than fetching all rows
6. Avoid OR on different columns, which may prevent index usage; consider UNION ALL
-- BAD: function on the column prevents index use
WHERE YEAR(created_at) = 2024
-- GOOD: half-open range allows index use and includes all of Dec 31
WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01'
-- BAD: leading wildcard prevents index use
WHERE name LIKE '%john%'
-- GOOD: only a trailing wildcard can use the index
WHERE name LIKE 'john%'
Chapter 6 · API Performance & System Design
6.1 Handling High Traffic (2000+ Requests/sec)
Strategy | Tool / Approach
Caching | Redis or Memcached: cache DB query results, session data, computed values
Async workers | Gunicorn managing Uvicorn workers for concurrent request handling
Load balancer | Nginx upstream: distribute requests across multiple app instances
Rate limiting | DRF throttling, Nginx limit_req, or a Redis-based rate limiter
DB optimization | Indexes, select_related, bulk operations, read replicas
CDN | Cloudflare / CloudFront for static files and cacheable API responses
Connection pooling | ProxySQL for MySQL, PgBouncer for PostgreSQL, or Django's CONN_MAX_AGE persistent connections: reduces DB connection overhead
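As one illustration of the rate-limiting row, here is a minimal in-memory sliding-window limiter. It is a sketch under simplifying assumptions: `SlidingWindowLimiter` is a hypothetical class, state lives in one process, and a production setup would keep the counters in a shared store such as Redis so that all app instances see the same limits.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` calls per key within the last `window_seconds`."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock                  # injectable so tests can fake time
        self._hits = defaultdict(deque)     # key -> timestamps of recent calls

    def allow(self, key):
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


fake_now = [0.0]
limiter = SlidingWindowLimiter(limit=2, window_seconds=10,
                               clock=lambda: fake_now[0])

decisions = [limiter.allow('ip-1') for _ in range(3)]
other_client = limiter.allow('ip-2')   # separate key, separate budget
fake_now[0] = 10.0                     # the window has passed
after_window = limiter.allow('ip-1')

print(decisions, other_client, after_window)   # [True, True, False] True True
```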
6.2 Booking System — Double Booking Prevention
# Pessimistic locking: lock the row at DB level
from django.db import transaction

@transaction.atomic
def book_appointment(slot_id, user_id, idempotency_key):
    # Idempotency: safe to retry with the same key
    existing = Appointment.objects.filter(
        idempotency_key=idempotency_key).first()
    if existing:
        return existing   # return the same result

    # Lock the slot. nowait=True raises DatabaseError immediately if another
    # transaction holds the lock; drop nowait to WAIT until it commits.
    slot = Slot.objects.select_for_update(nowait=True).get(id=slot_id)   # model name assumed
    if slot.status != 'available':
        raise ConflictError('Slot no longer available')
    slot.status = 'booked'
    slot.save(update_fields=['status'])
    return Appointment.objects.create(
        slot_id=slot_id, patient_id=user_id,
        idempotency_key=idempotency_key, status='confirmed'
    )
6.3 Celery — Async Task Queue
Celery decouples slow operations from HTTP request-response cycle, executing them asynchronously in worker
processes.
Common use cases: sending emails/SMS, PDF generation, slow 3rd-party API calls, image/video processing,
scheduled jobs (celery-beat), fan-out notifications.
# celery.py
from celery import Celery
app = Celery('myapp', broker='redis://localhost:6379/0')

# tasks.py
@app.task(bind=True, max_retries=3, default_retry_delay=60)
def send_confirmation(self, appointment_id: int):
    try:
        appt = Appointment.objects.select_related('patient').get(id=appointment_id)
        EmailService.send_confirmation(appt)
    except Appointment.DoesNotExist:
        pass   # no retry needed
    except Exception as exc:
        # exponential backoff: 1s, 2s, 4s
        raise self.retry(exc=exc, countdown=2 ** self.request.retries)

# Trigger from a view
send_confirmation.delay(appointment.id)
Chapter 7 · DevOps & Docker
7.1 Docker
# Dockerfile (production-ready)
FROM python:3.12-slim
ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1 \
    PIP_NO_CACHE_DIR=1
WORKDIR /app
# Install dependencies first (cache layer)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
RUN python manage.py collectstatic --noinput
EXPOSE 8000
# "myapp.wsgi" assumes the Django project package is named myapp
CMD ["gunicorn", "myapp.wsgi:application", \
     "--bind", "0.0.0.0:8000", \
     "--workers", "4", \
     "--worker-class", "gthread", \
     "--threads", "2"]
7.2 Nginx Configuration
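upstream django_app {
    # Addresses assumed: two app instances on the same host
    server 127.0.0.1:8000;
    server 127.0.0.1:8001;
    keepalive 32;
}
server {
    listen 443 ssl http2;
    server_name example.com;  # placeholder domain
    # ssl_certificate / ssl_certificate_key are also required with "listen ... ssl"
    gzip on;
    gzip_types text/plain application/json application/javascript text/css;
    location /static/ {
        root /var/www/myapp;
        expires 1y;
        add_header Cache-Control 'public, immutable';
    }
    location / {
        proxy_pass http://django_app;
        proxy_http_version 1.1;          # needed for upstream keepalive
        proxy_set_header Connection "";  # needed for upstream keepalive
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_read_timeout 90s;
    }
}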
Chapter 8 · Git
8.1 Essential Git Commands
Command | What It Does
git add <file> / git add . | Stage specific file or all changed files
git commit -m 'message' | Create a commit with staged changes
git push origin branch | Push local commits to remote repository
git pull origin branch | Fetch + merge remote changes
git stash / git stash pop | Temporarily save uncommitted changes / restore them
git reset --soft HEAD~1 | Undo last commit, keep changes staged
git reset --hard HEAD~1 | Undo last commit, DISCARD changes (DANGER)
git rebase main | Reapply current branch commits on top of main (cleaner history)
git cherry-pick <hash> | Apply a specific commit to current branch
git log --oneline --graph | Visual commit history
git diff main..feature | Show changes between branches
Interview Tip git rebase vs git merge: rebase creates a linear history (cleaner, preferred for feature branches).
merge preserves branch structure (better for tracking when things were integrated).
Chapter 9 · AI/ML Basics for Backend Engineers
9.1 Vector Databases & Embeddings
Vector databases store high-dimensional vector representations (embeddings) of data (text, images, audio). They
enable semantic similarity search — finding items that are conceptually similar, not just keyword-matching.
Vector DB | Key Features
Pinecone | Fully managed, serverless, fast similarity search at scale
FAISS (Facebook AI) | Open source, in-memory, extremely fast for local use
Weaviate | Open source, supports multimodal embeddings, GraphQL API
Chroma | Lightweight, Python-native, great for prototyping
pgvector | PostgreSQL extension: add vector search to existing Postgres
9.2 RAG — Retrieval Augmented Generation
RAG improves LLM responses by retrieving relevant context from a knowledge base before generating an
answer. It reduces (but does not eliminate) hallucinations and enables domain-specific knowledge without retraining.
# RAG Pipeline (simplified)
# Note: these are legacy LangChain import paths; newer releases move them to
# separate packages (langchain_openai, langchain_community).
# Step 1: Index your documents
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
embeddings = OpenAIEmbeddings()
# documents: a list of Document objects loaded and split beforehand
vectordb = Chroma.from_documents(documents, embeddings)
# Step 2: At query time, retrieve then generate
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
qa_chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model='gpt-4'),
    retriever=vectordb.as_retriever(search_kwargs={'k': 5}),
)
result = qa_chain.run('What is our refund policy?')
# The LLM answers from the retrieved documents rather than from memory alone
9.3 LLMs & LangChain Overview
Concept | Explanation
LLM (Large Language Model) | Neural network trained on massive text data to understand and generate natural language (GPT-4, Claude, Gemini)
Embedding | Numerical vector representation of text; similar meanings map to similar vectors
Prompt Engineering | Crafting inputs to guide LLM behavior: system prompts, few-shot examples, chain-of-thought
LangChain | Python framework to build LLM-powered apps: chains, agents, tools, memory, RAG pipelines
Tokens | LLMs process text as tokens (one token is roughly 3/4 of a word). Context window = max tokens per request.
Fine-tuning | Train a pre-trained LLM on domain-specific data to improve task performance
Chapter 10 · Coding Practice Problems
10.1 Must-Know Patterns
LRU Cache Implementation
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.cache = OrderedDict()  # preserves insertion order

    def get(self, key: int) -> int:
        if key not in self.cache:
            return -1
        self.cache.move_to_end(key)  # mark as recently used
        return self.cache[key]

    def put(self, key: int, value: int) -> None:
        if key in self.cache:
            self.cache.move_to_end(key)
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # remove least recently used

# Usage
cache = LRUCache(2)
cache.put(1, 1)  # cache: {1:1}
cache.put(2, 2)  # cache: {1:1, 2:2}
cache.get(1)     # returns 1, cache: {2:2, 1:1}
cache.put(3, 3)  # evicts key 2, cache: {1:1, 3:3}
Sliding Window — Max Sum Subarray of Size K
def max_sum_subarray(arr, k):
    # Assumes 1 <= k <= len(arr)
    window_sum = sum(arr[:k])
    max_sum = window_sum
    for i in range(k, len(arr)):
        window_sum += arr[i] - arr[i - k]  # slide window
        max_sum = max(max_sum, window_sum)
    return max_sum

print(max_sum_subarray([2, 1, 5, 1, 3, 2], 3))  # 9
Find Duplicates in List
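def find_duplicates(nums):
    seen = set()
    # seen.add(x) returns None (falsy), so x is kept only if it was already seen.
    # A value occurring n times appears n - 1 times in the result.
    return [x for x in nums if x in seen or seen.add(x)]

# Top K elements using heap
import heapq
from collections import Counter

def top_k_frequent(nums, k):
    count = Counter(nums)
    return heapq.nlargest(k, count.keys(), key=count.get)

print(top_k_frequent([1,1,1,2,2,3], 2))  # [1, 2]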
Euclidean Distance
import math

def euclidean_distance(p1, p2):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p1, p2)))

print(euclidean_distance((2, 3), (5, 7)))  # 5.0

# NumPy version (faster for large arrays)
import numpy as np
p1 = np.array([2, 3])
p2 = np.array([5, 7])
print(np.linalg.norm(p1 - p2))  # 5.0
Merge Intervals
def merge_intervals(intervals):
    if not intervals:
        return []
    # sorted() and list() make copies so the caller's data is not mutated
    ordered = sorted(intervals, key=lambda x: x[0])
    merged = [list(ordered[0])]
    for start, end in ordered[1:]:
        if start <= merged[-1][1]:  # overlapping
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged

print(merge_intervals([[1,3],[2,6],[8,10],[15,18]]))
# [[1, 6], [8, 10], [15, 18]]
Chapter 11 · Interview Q&A: 60+ Questions with Answers
Python Questions
Q1. What is the GIL in Python? How does it affect concurrency?
➤ The GIL (Global Interpreter Lock) is a mutex in CPython that allows only one thread to execute Python
bytecode at a time. It simplifies memory management but limits CPU-bound multi-threading. Solution: use
multiprocessing for CPU-bound tasks (each process has its own GIL), threading for I/O-bound tasks (GIL
released during I/O), or asyncio for high-concurrency I/O.
Q2. What is the difference between list, tuple, set, and dict?
➤ list: mutable, ordered, allows duplicates — use for dynamic collections. tuple: immutable, ordered,
hashable — use for fixed data. set: mutable, unordered, no duplicates — use for membership tests and
deduplication (O(1)). dict: mutable, ordered (3.7+), key-value pairs — use for fast lookups by key (O(1)).
Q3. What is a decorator and how does it work?
➤ A decorator is a higher-order function that wraps another function to add behavior without modifying it. It
takes a function as argument and returns a new function. Python's @ syntax is syntactic sugar: @decorator def
f(): ... is equivalent to f = decorator(f). Always use @functools.wraps to preserve the original function's
metadata.
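To make the wrapping mechanics concrete, here is a minimal sketch. The decorator name count_calls and the function add are hypothetical, chosen only for illustration.

```python
import functools

def count_calls(func):
    """Hypothetical decorator: counts how many times func is called."""
    @functools.wraps(func)  # copies __name__, __doc__, etc. onto wrapper
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper

@count_calls  # same as: add = count_calls(add)
def add(a, b):
    """Return a + b."""
    return a + b

add(1, 2)
add(3, 4)
print(add.calls, add.__name__, add.__doc__)  # 2 add Return a + b.
```

Without functools.wraps, add.__name__ would report 'wrapper', which makes logs and stack traces harder to read.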
Q4. What is a generator and when would you use it?
➤ A generator is a function that uses 'yield' to produce values lazily — one at a time, on demand. Unlike a
list, it doesn't store all values in memory. Use generators for: large datasets (reading huge files), infinite
sequences, pipeline processing. Memory-efficient: sum(x*x for x in range(1_000_000)) uses O(1) memory.
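A short sketch of lazy evaluation, assuming a hypothetical helper named read_in_chunks. It batches any iterable without materializing it, and compares the size of a list with the size of an equivalent generator object.

```python
import sys

def read_in_chunks(items, size):
    """Yield successive lists of at most `size` items from any iterable."""
    chunk = []
    for item in items:
        chunk.append(item)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:  # emit the final, possibly shorter, chunk
        yield chunk

squares_list = [x * x for x in range(100_000)]   # all values stored
squares_gen = (x * x for x in range(100_000))    # values produced on demand

print(sys.getsizeof(squares_list) > sys.getsizeof(squares_gen))  # True
print(list(read_in_chunks(range(7), 3)))  # [[0, 1, 2], [3, 4, 5], [6]]
```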
Q5. Explain closures and the common loop variable gotcha.
➤ A closure is an inner function that captures variables from its enclosing scope. Gotcha: all lambdas in
[lambda: i for i in range(3)] see the SAME variable 'i' (the final value 2). Fix: use default args [lambda i=i: i
for i in range(3)] to capture by value.
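The gotcha and the fix can be reproduced in a few lines. The variable names late and bound are illustrative only.

```python
# Late binding: every lambda looks up i when it is CALLED, after the loop ended
late = [lambda: i for i in range(3)]

# Default argument: i is evaluated when each lambda is DEFINED
bound = [lambda i=i: i for i in range(3)]

print([f() for f in late])   # [2, 2, 2]
print([f() for f in bound])  # [0, 1, 2]
```

functools.partial is another common way to bind the current value.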
Q6. What is the difference between shallow copy and deep copy?
➤ Shallow copy (copy.copy() or list.copy()) creates a new outer container but shares references to inner
objects. Deep copy (copy.deepcopy()) creates a fully independent recursive copy. Danger: with shallow copy,
mutating a nested list affects both original and copy.
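A minimal demonstration of the difference on a nested list:

```python
import copy

original = [[1, 2], [3, 4]]
shallow = copy.copy(original)      # new outer list, same inner lists
deep = copy.deepcopy(original)     # inner lists are copied too

original[0].append(99)             # mutate a nested list

print(shallow[0])  # [1, 2, 99]  (shares the inner list with original)
print(deep[0])     # [1, 2]      (fully independent)
```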
Q7. What are *args and **kwargs?
➤ *args collects extra positional arguments into a tuple. **kwargs collects extra keyword arguments into a
dict. They allow functions to accept variable numbers of arguments. They can also be used for unpacking:
func(*list_arg, **dict_arg).
Q8. What is the difference between @classmethod and @staticmethod?
➤ @classmethod receives the class (cls) as first argument — used for factory methods and class-state
modification. @staticmethod has no automatic first argument — it's a plain utility function namespaced
inside the class with no access to class or instance.
Q9. What is __slots__ and why use it?
➤ __slots__ replaces the per-object __dict__ with fixed C-level descriptors. Benefits: ~40-50% memory
reduction per instance, faster attribute access. Downside: cannot add new attributes dynamically. Use for
classes that create millions of instances (e.g., Point class in scientific computing).
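The trade-off can be seen directly. The class names PointDict and PointSlots below are hypothetical.

```python
class PointDict:
    def __init__(self, x, y):
        self.x, self.y = x, y

class PointSlots:
    __slots__ = ("x", "y")  # fixed attribute set, no per-instance __dict__

    def __init__(self, x, y):
        self.x, self.y = x, y

p = PointSlots(1, 2)
try:
    p.z = 3  # not declared in __slots__
except AttributeError as exc:
    print("cannot add attribute:", exc)

print(hasattr(PointDict(1, 2), "__dict__"), hasattr(p, "__dict__"))  # True False
```

Actual memory savings depend on the Python version and the number of attributes, so measure before relying on a specific figure.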
Q10. How does Python's garbage collection work?
➤ CPython uses reference counting as primary mechanism — when an object's refcount drops to 0, it's freed
immediately. The gc module adds a cycle detector to handle circular references (A→B→A), using three
generations for collection with short-lived objects collected most frequently.
Django Questions
Q11. What happens when a request hits Django?
➤ 1) WSGI/ASGI server wraps TCP data in HttpRequest. 2) Request Middleware stack processes top-to-
bottom. 3) URLResolver matches URL pattern and extracts params. 4) View executes business logic. 5) View
returns HttpResponse. 6) Response Middleware processes bottom-to-top. 7) Server sends bytes to client.
Q12. Explain select_related vs prefetch_related.
➤ select_related uses SQL JOIN — fetches related objects in ONE query. Use for ForeignKey and
OneToOneField. prefetch_related makes 2 separate queries then joins in Python. Use for ManyToManyField
and reverse ForeignKey. Both solve the N+1 problem. The Prefetch() object lets you customize the prefetch
queryset.
Q13. What is the N+1 query problem and how do you fix it?
➤ N+1 occurs when you fetch N objects then make 1 extra query per object to access a related field.
Example: fetching 100 orders and then accessing a related object on each one (for example order.customer)
= 101 queries. Fix: use select_related for FK (1 query with JOIN) or prefetch_related for M2M (2 queries).
Q14. What are Django signals? When to use and avoid them?
➤ Signals allow decoupled components to react when actions occur (post_save, pre_delete). Good for:
cross-app communication (User→Profile creation), cache invalidation. Avoid for: same-app logic where
explicit service calls are clearer and more testable. Signals create hidden coupling that's hard to debug and
test.
Q15. Explain F() and Q() expressions.
➤ F() references a DB column value without loading it into Python, which enables atomic DB-side operations
(no race conditions). Example, with Product as an illustrative model:
Product.objects.filter(pk=product_id).update(stock=F('stock') - 1). Q() enables complex OR/NOT
queries: filter(Q(is_staff=True)|Q(is_superuser=True)) or filter(~Q(groups__name='admin')).
Q16. How does Django's ORM handle transactions?
➤ @transaction.atomic wraps code in a DB transaction: any unhandled exception causes a full rollback.
select_for_update() adds pessimistic row-level locking (other transactions wait). Nested atomic() blocks
create savepoints, so an inner failure only rolls back to the savepoint, not the outer block.
Q17. What is Django middleware? Give examples.
➤ Middleware wraps every request/response globally. Each middleware is a class with __call__ that calls
get_response(). Request flows top→bottom, response flows bottom→top through MIDDLEWARE list.
Examples: SecurityMiddleware (HTTPS, HSTS), SessionMiddleware, AuthenticationMiddleware,
CsrfViewMiddleware. Use for: logging, rate limiting, auth, CORS, request ID injection.
Q18. How do you optimize a slow Django API?
➤ 1) Profile — Django Debug Toolbar or django-silk to find the bottleneck. 2) Fix N+1 queries with
select_related/prefetch_related. 3) Add DB indexes on filtered/ordered columns. 4) Cache with Redis (cache-
aside pattern). 5) Use bulk_create/bulk_update instead of looping .save(). 6) Offload slow tasks to Celery. 7)
Fetch only needed columns with .only() or .values().
Q19. Explain Django's custom User model. Why define it at project start?
➤ Extend AbstractUser to add fields (phone, avatar, bio). Set AUTH_USER_MODEL = '<app_label>.User'
(for example 'users.User') in settings.py BEFORE the first migration. Changing it later requires reworking
existing migrations plus a data migration, which is extremely painful. Always do it at project creation
even if you don't need custom fields yet.
Q20. What is the difference between ForeignKey, OneToOneField, and ManyToManyField?
➤ ForeignKey: many-to-one (many books can have one author). OneToOneField: one-to-one (each user has
exactly one profile). ManyToManyField: many-to-many (books can be in many stores, stores have many
books) — creates a junction table automatically.
DRF Questions
Q21. What is the difference between Serializer and ModelSerializer?
➤ Serializer requires manually defining every field. ModelSerializer auto-generates fields from the model
(like ModelForm). ModelSerializer is preferred for standard CRUD. Use plain Serializer when you need full
control over validation/representation not tied to a specific model.
Q22. How do DRF permissions work?
➤ Permission classes implement has_permission() (view-level, checked before view runs) and
has_object_permission() (object-level, checked when get_object() is called). Multiple classes are ANDed —
all must return True. Custom: inherit BasePermission and override the methods.
Q23. What is JWT authentication and how does token refresh work?
➤ JWT (JSON Web Token) has 3 parts: header.payload.signature. The access token is short-lived (e.g. 15 min)
and is sent with each request in the Authorization: Bearer header. The refresh token is long-lived (e.g. 7 days)
and is used only to get new access tokens. Rotation: each refresh generates a new refresh token and blacklists the old one.
Q24. What is DRF throttling?
➤ Throttling limits how many requests a user or IP can make in a time period. AnonRateThrottle: limits
anonymous users by IP. UserRateThrottle: limits authenticated users. ScopedRateThrottle: different limits per
view. Configure rates in DEFAULT_THROTTLE_RATES: {'anon': '100/day', 'user': '1000/day'}.
FastAPI Questions
Q25. FastAPI async def vs def — what's the difference?
➤ 'async def' endpoints run in the event loop, which is ideal for async I/O (httpx, asyncpg, aioredis). 'def' endpoints
run in a thread pool; use them for sync libraries (psycopg2, requests) or CPU-bound work. FastAPI handles both
correctly. Never make blocking sync calls (like time.sleep) inside 'async def', because they stall the whole event
loop; use asyncio.sleep instead.
Q26. What is dependency injection in FastAPI?
➤ Dependencies are callables declared with Depends() in function signatures. FastAPI resolves and injects
them automatically. Can chain: get_current_user depends on get_db and oauth2_scheme. Used for: DB
sessions, auth, shared logic. Benefits: reusable, testable, automatic cleanup (yield-based).
Q27. How does Pydantic validation work in FastAPI?
➤ FastAPI automatically validates request bodies against Pydantic models. If validation fails, it returns a
422 Unprocessable Entity with detailed error info. Validators can be field-level (@field_validator in Pydantic v2,
@validator in v1) or cross-field (@model_validator in v2, @root_validator in v1). Pydantic v2 uses
model_config = {'from_attributes': True} instead of orm_mode.
MySQL & Database Questions
Q28. What is an index in MySQL and why is it important?
➤ An index is a data structure (B-tree by default) that allows MySQL to find rows without a full table scan.
Without an index: O(n) scan. With an index: O(log n) lookup. Add indexes on columns used in WHERE,
JOIN ON, ORDER BY, GROUP BY. Too many indexes slow down INSERT/UPDATE/DELETE — balance
is key.
Q29. Explain INNER JOIN vs LEFT JOIN.
➤ INNER JOIN returns only rows with matching keys in BOTH tables. LEFT JOIN returns all rows from
the left table plus matching rows from the right (NULL if no match). Use LEFT JOIN when you want to
include records from the left table even if they have no related records on the right.
Q30. What are ACID properties?
➤ Atomicity: all operations succeed or all are rolled back. Consistency: DB moves from one valid state to
another (constraints always hold). Isolation: concurrent transactions don't interfere with each other.
Durability: committed data survives crashes. InnoDB is ACID-compliant. MyISAM is not (no transactions).
Q31. What is the difference between WHERE and HAVING?
➤ WHERE filters rows BEFORE aggregation (cannot reference aggregate functions). HAVING filters
groups AFTER aggregation (can reference COUNT, SUM, etc.). Example: WHERE filters individual rows,
HAVING COUNT(*) > 5 filters only groups with more than 5 members.
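The ordering of the two filters can be checked with a small query. This sketch uses Python's built-in sqlite3 module rather than MySQL so it runs without a server; the orders table and its rows are made up for illustration, and the WHERE/HAVING semantics shown are the same in MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("ann", 10, "paid"), ("ann", 20, "paid"), ("ann", 5, "cancelled"),
     ("bob", 50, "paid"), ("cat", 7, "paid"), ("cat", 8, "paid")],
)

rows = conn.execute(
    """
    SELECT customer, COUNT(*) AS n, SUM(amount) AS total
    FROM orders
    WHERE status = 'paid'      -- row filter, applied before grouping
    GROUP BY customer
    HAVING COUNT(*) >= 2       -- group filter, applied after aggregation
    ORDER BY customer
    """
).fetchall()

print(rows)  # [('ann', 2, 30), ('cat', 2, 15)]
```

The cancelled order is dropped by WHERE before counting, and bob is dropped by HAVING because only one paid order remains in his group.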
Q32. How do you use EXPLAIN to optimize a query?
➤ EXPLAIN SELECT ... shows query execution plan. Key columns: type (ALL=full scan, index=index
scan, ref/eq_ref=good), key (index used), rows (estimated rows scanned), Extra (Using index=covering
index, Using filesort=bad for performance). Aim for type=ref or better, avoid type=ALL on large tables.
System Design Questions
Q33. How would you scale a Django API to handle 1 million users?
➤ Profile first. Then: CDN (Cloudflare) offloads static assets and cacheable responses (how much traffic it absorbs
depends on the workload). Load balancer (Nginx) distributes to auto-scaled stateless app servers. Redis cluster for
sessions, rate limiting, hot data cache. PgBouncer for DB connection pooling. Read replicas offload read traffic.
Celery for async tasks. Kubernetes HPA for auto-scaling by CPU/RPS.
Q34. How do you prevent race conditions in a booking system?
➤ Options: 1) Pessimistic locking — select_for_update() locks the row; other transactions wait (safe but can
cause contention). 2) Optimistic locking — add a version column; retry if UPDATE affects 0 rows. 3) F() for
atomic DB operations. 4) DB UNIQUE constraint — rejects duplicate inserts (handle IntegrityError). 5)
Redis SETNX for distributed locks.
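Option 2 (optimistic locking) can be sketched without Django. The example below uses the standard-library sqlite3 module; the slot table, its columns, and the try_book helper are assumptions made for illustration, not part of any framework.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE slot (id INTEGER PRIMARY KEY, status TEXT, version INTEGER)")
conn.execute("INSERT INTO slot VALUES (1, 'available', 0)")
conn.commit()

def try_book(conn, slot_id, seen_version):
    """Book the slot only if nobody changed it since we read seen_version."""
    cur = conn.execute(
        "UPDATE slot SET status = 'booked', version = version + 1 "
        "WHERE id = ? AND version = ? AND status = 'available'",
        (slot_id, seen_version),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows updated means we lost the race

# Two clients read the same version, then both try to book
version_a = conn.execute("SELECT version FROM slot WHERE id = 1").fetchone()[0]
version_b = version_a

first = try_book(conn, 1, version_a)
second = try_book(conn, 1, version_b)
print(first, second)  # True False
```

The loser sees zero affected rows and can re-read the slot, then either retry or report a conflict (409).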
Q35. What is Celery and when would you use it?
➤ Celery is a distributed task queue that decouples slow operations from the HTTP cycle. Tasks run
asynchronously in worker processes. Use for: sending emails/SMS, PDF generation, 3rd-party API calls,
image processing, scheduled jobs (celery-beat), fan-out notifications. Broker: Redis or RabbitMQ. Backend:
Redis (for task results).
Q36. Explain idempotency in API design.
➤ An idempotent operation produces the same result whether called once or many times. GET, PUT,
DELETE are naturally idempotent. POST is not. For POST operations that must be safe to retry (payment,
booking): client generates a unique idempotency key, server stores it and returns the same result on duplicate
requests.
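A minimal in-memory sketch of the idempotency-key pattern follows. BookingService and its method names are hypothetical; a production version would persist keys in the database or Redis and guard against concurrent duplicates (for example with a UNIQUE constraint on the key).

```python
import uuid

class BookingService:
    """Hypothetical in-memory service illustrating idempotency keys."""

    def __init__(self):
        self._results = {}          # idempotency_key -> stored response
        self.bookings_created = 0

    def create_booking(self, idempotency_key, slot_id):
        # Duplicate request: return the stored result, do no new work
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.bookings_created += 1
        result = {"booking_id": self.bookings_created, "slot_id": slot_id}
        self._results[idempotency_key] = result
        return result

service = BookingService()
key = str(uuid.uuid4())             # generated once by the client

first = service.create_booking(key, slot_id=42)
retry = service.create_booking(key, slot_id=42)   # e.g. after a timeout
print(first == retry, service.bookings_created)   # True 1
```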
Q37. How do you debug a slow API?
➤ 1) Check response time breakdown — is it slow in DB, network, or application? 2) Django Debug
Toolbar or silk for SQL query analysis. 3) EXPLAIN ANALYZE on slow queries. 4) Check for N+1 queries.
5) Look for missing indexes. 6) Profile with cProfile or py-spy for CPU bottlenecks. 7) Add logging for
external API call durations. 8) Cache frequent, expensive results.
Q38. Django vs FastAPI — when to choose which?
➤ Django: full-featured, batteries-included (admin, ORM, auth, migrations, forms). Best for traditional
CRUD apps, CMS, monoliths, teams needing quick productivity. FastAPI: async-first, lightweight, auto
OpenAPI docs, excellent type safety. Best for high-throughput microservices, ML model serving, async-
heavy workloads. Rule: default to Django, choose FastAPI when async performance matters most.
Concurrency & Performance Questions
Q39. When to use threading vs multiprocessing vs asyncio?
➤ threading: I/O-bound tasks (network calls, DB queries, file I/O) — GIL released during I/O, threads
benefit. multiprocessing: CPU-bound tasks (math, image processing, ML inference) — each process has own
GIL, true parallelism. asyncio: high-concurrency I/O (thousands of simultaneous connections) — single-
threaded, no thread overhead.
Q40. What is connection pooling and why is it important?
➤ Connection pooling maintains a pool of pre-established DB connections that requests can reuse. Opening
a new connection is expensive (TCP handshake, optional TLS, authentication) and can cost tens of milliseconds.
Pooling removes this per-request overhead. PgBouncer for PostgreSQL, django-db-connection-pool for Django.
Without pooling, 1000 concurrent requests = 1000 DB connections (many databases default to a limit in the low hundreds).
REST API & Design Questions
Q41. What are REST principles?
➤ Stateless: each request contains all info needed (no server-side sessions). Client-Server: separation of
concerns. Uniform Interface: consistent URL patterns and HTTP methods. Resource-based: URLs represent
resources (nouns), HTTP methods represent actions (verbs). Cacheable: responses can be cached. Layered
System: client doesn't know about backend layers.
Q42. Explain REST HTTP status codes.
➤ 200 OK: success. 201 Created: POST succeeded (include Location header). 204 No Content: DELETE
succeeded. 400 Bad Request: malformed input. 401 Unauthorized: auth required. 403 Forbidden:
authenticated but not authorized. 404 Not Found. 409 Conflict: double booking, duplicate. 422
Unprocessable: validation errors. 429 Too Many Requests. 500 Internal Server Error.
Q43. What is pagination and how do you implement it?
➤ Pagination splits large result sets into pages. Main strategies: 1) Offset pagination
(LIMIT/OFFSET): simple but slow for large offsets. 2) Cursor pagination: an opaque cursor encodes the position
of the last item seen (ID/timestamp); stays consistent when rows are inserted. 3) Keyset pagination: the technique
underneath most cursors (WHERE id > last_id ORDER BY id LIMIT n); efficient for large datasets because it seeks
through the index instead of skipping rows. DRF built-in:
PageNumberPagination, CursorPagination, LimitOffsetPagination.
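Keyset pagination can be sketched with the standard-library sqlite3 module. The item table and the fetch_page helper are hypothetical; the same WHERE id > ? ORDER BY id LIMIT ? query shape applies to MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO item (name) VALUES (?)",
    [(f"item-{n}",) for n in range(1, 8)],   # ids 1..7
)

def fetch_page(conn, after_id=0, page_size=3):
    """Return (rows, next_cursor); next_cursor is None on a short page."""
    rows = conn.execute(
        "SELECT id, name FROM item WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, page_size),
    ).fetchall()
    next_cursor = rows[-1][0] if len(rows) == page_size else None
    return rows, next_cursor

page1, cursor = fetch_page(conn)
page2, cursor = fetch_page(conn, after_id=cursor)
page3, cursor = fetch_page(conn, after_id=cursor)
print([r[0] for r in page1], [r[0] for r in page2], [r[0] for r in page3], cursor)
# [1, 2, 3] [4, 5, 6] [7] None
```

One simplification in this sketch: when the final page is exactly full, the client receives a cursor and finds out the data has ended only on the next (empty) request.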
DevOps Questions
Q44. What is the difference between a Docker image and container?
➤ Image: read-only blueprint (like a class) — built from Dockerfile, stored in registry. Container: running
instance of an image (like an object) — isolated process with own filesystem, network, resources. Many
containers can run from one image. 'docker build' creates images, 'docker run' creates containers.
Q45. What is Gunicorn and why use it with Django?
➤ Gunicorn is a production WSGI server. Django's development server (runserver) has not been hardened for
security or performance and is not suitable for production. Gunicorn creates multiple worker processes to handle
concurrent requests. --workers 4 creates 4 processes (rule of thumb: 2×CPU cores + 1). Use --worker-class gthread
for threads-per-worker. Nginx sits in front as reverse proxy + static file server.
Git Questions
Q46. What is the difference between git rebase and git merge?
➤ git merge: creates a merge commit that preserves branch history (good for tracking when features were
integrated). git rebase: replays commits on top of another branch — creates linear history (cleaner, easier to
read). Golden rule: never rebase shared/public branches. Use rebase for local feature branches before merging
to main.
Q47. How do you undo a commit in Git?
➤ git reset --soft HEAD~1: undo commit, keep changes staged. git reset --mixed HEAD~1: undo commit,
keep changes unstaged (default). git reset --hard HEAD~1: undo commit, DISCARD changes (DANGER —
loses work). git revert HEAD: creates new commit that reverses the changes (safe for shared branches —
history preserved).
NumPy & Data Questions
Q48. What is broadcasting in NumPy?
➤ Broadcasting allows NumPy to perform operations between arrays of different shapes without explicit
looping. Rules: 1) Dimensions are compared from trailing end. 2) Dimensions are compatible if equal or one
is 1. Example: adding (3,3) matrix + (3,) vector — vector is broadcast across rows. Much faster than Python
loops.
Q49. Explain np.ravel() vs reshape().
➤ ravel() returns a flattened 1D array (may return a view). reshape() returns array with new shape without
changing data. Both are O(1) when possible (just change strides). Use ravel() for 1D flatten, reshape(-1) for
same result. Key difference: reshape can specify any valid shape; ravel always gives 1D.
Q50. How do you remove NaN values from a NumPy array?
➤ arr[~np.isnan(arr)]: a boolean mask removes NaNs. np.nan_to_num(arr) replaces NaN with 0 (and inf
with a large finite number). For row-wise mean ignoring NaN: np.nanmean(arr, axis=1). For pandas: df.dropna() or
df.fillna(value).
Advanced & Mixed Questions
Q51. What is the difference between authentication and authorization?
➤ Authentication: verifying WHO you are (login, JWT token verification). Authorization: verifying what
you're ALLOWED to do (permissions, roles). In Django: authentication middleware sets request.user. DRF
permission classes handle authorization (IsAuthenticated, IsAdminUser, custom HasRole).
Q52. How does LRU cache work and what is its time complexity?
➤ LRU (Least Recently Used) evicts the least recently accessed item when capacity is exceeded.
Implementation: OrderedDict (O(1) move_to_end) or doubly-linked list + hash map. All operations (get, put)
are O(1). Python's functools.lru_cache uses this internally. Use when: expensive computations with repeated
inputs.
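The eviction order can be observed with functools.lru_cache itself. The function square is a stand-in for an expensive computation.

```python
from functools import lru_cache

@lru_cache(maxsize=2)
def square(n):
    return n * n

square(1)   # miss
square(2)   # miss
square(1)   # hit; 1 becomes the most recently used entry
square(3)   # miss; cache is full, so 2 (least recently used) is evicted
square(2)   # miss again, because 2 was evicted

print(square.cache_info())
# CacheInfo(hits=1, misses=4, maxsize=2, currsize=2)
```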
Q53. What is a context manager and give a practical use case?
➤ Context managers handle resource acquisition and release via __enter__/__exit__. Guaranteed cleanup
even if exceptions occur. Practical examples: file I/O (auto-close), DB connections (auto-commit/rollback),
threading locks (auto-release), timer for profiling. Use @contextmanager decorator for generator-based
implementation.
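A generator-based context manager for the profiling use case might look like the sketch below. The timer name and the results dictionary are assumptions for illustration.

```python
import time
from contextlib import contextmanager

@contextmanager
def timer(label, results):
    """Record elapsed seconds under results[label], even if the body raises."""
    start = time.perf_counter()
    try:
        yield                      # body of the with-block runs here
    finally:
        results[label] = time.perf_counter() - start

timings = {}
try:
    with timer("failing_block", timings):
        raise ValueError("boom")
except ValueError:
    pass

print("failing_block" in timings)  # True: cleanup ran despite the exception
```

The try/finally around yield is what provides the guaranteed cleanup; without it, code after yield is skipped when the body raises.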
Q54. Explain the difference between sync and async in Python.
➤ Synchronous: code executes sequentially — each line waits for the previous to complete. Blocking I/O
wastes CPU waiting. Asynchronous (asyncio): uses an event loop with coroutines. When a coroutine hits
await (I/O operation), it suspends and the event loop runs another coroutine. Single thread handles thousands
of I/O operations concurrently.
Q55. How would you implement rate limiting?
➤ Options: 1) DRF built-in throttling (AnonRateThrottle, UserRateThrottle). 2) Nginx limit_req module
(IP-based). 3) Redis-based: fixed-window counter (INCR + EXPIRE), sliding-window log (sorted set of timestamps),
or token bucket algorithm. 4) django-ratelimit library. Return 429 Too Many Requests with Retry-After header.
Rate limits: per IP (for anonymous), per user (for authenticated), per endpoint.
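The token bucket algorithm mentioned in option 3 can be sketched in plain Python. This is a single-process, in-memory illustration with a hypothetical TokenBucket class; a shared limiter would keep the same state in Redis. Time is passed in explicitly so the behaviour is deterministic.

```python
class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `refill_per_second`."""

    def __init__(self, capacity, refill_per_second, now=0.0):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = float(capacity)   # start with a full bucket
        self.updated_at = now

    def allow(self, now):
        # Refill according to elapsed time, capped at capacity
        elapsed = now - self.updated_at
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated_at = now
        if self.tokens >= 1:
            self.tokens -= 1            # spend one token for this request
            return True
        return False                    # caller should respond with 429

bucket = TokenBucket(capacity=2, refill_per_second=1)
print([bucket.allow(now=t) for t in (0.0, 0.1, 0.2, 1.5)])
# [True, True, False, True]
```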
Q56. What is connection pooling vs connection per request?
➤ Connection per request: open new DB connection for each HTTP request — expensive (50-100ms
handshake), can exhaust DB connection limit. Connection pooling: maintain pool of reusable connections —
requests borrow and return connections. PgBouncer (external) or django-db-connection-pool. Essential for
production with 100+ concurrent requests.
Q57. Explain Python's asyncio event loop.
➤ The event loop is a single-threaded scheduler that runs coroutines. When a coroutine awaits I/O, it's
suspended and the loop runs another ready coroutine. When I/O completes, the coroutine is resumed.
asyncio.gather() runs multiple coroutines concurrently (not in parallel). asyncio.run() creates and manages the
event loop.
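A small sketch of concurrent waiting, where asyncio.sleep stands in for real network I/O and the fetch coroutine is hypothetical:

```python
import asyncio
import time

async def fetch(name, delay):
    await asyncio.sleep(delay)   # suspends this coroutine; the loop runs others
    return name

async def main():
    start = time.perf_counter()
    # Three 0.2 s waits overlap, so the total is close to 0.2 s, not 0.6 s
    results = await asyncio.gather(fetch("a", 0.2), fetch("b", 0.2), fetch("c", 0.2))
    return results, time.perf_counter() - start

results, elapsed = asyncio.run(main())
print(results, elapsed < 0.5)
```

gather() returns results in the order the awaitables were passed, regardless of which finishes first.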
Q58. How do you handle database migrations safely in production?
➤ 1) Never run migrations during peak traffic. 2) For large tables: add the column as nullable first (usually a
cheap, non-blocking change), then backfill in batches, then add the constraint. 3) Use --fake to mark already-applied
migrations. 4) Test on staging with production data volume first. 5) Have a rollback plan. 6) Some operations (renaming
columns/tables) have no safe single-step migration; they need several steps that stay backward compatible with running code.
Q59. What is the difference between Redis and Memcached?
➤ Redis: persistent (optional), supports complex data structures (lists, sets, sorted sets, hashes), pub/sub,
Lua scripting, clustering, replication. Memcached: pure cache only (no persistence), simple key-value, multi-
threaded (can use multiple cores), simpler. Choose Redis for: sessions, queues, leaderboards, pub/sub,
persistence. Memcached only if you need maximum throughput for simple caching.
Q60. How do you test a Django API effectively?
➤ 1) Unit tests: test model methods, serializers, utility functions in isolation. 2) Integration tests: use
APIClient (DRF) to test the full view→serializer→DB cycle. 3) Use pytest.fixture to create fixtures. 4) Use
@patch to mock external APIs (requests, Celery tasks). 5) Use factory_boy for test data. 6) pytest-django for
pytest integration. 7) Test edge cases: empty data, invalid input, permission denied, rate limits.
Interview Strategy & Revision Plan
Most Asked Topics — Priority Order
Priority | Topics (Focus Here First)
Must Know | GIL, list vs tuple vs dict vs set, decorators, generators, closures
Must Know | Django ORM: N+1, select_related, prefetch_related, F(), Q()
Must Know | MySQL indexes, JOIN types, ACID properties
Must Know | DRF: serializers, JWT auth, permissions, throttling
Important | Django signals, middleware, transactions, custom managers
Important | FastAPI: async/sync, Pydantic, dependency injection
Important | System design: caching, Celery, load balancing, rate limiting
Good to Know | Git rebase vs merge, Docker basics, Nginx
Good to Know | AI/ML basics: embeddings, RAG, LangChain
5-Topic 80% Rule
These 5 topics appear in approximately 80% of backend Python interviews:
1. Python internals (GIL, mutability, memory, decorators, generators)
2. Django ORM (N+1 fix, select_related, prefetch_related, transactions)
3. API optimization (caching, Celery, rate limiting, DB indexes)
4. MySQL indexes and query optimization
5. FastAPI async patterns and dependency injection
Final Tip In interviews, always explain WHY before HOW. Don't just say 'use select_related' — say 'to solve the
N+1 problem by converting N+1 queries into a single JOIN'. This shows you understand the problem, not just the
solution.