AI ML Python Study Guide

The document is a comprehensive study guide for Python and AI/ML, covering topics from core Python concepts to advanced libraries like NumPy, Pandas, and Matplotlib. It includes sections on data structures, object-oriented programming, and practical applications in machine learning, along with interview preparation materials. The guide is suitable for beginners to advanced learners and emphasizes real-world use cases in AI/ML.

Uploaded by

mambiswas769

Python & AI/ML

Comprehensive Study Guide

NumPy · Pandas · Matplotlib

From Basics to Advanced — AI/ML Focused

Covers: Core Python · NumPy · Pandas · Matplotlib

Data Structures · OOP · Generators · Decorators
Vectorization · Feature Engineering · Visualization

2024 Edition | Beginner → Advanced | Interview-Ready

Table of Contents
1. Python — Core to Advanced
1.1 Basics: Syntax, Variables, Data Types, Operators
1.2 Control Flow: Loops & Conditionals
1.3 Functions (all variants incl. Decorators)
1.4 Data Structures (List, Tuple, Set, Dict)
1.5 Object-Oriented Programming
1.6 File & Exception Handling
1.7 Modules, Packages, Virtual Environments
1.8 Iterators & Generators
1.9 Functional Programming
1.10 Memory Management & Performance
1.11 Python in the AI/ML Ecosystem

2. NumPy — Basic to Advanced

2.1 ndarray Creation, Indexing, Slicing
2.2 Broadcasting & Vectorization
2.3 Mathematical & Linear Algebra Operations
2.4 Random Module
2.5 Advanced Indexing & Masking
2.6 Memory Efficiency & Performance
2.7 Role of NumPy in ML

3. Pandas — Basic to Advanced

3.1 Series & DataFrame
3.2 Data Loading (CSV, Excel, JSON)
3.3 Data Cleaning & Missing Values
3.4 Filtering, Selection, GroupBy
3.5 Merge, Join, Concat
3.6 Time Series & Feature Engineering
3.7 Performance Tips & ML Pipelines

4. Matplotlib — Basic to Advanced

4.1 Line, Bar, Scatter, Histogram
4.2 Customization & Subplots
4.3 Advanced Visualizations
4.4 Saving/Exporting & ML Use Cases

5. End-to-End Mini Project

5.1 Data → Preprocessing → Visualization → Insights

6. Reference Tables & Interview Guide

6.1 Complexity Cheat Sheet
6.2 Common Pitfalls
6.3 Top Interview Questions & Answers
Section 1 — Python: Core to Advanced

1.1 Basics: Syntax, Variables, Data Types, Operators

Python is an interpreted, high-level, dynamically-typed language. Its clean syntax makes it the dominant
language in data science and AI/ML.

Variables & Dynamic Typing

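Python is dynamically typed: a variable's type is determined at runtime by the object it refers to, and
the same name can later be rebound to an object of another type. (Duck typing is a related idea: code
relies on the methods an object provides rather than on its declared class.) Variable names are
case-sensitive and should follow the snake_case convention.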

# Variable assignment — no type declaration needed

x = 10 # int

name = 'Alice' # str

pi = 3.14159 # float

flag = True # bool

data = None # NoneType

# Multiple assignment

a, b, c = 1, 2, 3

x = y = z = 0

# Type checking

print(type(x)) # <class 'int'>

print(isinstance(x, int)) # True

# Type conversion

s = str(42) # '42'

n = int('100') # 100

f = float('3.14') # 3.14
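
Dynamic typing and duck typing are easy to confuse, so a small illustration may help. This is only a sketch: the class names Duck and Robot and the function make_it_speak are hypothetical and do not come from the guide.

```python
# Dynamic typing: the same name can be rebound to objects of different types.
value = 10
first_type = type(value).__name__      # 'int'
value = "ten"
second_type = type(value).__name__     # 'str'


# Duck typing: a function cares about the methods an object has,
# not about its class.
class Duck:
    def speak(self):
        return "Quack"


class Robot:
    def speak(self):
        return "Beep"


def make_it_speak(thing):
    # Works for any object that provides a speak() method.
    return thing.speak()


sounds = [make_it_speak(obj) for obj in (Duck(), Robot())]
print(first_type, second_type, sounds)
```

Neither class inherits from the other; make_it_speak works because both happen to provide speak().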

Core Data Types

Type      Example        Mutable?  AI/ML Use
int       x = 42         No        Index, count, label
float     x = 3.14       No        Weights, probabilities
complex   x = 2+3j       No        Signal processing, FFT
str       x = 'hello'    No        Text, feature names
bool      x = True       No        Masks, flags
list      x = [1,2,3]    Yes       Dataset samples
tuple     x = (1,2)      No        Fixed configs, coords
dict      x = {'a':1}    Yes       Hyperparameters, configs
set       x = {1,2,3}    Yes       Unique classes, vocab
NoneType  x = None       n/a       Missing values

Operators
# Arithmetic

5 + 3, 5 - 3, 5 * 3, 5 / 3 # 8, 2, 15, 1.666...

5 // 3 # 1 (floor division — common in ML indexing)

5 % 3 # 2 (modulo)

2 ** 10 # 1024 (power — used in learning rate schedules)

# Comparison → returns bool

x == y, x != y, x > y, x >= y

# Logical

True and False # False

True or False # True

not True # False

# Bitwise (used in masks, GPU kernels)

0b1010 & 0b1100 # 0b1000 = 8

0b1010 | 0b1100 # 0b1110 = 14

1 << 3 # 8 (left shift)

# Identity & membership

x is None # identity check (prefer over == None)

3 in [1, 2, 3] # True

5 not in [1,2,3] # True

Note: In AI/ML code, use 'is None' (identity) rather than '== None' to check for a missing object. NumPy
arrays and pandas objects overload == to compare element-wise, so 'arr == None' returns an array, and
using that result in an if statement raises an ambiguous truth-value error.
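
The difference between identity and overloaded equality can be shown without NumPy. In the sketch below, ElementwiseEq is a hypothetical stand-in for an array type; it only imitates the element-wise behaviour of == and is not how NumPy is implemented.

```python
class ElementwiseEq:
    """Hypothetical stand-in for an array type that overloads ==."""

    def __init__(self, items):
        self.items = list(items)

    def __eq__(self, other):
        # Like NumPy, return one result per element instead of a single bool.
        return [item == other for item in self.items]


data = ElementwiseEq([1, None, 3])

eq_result = (data == None)    # noqa: E711 - calls __eq__, gives a list
is_result = (data is None)    # identity check, cannot be overloaded

print(eq_result)   # [False, True, False]
print(is_result)   # False
```

Because 'is' cannot be overloaded, it always answers the question "is this object None?" with a plain bool.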

1.2 Control Flow: Loops & Conditionals

# if / elif / else
score = 0.87
if score >= 0.90:
    print('Excellent')
elif score >= 0.75:
    print('Good')                # ← this runs
else:
    print('Needs improvement')

# Ternary expression
label = 'positive' if score > 0.5 else 'negative'

# for loop with range
for epoch in range(1, 6):
    print(f'Epoch {epoch}')

# Enumerate: index + value (very common in ML training loops)
dataset = [0.1, 0.4, 0.7]
for idx, val in enumerate(dataset):
    print(idx, val)

# zip: iterate multiple iterables in parallel
preds  = [0, 1, 1]
labels = [0, 1, 0]
for p, l in zip(preds, labels):
    print('pred:', p, 'label:', l)

# while loop
loss = 1.0
while loss > 0.01:
    loss *= 0.9

# List comprehension (Pythonic, faster than map+lambda for simple ops)
squares = [x**2 for x in range(10)]
evens = [x for x in range(20) if x % 2 == 0]

# Dict comprehension
word_len = {w: len(w) for w in ['hello', 'world', 'ai']}

# break / continue / pass
for x in range(100):
    if x == 5: break         # early exit
    if x % 2: continue       # skip odd
    pass                     # no-op placeholder

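Tip: List comprehensions are usually faster than the equivalent for-loop with append() in CPython
(figures around 30% are often quoted, but the gain depends on the Python version and workload), mainly
because they avoid looking up and calling the list's append method on every iteration.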
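
The speed claim is easy to check on your own machine. The following is a rough micro-benchmark sketch; the function names with_loop and with_comprehension are made up for this example, and the measured times will differ between Python versions and hardware.

```python
import timeit


def with_loop(n):
    out = []
    for x in range(n):
        out.append(x * x)      # attribute lookup + method call each iteration
    return out


def with_comprehension(n):
    return [x * x for x in range(n)]


n = 10_000
loop_time = timeit.timeit(lambda: with_loop(n), number=200)
comp_time = timeit.timeit(lambda: with_comprehension(n), number=200)

# Both approaches must produce the same list.
same_result = with_loop(n) == with_comprehension(n)
print(f"loop: {loop_time:.3f}s  comprehension: {comp_time:.3f}s")
```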

1.3 Functions

Definition, Scope, Default & Keyword Arguments

# Basic function
def greet(name, greeting='Hello'):     # 'greeting' has a default
    return f'{greeting}, {name}!'

greet('Alice')                         # Hello, Alice!
greet('Bob', 'Hi')                     # Hi, Bob!
greet(greeting='Hey', name='Carol')    # keyword args: order-free

# *args: variable positional arguments (tuple inside)
def sum_all(*args):
    return sum(args)

sum_all(1, 2, 3, 4)                    # 10

# **kwargs: variable keyword arguments (dict inside)
def build_model(**kwargs):
    for k, v in kwargs.items():
        print(f'{k}: {v}')

build_model(layers=3, lr=0.001, dropout=0.2)

# Combined signature: order matters: pos, *args, kw-only, **kwargs
def train(data, *callbacks, epochs=10, **hyperparams):
    pass
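
To see how a combined signature distributes its arguments, the sketch below gives train a body that simply returns what each parameter received. The argument values ('log', 'checkpoint', lr, dropout) are invented for illustration.

```python
def train(data, *callbacks, epochs=10, **hyperparams):
    # Return each parameter so the binding is visible.
    return {
        "data": data,
        "callbacks": callbacks,      # extra positional args, as a tuple
        "epochs": epochs,            # keyword-only (it follows *callbacks)
        "hyperparams": hyperparams,  # remaining keyword args, as a dict
    }


bound = train([1, 2, 3], "log", "checkpoint", epochs=5, lr=0.01, dropout=0.2)
print(bound)
```

Because epochs comes after *callbacks, it can only be passed by keyword; a third positional argument would land in callbacks instead.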

Lambda Functions
# Lambda = anonymous single-expression function

square = lambda x: x ** 2

square(5) # 25

# Very useful with sorted(), map(), filter()

students = [('Alice', 88), ('Bob', 95), ('Carol', 72)]

sorted_students = sorted(students, key=lambda s: s[1], reverse=True)

# [('Bob', 95), ('Alice', 88), ('Carol', 72)]

# In pandas: apply a lambda to a column

# df['score_norm'] = df['score'].apply(lambda x: (x - df['score'].mean()) / df['score'].std())

Recursion
# Factorial
def factorial(n):
    if n <= 1:
        return 1                   # base case
    return n * factorial(n - 1)    # recursive case

# Tree traversal (common in decision trees)
def dfs(node):
    if node is None:
        return
    print(node.value)              # node payload (attribute name assumed)
    dfs(node.left)
    dfs(node.right)

# Python default recursion limit = 1000
import sys
sys.setrecursionlimit(10000)       # increase if needed
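
Raising the recursion limit is not the only option for deep trees. One common alternative is an explicit stack, sketched below. The Node class and its field names (value, left, right) are assumptions made for this example.

```python
class Node:
    """Minimal binary tree node (hypothetical; field names are assumptions)."""

    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def dfs_iterative(root):
    """Pre-order traversal using an explicit stack instead of recursion."""
    visited = []
    stack = [root] if root is not None else []
    while stack:
        node = stack.pop()
        visited.append(node.value)
        # Push right first so the left child is processed first.
        if node.right is not None:
            stack.append(node.right)
        if node.left is not None:
            stack.append(node.left)
    return visited


tree = Node(1, Node(2, Node(4), Node(5)), Node(3))
order = dfs_iterative(tree)
print(order)  # [1, 2, 4, 5, 3]
```

The explicit stack lives on the heap, so the traversal depth is limited by memory rather than by the interpreter's recursion limit.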

Decorators — With Real-World Use Cases

A decorator is a function that wraps another function to extend its behaviour without modifying it.
Decorators are heavily used in web and ML frameworks (Flask routes, TensorFlow, PyTorch).

import time, functools

# ── Basic decorator ──────────────────────────────
def timer(func):
    @functools.wraps(func)                  # preserves name, doc
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        end = time.perf_counter()
        print(f'{func.__name__} ran in {end-start:.4f}s')
        return result
    return wrapper

@timer
def train_model(epochs):
    time.sleep(0.1)

train_model(10)          # train_model ran in 0.1002s

# ── Logger decorator ─────────────────────────────
def logger(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print(f'Calling {func.__name__} with {args}, {kwargs}')
        return func(*args, **kwargs)
    return wrapper

# ── Decorator with arguments (factory pattern) ───
def retry(max_attempts=3):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return func(*args, **kwargs)
                except Exception as e:
                    if attempt == max_attempts - 1:
                        raise
                    print(f'Attempt {attempt+1} failed: {e}')
        return wrapper
    return decorator

@retry(max_attempts=5)
def fetch_data(url): ...

# ── Stacking decorators ──────────────────────────
@timer
@logger
def predict(x): return x * 2   # logger is applied first (innermost); timer wraps the logged function

# ── Class-based decorator ────────────────────────
class Memoize:
    def __init__(self, func):
        self.func = func
        self.cache = {}

    def __call__(self, *args):
        if args not in self.cache:
            self.cache[args] = self.func(*args)
        return self.cache[args]

@Memoize
def fib(n):
    if n < 2: return n
    return fib(n-1) + fib(n-2)

Tip: Real-world decorators: @tf.function (compiles a Python function into a TensorFlow graph),
@torch.no_grad() (disables gradient tracking during inference), @app.route() (Flask endpoint
registration; FastAPI uses @app.get() and similar), @property (getters in model classes).
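
The order in which stacked decorators run is a frequent source of confusion. The sketch below records the order in a list; the decorator factory tag and the labels 'outer' and 'inner' are hypothetical names chosen for this example.

```python
import functools

calls = []  # records the order in which wrappers run


def tag(label):
    """Decorator factory that logs when its wrapper is entered and left."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            calls.append(f"enter {label}")
            result = func(*args, **kwargs)
            calls.append(f"exit {label}")
            return result
        return wrapper
    return decorator


@tag("outer")   # applied second, runs first at call time
@tag("inner")   # applied first, closest to the function
def predict(x):
    return x * 2


output = predict(21)
print(output, calls)
```

Decorators are applied bottom-up, but at call time the outermost wrapper is entered first, so the log reads outer, inner, inner, outer.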

1.4 Data Structures

List — Ordered, Mutable, Allows Duplicates

lst = [1, 2, 3, 4, 5]

# Indexing & slicing

lst[0] # 1 (first)

lst[-1] # 5 (last)

lst[1:4] # [2,3,4]

lst[::2] # [1,3,5] (step=2)

lst[::-1] # [5,4,3,2,1] (reverse)

# Common methods

lst.append(6)        # [1,2,3,4,5,6]
lst.extend([7,8])    # extend from iterable
lst.insert(0, 0)     # insert at index
lst.remove(3)        # remove first occurrence
lst.pop()            # remove & return last
lst.pop(0)           # remove & return index 0
lst.sort()           # in-place sort O(n log n)
sorted(lst)          # new sorted list
lst.reverse()        # in-place reverse
lst.index(4)         # find index
lst.count(2)         # count occurrences

# List of lists (2D — used for matrices before NumPy)

matrix = [[1,2,3],[4,5,6],[7,8,9]]

matrix[1][2] # 6
# Flatten with comprehension

flat = [x for row in matrix for x in row]

Operation        List             Complexity       Notes
Append           lst.append(x)    O(1) amortized   Fast, common in loops
Insert at i      lst.insert(i,x)  O(n)             Shifts elements
Access by index  lst[i]           O(1)             Direct memory offset
Search           x in lst         O(n)             Use set for O(1) lookup
Delete by index  del lst[i]       O(n)             Shifts elements
Sort             lst.sort()       O(n log n)       Timsort (stable)
Slice copy       lst[a:b]         O(b-a)           Creates new list

Tuple — Ordered, Immutable

t = (1, 2, 3)

t = 1, 2, 3 # parentheses optional

single = (42,) # single-element needs trailing comma!

# Useful as dict keys (lists can't be dict keys)

coords = {(0,0): 'origin', (1,0): 'right'}

# Named tuple — self-documenting, used in dataset pipelines

from collections import namedtuple

Sample = namedtuple('Sample', ['features', 'label'])

s = Sample(features=[0.1, 0.9], label=1)

print(s.label) # 1

# Tuple unpacking

x, y, z = (10, 20, 30)

first, *rest = (1, 2, 3, 4, 5) # rest = [2,3,4,5]

Dictionary — Key-Value, Ordered (Python 3.7+), Mutable

d = {'lr': 0.001, 'batch': 32, 'epochs': 100}

# Access & defaults

d['lr'] # 0.001 — KeyError if missing

d.get('dropout', 0.0)          # 0.0, safe default

# Modification
d['optimizer'] = 'adam'
d.update({'lr': 0.0001, 'weight_decay': 1e-5})

# Iteration
for key in d: pass             # iterate keys
for val in d.values(): pass
for k, v in d.items(): pass    # most common

# Dict comprehension

squared = {x: x**2 for x in range(5)}

# Merge dicts (Python 3.9+)

merged = d1 | d2

# defaultdict — avoids KeyError

from collections import defaultdict

word_count = defaultdict(int)

for word in text.split():      # text: any input string
    word_count[word] += 1

# Counter — instant frequency count

from collections import Counter

c = Counter(['cat','dog','cat','bird','cat'])

c.most_common(2) # [('cat',3), ('dog',1)]

# OrderedDict (legacy — dicts are ordered since 3.7)

from collections import OrderedDict

Set — Unordered, Unique Elements

s = {1, 2, 3, 4}

s.add(5)        # {1,2,3,4,5}
s.discard(6)    # no error if missing (vs remove which raises)

# Set operations (useful for NLP / class analysis)

a = {1, 2, 3, 4}

b = {3, 4, 5, 6}

a | b # union {1,2,3,4,5,6}

a & b # intersect {3,4}

a - b # difference {1,2}

a ^ b # symmetric diff {1,2,5,6}

# O(1) membership test — much faster than list for large sets

vocab = set(words) # build once

'hello' in vocab # O(1)

Interview: Why is set lookup O(1)? Sets are implemented as hash tables. The hash of an element maps
directly to a slot in the table, so a lookup avoids a linear scan. The O(1) figure is an average case;
many hash collisions can make it slower.
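
A toy hash set makes the idea concrete. This is a simplified sketch using separate chaining; CPython's real set uses open addressing and resizes its table, so ToyHashSet (a hypothetical class) illustrates the principle only.

```python
class ToyHashSet:
    """Simplified hash set with separate chaining (not CPython's real layout)."""

    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, item):
        # hash() picks the bucket directly; no scan over other buckets.
        return self.buckets[hash(item) % len(self.buckets)]

    def add(self, item):
        bucket = self._bucket(item)
        if item not in bucket:       # duplicates are ignored
            bucket.append(item)

    def __contains__(self, item):
        # Only one bucket is searched, so the cost stays roughly constant
        # as long as buckets remain short.
        return item in self._bucket(item)


vocab = ToyHashSet()
for word in ["hello", "world", "ai", "hello"]:
    vocab.add(word)

print("hello" in vocab, "python" in vocab)
```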

1.5 Object-Oriented Programming

# ── Class definition ─────────────────────────────
class NeuralNetwork:
    # Class attribute (shared across all instances)
    framework = 'PyTorch'

    def __init__(self, layers, lr=0.001):   # constructor
        self.layers = layers                # instance attribute
        self.lr = lr
        self._weights = None                # '_' = private convention
        self.__bias = 0.0                   # '__' = name-mangled

    def __repr__(self):                     # developer-facing string
        return f'NeuralNetwork(layers={self.layers}, lr={self.lr})'

    def __str__(self):                      # user-facing string
        return f'{self.framework} model with {len(self.layers)} layers'

    def forward(self, x):
        return x                            # placeholder

    @classmethod
    def from_config(cls, config: dict):
        return cls(config['layers'], config['lr'])

    @staticmethod
    def relu(x):                            # no self/cls: utility method
        return max(0, x)

    @property
    def num_params(self):
        return sum(l['units'] for l in self.layers)

# ── Inheritance ──────────────────────────────────
class CNN(NeuralNetwork):                   # CNN inherits NeuralNetwork
    def __init__(self, layers, lr, kernel_size=3):
        super().__init__(layers, lr)        # call parent __init__
        self.kernel_size = kernel_size

    def forward(self, x):                   # method override (polymorphism)
        print(f'CNN forward with kernel {self.kernel_size}')
        return super().forward(x)

# ── Multiple inheritance ─────────────────────────
class Trainable:
    def fit(self, X, y): pass

class Serializable:
    def save(self, path): pass

class Model(NeuralNetwork, Trainable, Serializable):
    pass   # MRO (Method Resolution Order) defines lookup order

print(Model.__mro__)   # shows resolution chain

# ── Dunder (magic) methods ───────────────────────
class Dataset:
    def __init__(self, data):
        self.data = data
    def __len__(self): return len(self.data)
    def __getitem__(self, i): return self.data[i]
    def __iter__(self): return iter(self.data)
    def __contains__(self, x): return x in self.data

ds = Dataset([1,2,3,4,5])
len(ds)            # 5
ds[2]              # 3
for x in ds: pass
3 in ds            # True

OOP Pillar     Concept                    Python Mechanism               ML Example
Encapsulation  Hide internal state        _attr, __attr, @property       Model weights hidden from user
Inheritance    Reuse parent behaviour     class Child(Parent):           CNNLayer extends Layer
Polymorphism   Same interface, diff impl  Method override / duck typing  model.forward() for CNN/RNN
Abstraction    Hide complexity            ABC, abstract methods          sklearn BaseEstimator
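
The Abstraction and Polymorphism rows can be illustrated with the abc module. This is a minimal sketch: BaseModel, Doubler and Thresholder are hypothetical classes invented for the example and are unrelated to scikit-learn's BaseEstimator.

```python
from abc import ABC, abstractmethod


class BaseModel(ABC):
    """Abstract interface: subclasses must implement predict()."""

    @abstractmethod
    def predict(self, x):
        ...

    def predict_many(self, xs):
        # Shared behaviour built on top of the abstract method.
        return [self.predict(x) for x in xs]


class Doubler(BaseModel):
    def predict(self, x):
        return 2 * x


class Thresholder(BaseModel):
    def predict(self, x):
        return int(x > 0.5)


# Polymorphism: the same call works on every concrete subclass.
models = [Doubler(), Thresholder()]
results = [m.predict_many([0.2, 0.9]) for m in models]
print(results)  # [[0.4, 1.8], [0, 1]]

try:
    BaseModel()          # abstract classes cannot be instantiated
except TypeError as err:
    abstract_error = str(err)
```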

1.6 File Handling & Exception Handling

# ── File I/O ─────────────────────────────────────
# Always use context manager (with): auto-closes file
with open('[Link]', 'r', encoding='utf-8') as f:
    content = f.read()           # whole file as string

with open('[Link]', 'r') as f:
    lines = f.readlines()        # list of lines

with open('[Link]', 'w') as f:
    f.write('Hello, AI!\n')

with open('[Link]', 'a') as f:   # append mode
    f.write('epoch 10 done\n')

# ── pathlib (modern, preferred) ──────────────────
from pathlib import Path

p = Path('data/models')
p.mkdir(parents=True, exist_ok=True)
for f in p.glob('*.pt'):         # glob pattern matching
    print(f.name, f.stem)        # file name and name without suffix

# ── Exception handling ───────────────────────────
try:
    model = load_model('[Link]')
    preds = model.predict(X)
except FileNotFoundError as e:
    print(f'Model not found: {e}')
except ValueError as e:
    print(f'Bad input shape: {e}')
except Exception as e:           # catch-all (use sparingly)
    print(f'Unexpected error: {e}')
    raise                        # re-raise to not swallow
else:
    print('Prediction succeeded')   # runs if no exception
finally:
    print('Cleanup')                # always runs

# Custom exceptions
class DataValidationError(ValueError):
    def __init__(self, msg, column=None):
        super().__init__(msg)
        self.column = column

raise DataValidationError('NaN found', column='age')
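
A custom exception is most useful when the caller reads its extra attributes. The sketch below redefines DataValidationError so it runs on its own; the validate function and the sample rows are hypothetical.

```python
import math


class DataValidationError(ValueError):
    """Raised when a record fails validation; remembers the offending column."""

    def __init__(self, msg, column=None):
        super().__init__(msg)
        self.column = column


def validate(record):
    # Reject any float field that is NaN.
    for column, value in record.items():
        if isinstance(value, float) and math.isnan(value):
            raise DataValidationError("NaN found", column=column)
    return record


rows = [{"age": 31.0, "salary": 50000.0}, {"age": float("nan"), "salary": 1.0}]
clean, bad_columns = [], []
for row in rows:
    try:
        clean.append(validate(row))
    except DataValidationError as err:
        bad_columns.append(err.column)   # extra context carried by the exception

print(len(clean), bad_columns)  # 1 ['age']
```

Since DataValidationError subclasses ValueError, existing 'except ValueError' handlers will still catch it.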

1.7 Modules, Packages & Virtual Environments

# Import styles
import numpy                 # standard: use numpy.array()
import numpy as np           # aliased: use np.array()
from numpy import array      # direct import
from numpy import *          # wildcard: AVOID (pollutes namespace)

# __name__ guard: code only runs when file executed directly
if __name__ == '__main__':
    main()

# Package structure
# mypackage/
# ├── __init__.py   (makes it a package)
# ├── [Link]
# └── models/
#     ├── __init__.py
#     └── [Link]

# Virtual environments
# python -m venv venv                  # create
# source venv/bin/activate             # activate (Linux/Mac)
# venv\Scripts\activate                # activate (Windows)
# pip install numpy pandas             # install packages
# pip freeze > requirements.txt        # save deps
# pip install -r requirements.txt      # restore

# Useful modules for ML (standard library unless noted)
import os, sys, json, csv, re
from datetime import datetime
from collections import defaultdict, Counter, deque
from itertools import chain, product, combinations
import functools, operator
import multiprocessing, concurrent.futures
import logging, warnings
import pickle, joblib        # serialization (joblib is third-party: pip install joblib)

1.8 Iterators & Generators

# ── Iterator protocol ────────────────────────────
class Range:
    def __init__(self, n):
        self.n = n
        self.i = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.i >= self.n:
            raise StopIteration
        val = self.i
        self.i += 1
        return val

# ── Generator function (yield) ───────────────────
def data_loader(file_path, batch_size=32):
    '''Yields mini-batches: memory efficient for large datasets'''
    batch = []
    with open(file_path) as f:
        for line in f:
            batch.append(line.strip())
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:
        yield batch              # last partial batch

for batch in data_loader('[Link]'):
    train_step(batch)            # never loads full file into memory!

# ── Generator expression (lazy list comprehension) ──
gen = (x**2 for x in range(1_000_000))   # nothing computed yet
next(gen)                                # compute on demand

# ── Send values INTO a generator ─────────────────
def accumulator():
    total = 0
    while True:
        val = yield total
        if val is None:
            break
        total += val

acc = accumulator()
next(acc)        # prime the generator
acc.send(10)     # 10
acc.send(20)     # 30

# ── itertools (memory-efficient combinatorics) ───
from itertools import islice, chain, product, cycle

first_5 = list(islice(gen, 5))        # take the next 5 items from the generator
all_data = chain(train, val, test)    # chain iterables lazily

Tip: The same lazy-iteration idea underlies PyTorch's DataLoader and TensorFlow's tf.data pipeline:
they can process datasets larger than RAM by streaming batches on demand.
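
Streaming in batches does not require a file. The sketch below batches any iterable lazily with itertools.islice; the helper name batched is chosen for this example (Python 3.12 ships a similar itertools.batched that yields tuples).

```python
from itertools import islice


def batched(iterable, batch_size):
    """Yield lists of up to batch_size items, reading lazily from iterable."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# A generator expression stands in for a large data source.
samples = (i * i for i in range(10))
batches = list(batched(samples, batch_size=4))
print(batches)  # [[0, 1, 4, 9], [16, 25, 36, 49], [64, 81]]
```

Only one batch is held in memory at a time when the result is consumed in a for-loop rather than collected with list().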

1.9 Functional Programming

from functools import reduce, partial, lru_cache

nums = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# map — apply function to every element (lazy, returns iterator)

squares = list(map(lambda x: x**2, nums)) # [1,4,9,16,...]

# filter — keep elements where function returns True

evens = list(filter(lambda x: x % 2 == 0, nums)) # [2,4,6,8,10]

# reduce — fold list into single value

product = reduce(lambda a, b: a * b, nums) # 3628800

# In practice, prefer comprehensions or NumPy over map/filter

squares_comp = [x**2 for x in nums] # more readable

# partial — fix some arguments (creates new function)

def power(base, exp): return base ** exp

square = partial(power, exp=2)

cube = partial(power, exp=3)

square(4) # 16

# lru_cache — memoize expensive function calls

@lru_cache(maxsize=None)   # maxsize=None = unbounded
def expensive_feature(param):
    # simulate slow feature extraction
    return param ** 3 + param ** 2

# functools.wraps (already covered in decorators)

# zip, enumerate, any, all — built-in functional tools

any(x > 5 for x in nums) # True (short-circuits)

all(x > 0 for x in nums) # True

sum(x**2 for x in nums) # 385 (generator expression)
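
The effect of lru_cache can be observed through its cache_info() method. In this sketch, slow_square and the call_count counter are hypothetical names used only to make cache hits visible.

```python
from functools import lru_cache

call_count = 0


@lru_cache(maxsize=None)
def slow_square(n):
    global call_count
    call_count += 1          # counts real (uncached) executions
    return n * n


values = [slow_square(n) for n in [3, 4, 3, 3, 4]]
info = slow_square.cache_info()
print(values, call_count, info.hits, info.misses)
```

Five calls are made, but the function body runs only twice: once per distinct argument.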

1.10 Memory Management & Performance

import sys, tracemalloc

# Memory sizes (typical 64-bit CPython; exact values vary by version)
print(sys.getsizeof([]))             # 56 bytes (empty list)
print(sys.getsizeof(range(1000)))    # 48 bytes (range is lazy!)

# ── Python memory model ──────────────────────────
# Everything is an object. CPython caches the integers -5 to 256.
a = 256; b = 256; print(a is b)      # True (same cached object)
a = 257; b = 257; print(a is b)      # not guaranteed: outside the cache the result depends on
                                     # how the code is compiled; compare values with ==

# Reference counting + cyclic GC
import gc
gc.collect()                         # force garbage collection

# ── __slots__: eliminates per-instance __dict__ ──
class Point:                         # without __slots__: ~184 bytes per instance
    __slots__ = ['x', 'y']           # with __slots__: ~56 bytes
    def __init__(self, x, y):
        self.x = x
        self.y = y

# ── Profiling ────────────────────────────────────
# cProfile (function-level)
import cProfile
cProfile.run('train_loop()')

# timeit: micro-benchmarks
import timeit
t = timeit.timeit('[x**2 for x in range(1000)]', number=1000)

# tracemalloc: memory tracking
tracemalloc.start()
# ... run code ...
snap = tracemalloc.take_snapshot()
for stat in snap.statistics('lineno')[:5]:
    print(stat)

# ── Performance tips ─────────────────────────────
# 1. Use NumPy arrays instead of Python lists for numbers
# 2. Use generators for large data (don't load all at once)
# 3. Avoid global variables in hot loops (LOAD_GLOBAL is slow)
# 4. Use local variable aliases in tight loops
# 5. String concatenation: use ''.join(lst) not += in loop
# 6. Use set/dict for O(1) lookups instead of list O(n)
# 7. Use collections.deque for O(1) popleft instead of list.pop(0)
# 8. Use multiprocessing for CPU-bound, asyncio for I/O-bound tasks
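
Tips 5 and 7 can be checked with a short benchmark. This is a rough sketch with hypothetical helper names (drain_list, drain_deque); absolute timings depend on the machine and Python version.

```python
from collections import deque
import timeit

n = 20_000


def drain_list():
    queue = list(range(n))
    while queue:
        queue.pop(0)        # O(n): shifts every remaining element


def drain_deque():
    queue = deque(range(n))
    while queue:
        queue.popleft()     # O(1)


list_time = timeit.timeit(drain_list, number=3)
deque_time = timeit.timeit(drain_deque, number=3)
print(f"list.pop(0): {list_time:.4f}s  deque.popleft(): {deque_time:.4f}s")

# Tip 5: build strings with join instead of repeated +=
parts = [str(i) for i in range(5)]
joined = ",".join(parts)
print(joined)
```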

1.11 Python in the AI/ML Ecosystem

Library/Framework Purpose Key Abstractions

NumPy Numerical computing ndarray, broadcasting, ufuncs

Pandas Data manipulation DataFrame, Series, GroupBy

Matplotlib/Seaborn Visualization Figure, Axes, FacetGrid

Scikit-learn Classical ML fit/transform/predict API

PyTorch Deep learning Tensor, Module, autograd

TensorFlow/Keras Deep learning Tensor, Layer, Model

Hugging Face NLP/Transformers Tokenizer, Trainer, Pipeline

OpenCV Computer vision Mat, imread, VideoCapture

NLTK/spaCy NLP preprocessing Token, Doc, Span

XGBoost/LightGBM Gradient boosting DMatrix, Booster

MLflow/W&B Experiment tracking Run, Artifact, Metric

FastAPI ML model serving APIRouter, Pydantic model

Section 2 — NumPy: Basic to Advanced

2.1 ndarray Creation, Indexing & Slicing

NumPy's ndarray is the fundamental data structure for numerical computing in Python. Most major ML
frameworks can convert to and from NumPy arrays, and their tensor APIs follow similar conventions.

import numpy as np

# ── Creation ─────────────────────────────────────
a = np.array([1, 2, 3, 4, 5])                 # 1D from list
b = np.array([[1,2,3],[4,5,6]])               # 2D (shape 2,3)
c = np.array([1.0, 2, 3], dtype=np.float32)   # explicit dtype

np.zeros((3,4))          # zeros matrix
np.ones((2,3,4))         # ones tensor (3D)
np.eye(5)                # identity matrix
np.full((3,3), 7)        # filled with 7
np.arange(0, 10, 0.5)    # like range but float-capable
np.linspace(0, 1, 100)   # 100 evenly spaced points [0,1]
np.logspace(0, 3, 10)    # 10 log-spaced points [1, 1000]
np.empty((2,2))          # uninitialized (fast allocation)

# ── Array attributes ─────────────────────────────
a = np.array([[1,2,3],[4,5,6]])
a.shape      # (2, 3)
a.ndim       # 2
a.dtype      # int64 (platform default integer)
a.size       # 6 (total elements)
a.nbytes     # 48 (memory in bytes)
a.itemsize   # 8 (bytes per element)

# ── Indexing & Slicing ───────────────────────────
a[0]           # first row: [1,2,3]
a[1, 2]        # row 1, col 2: 6
a[:, 1]        # all rows, col 1: [2,5]
a[0, :]        # row 0 (same as a[0])
a[0:2, 1:3]    # sub-matrix: [[2,3],[5,6]]
a[::1, ::-1]   # reverse columns

# ── Reshaping ────────────────────────────────────
a.reshape(3, 2)             # new shape (a view when possible)
a.reshape(-1, 1)            # column vector (rows inferred)
a.flatten()                 # 1D copy
a.ravel()                   # 1D view when possible (faster than flatten)
a.T                         # transpose (view, not copy)
np.expand_dims(a, axis=0)   # add dimension: (1,2,3)
np.squeeze(a)               # remove size-1 dimensions
a[np.newaxis, :]            # same as expand_dims

2.2 Broadcasting & Vectorization

Broadcasting is NumPy's mechanism for performing element-wise operations on arrays of different shapes
without copying data. It is the key to efficient ML computations.

# ── Broadcasting rules ───────────────────────────
# 1. If arrays differ in ndim, prepend 1s to smaller shape
# 2. Arrays with size 1 along a dim are stretched to match
# 3. Shapes must match or one must be 1

a = np.array([[1],[2],[3]])    # shape (3,1)
b = np.array([10, 20, 30])     # shape (3,)

# b treated as (1,3) → broadcast to (3,3)
a + b
# [[11,21,31],
#  [12,22,32],
#  [13,23,33]]

# ── Common ML broadcasting patterns ──────────────
# Feature standardization: (batch, features) - (1, features)
X = np.random.randn(1000, 128)   # 1000 samples, 128 features
mean = X.mean(axis=0)            # (128,)
std = X.std(axis=0)              # (128,)
X_norm = (X - mean) / std        # broadcasts (1000,128) - (128,)

# Bias addition in neural net layers
W = np.random.randn(128, 64)     # weight matrix
b = np.zeros(64)                 # bias vector (64,)
Z = X @ W + b                    # (1000,64) + (64,) → broadcast (1000,64)

# ── Vectorization vs Python loop ─────────────────
# SLOW: pure Python
result = [x**2 + 2*x for x in range(1000000)]

# FAST: NumPy vectorized (typically 50-200x faster)
x = np.arange(1000000)
result = x**2 + 2*x

# Universal functions (ufuncs): applied element-wise in C (representative examples)
np.exp(x), np.log(x), np.sqrt(x), np.abs(x)
np.sin(x), np.cos(x), np.tanh(x)
np.maximum(x, 0)    # ReLU!
np.clip(x, 0, 1)    # Sigmoid-output clipping

Tip: np.maximum(x, 0) IS the ReLU activation. np.clip(logits, -500, 500) guards against overflow in
softmax, although subtracting the row maximum is the more common fix. Vectorize everything that runs
in a training loop.
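
The three broadcasting rules listed in this section can be written out as a small shape calculator. The sketch below is pure Python so it runs without NumPy; broadcast_shape is a hypothetical helper written for this guide (NumPy provides np.broadcast_shapes for real use).

```python
from itertools import zip_longest


def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or raise ValueError if incompatible."""
    result = []
    # Walk the dimensions from the right; missing leading dims count as 1.
    for a, b in zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1):
        if a == b or b == 1:
            result.append(a)
        elif a == 1:
            result.append(b)
        else:
            raise ValueError(f"shapes {shape_a} and {shape_b} do not broadcast")
    return tuple(reversed(result))


print(broadcast_shape((3, 1), (3,)))         # (3, 3)
print(broadcast_shape((1000, 128), (128,)))  # (1000, 128)
print(broadcast_shape((1000, 64), (64,)))    # (1000, 64)
```

The three calls correspond to the a + b example, the feature standardization and the bias addition shown in this section.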

2.3 Mathematical & Linear Algebra Operations

# ── Aggregate operations ─────────────────────────
a = np.array([[1,2,3],[4,5,6]])
np.sum(a)               # 21 (all elements)
np.sum(a, axis=0)       # [5,7,9] (sum per column)
np.sum(a, axis=1)       # [6,15] (sum per row)
np.mean(a, axis=0)
np.std(a), np.var(a)
np.argmin(a, axis=1)    # index of min in each row
np.argmax(a, axis=1)    # index of max in each row: predicted class in classification!
np.cumsum(a)            # cumulative sum

# ── Linear algebra (np.linalg) ───────────────────
A = np.random.randn(4, 4)
np.dot(A, A.T)                 # matrix multiplication (A·Aᵀ)
A @ A.T                        # same, cleaner syntax (Python 3.5+)
np.linalg.det(A)               # determinant
np.linalg.inv(A)               # inverse
np.linalg.norm(A)              # Frobenius norm
np.linalg.norm(A, axis=1)      # row-wise L2 norm

# Eigendecomposition (PCA, graph algorithms)
eigenvalues, eigenvectors = np.linalg.eigh(A @ A.T)   # eigh: for symmetric matrices

# SVD (dimensionality reduction, matrix factorization)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Solve linear system Ax = b
b = np.random.randn(4)
x = np.linalg.solve(A, b)

# QR decomposition
Q, R = np.linalg.qr(A)

# Least squares (linear regression!)
# min ||Xw - y||^2
X = np.random.randn(100, 4)
y = np.random.randn(100)
w, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)

2.4 Random Module

rng = np.random.default_rng(seed=42)   # recommended modern API

rng.random((3,4))                 # uniform [0,1)
rng.integers(0, 10, (3,4))        # random ints
rng.normal(0, 1, (100,10))        # Gaussian: weight init
rng.uniform(-1, 1, 1000)          # uniform range
rng.choice([1,2,3,4,5], size=3, replace=False)   # sampling
rng.shuffle(arr)                  # in-place shuffle
rng.permutation(100)              # shuffled indices: train/val split

# Legacy API (still widely seen)
np.random.seed(42)
np.random.randn(3, 3)             # standard normal
np.random.randint(0, 100, (5,))

# Distributions used in ML
rng.standard_normal((batch, features))    # standard normal; scale by sqrt(1/fan_in) for Xavier-style init
rng.binomial(n=1, p=0.5, size=1000)       # Bernoulli: coin flip
rng.multinomial(n=1, pvals=[.2,.3,.5])    # categorical sampling

2.5 Advanced Indexing & Masking

a = np.arange(10)     # [0,1,2,3,4,5,6,7,8,9]

# ── Boolean (mask) indexing ──────────────────────
mask = a > 5
a[mask]               # [6,7,8,9], returns copy
a[a % 2 == 0]         # [0,2,4,6,8]

# Set values using mask
a[a < 0] = 0          # ReLU in one line!

# ── Fancy indexing ───────────────────────────────
idx = [1, 3, 7]
a[idx]                # [1,3,7], returns copy

# 2D fancy indexing
mat = np.arange(25).reshape(5,5)
rows = [0, 2, 4]
cols = [1, 3, 0]
mat[rows, cols]       # [1, 13, 20]: elements (0,1),(2,3),(4,0)

# np.where: conditional selection / replacement
x = np.array([-2, -1, 0, 1, 2])
np.where(x > 0, x, 0)     # [0,0,0,1,2], ReLU again!
np.where(x > 0, 1, -1)    # +1 for positive values, -1 otherwise

# np.nonzero: indices of nonzero elements
np.nonzero(x)             # (array([0, 1, 3, 4]),): every index except the 0 at position 2

# np.take & np.put
np.take(a, [2,4,6])       # a[[2,4,6]]

2.6 Memory Efficiency & Performance

# ── Views vs Copies ──────────────────────────────
a = np.array([1, 2, 3, 4, 5])
b = a[1:4]            # VIEW: shares memory
b[0] = 99             # a is also modified!
c = a[1:4].copy()     # COPY: independent

# Check if it's a view
b.base is a           # True

# ── dtype selection: critical for memory in DL ───
# float64: 8 bytes (default Python float)
# float32: 4 bytes (standard in deep learning)
# float16: 2 bytes (mixed precision training)
# int32: 4 bytes
# int8: 1 byte (quantized models)
X = np.random.rand(1000, 1000).astype(np.float32)
print(X.nbytes)       # 4MB vs 8MB for float64

# ── Contiguous arrays: for C/Fortran interop ─────
a.flags['C_CONTIGUOUS']     # row-major (C order)
a.flags['F_CONTIGUOUS']     # column-major (Fortran order)
np.ascontiguousarray(a)     # ensure C-contiguous

# ── np.einsum: readable and efficient tensor ops ─
# Matrix multiplication
np.einsum('ij,jk->ik', A, B)                # same as A @ B
# Batch matrix multiply
np.einsum('bij,bjk->bik', A_batch, B_batch)
# Trace
np.einsum('ii->', A)
# Outer product
np.einsum('i,j->ij', a, b)

Operation Python List NumPy Array Speedup

Sum of 1M ints ~60ms ~1ms ~60x

Element-wise multiply ~100ms ~2ms ~50x

Memory (1M floats) ~28 MB ~8 MB (float64) 3.5x

Matrix multiply 1000x1000 Manual loops ~5ms (BLAS) 1000x+

Boolean indexing List comp Native (C) ~30x

2.7 NumPy's Role in ML

ML Operation               NumPy Implementation                              Notes
Linear regression forward  y_pred = X @ w + b                                Matrix-vector product
MSE loss                   np.mean((y_pred - y)**2)                          Vectorized over batch
Gradient descent step      w -= lr * grad                                    Broadcasting update
Softmax                    np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)  Subtract z.max(axis=1, keepdims=True) first for numerical stability
ReLU activation            np.maximum(0, z)                                  Element-wise ufunc
One-hot encoding           np.eye(n_classes)[labels]                         Fancy indexing trick
Train/test split           idx = np.random.permutation(n)                    Shuffle then slice
Normalization              (X - X.mean(0)) / X.std(0)                        Broadcasting over axis=0
Cosine similarity          X @ Y.T / (norm(X)*norm(Y))                       Linear algebra
PCA (manual)               U,s,Vt = np.linalg.svd(X_centered)                SVD decomposition
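
The max-subtraction trick mentioned for softmax deserves a worked example. The sketch below is written in pure Python with the math module so it runs without dependencies; the function name softmax and the sample logits are chosen for illustration. In NumPy the same idea is z - z.max(axis=1, keepdims=True) before exponentiating.

```python
import math


def softmax(logits):
    """Softmax over a list of floats, shifted by the max for stability."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]   # largest exponent is 0
    total = sum(exps)
    return [e / total for e in exps]


# Without the shift, math.exp(1000.0) raises OverflowError.
probs = softmax([1000.0, 1001.0, 1002.0])
print([round(p, 4) for p in probs])
```

Subtracting a constant from every logit does not change the result, because the factor exp(-shift) cancels between numerator and denominator.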

Section 3 — Pandas: Basic to Advanced

3.1 Series & DataFrame

import pandas as pd
import numpy as np

# ── Series: 1D labelled array ────────────────────
s = pd.Series([10, 20, 30, 40], index=['a','b','c','d'])
s['b']           # 20, label-based
s.iloc[1]        # 20, position-based (plain s[1] on a labelled index is deprecated)
s[['a','c']]     # [10,30]
s[s > 15]        # filter: [20,30,40]

# ── DataFrame: 2D table ──────────────────────────
df = pd.DataFrame({
    'age': [25, 30, 35, 28],
    'salary': [50000, 80000, 120000, 60000],
    'dept': ['Eng','Mkt','Eng','HR'],
    'score': [0.8, 0.9, 0.95, 0.7]
})

# ── Inspection ───────────────────────────────────
df.shape         # (4, 4)
df.dtypes        # column data types
df.info()        # dtype + non-null counts + memory
df.describe()    # statistics (count, mean, std, min, 25%, 50%, 75%, max)
df.head(2)       # first 2 rows
df.tail(2)       # last 2 rows
df.columns       # Index(['age','salary','dept','score'])
df.index         # RangeIndex(start=0, stop=4)
df.values        # NumPy array underlying
df.nunique()     # unique count per column
df.value_counts('dept')   # frequency of each dept

3.2 Data Loading

# CSV

df = pd.read_csv('[Link]')

df = pd.read_csv('[Link]',
                 sep=',',
                 header=0,
                 index_col='id',
                 usecols=['id','age','salary','target','date'],   # must include index_col and parse_dates columns
                 dtype={'age': np.int32},
                 parse_dates=['date'],
                 na_values=['NA','N/A','?'],
                 chunksize=10000)   # with chunksize set, read_csv returns an iterator of DataFrames

# Excel

df = pd.read_excel('[Link]', sheet_name='Sheet1')

# JSON

df = pd.read_json('[Link]')

df = pd.read_json('[Link]', orient='records')

# SQL

import sqlalchemy

engine = sqlalchemy.create_engine('sqlite:///[Link]')

df = pd.read_sql('SELECT * FROM users WHERE active=1', engine)

# Parquet (recommended for large datasets — columnar format)

df = pd.read_parquet('[Link]')

df.to_parquet('[Link]', compression='snappy')

# Save
df.to_csv('[Link]', index=False)

df.to_excel('[Link]', index=False)
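Because `chunksize` turns `read_csv` into an iterator, the result has to be consumed in a loop. The sketch below assumes a small in-memory CSV (built with `io.StringIO`) standing in for a large file; the column names and values are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for a file too large to load at once.
rows = "\n".join(f"{i},{20 + i % 30},{30000 + 1000 * i}" for i in range(10))
csv_text = "id,age,salary\n" + rows

total_rows = 0
salary_sum = 0
# Each iteration yields a regular DataFrame with at most `chunksize` rows.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    total_rows += len(chunk)
    salary_sum += chunk["salary"].sum()

mean_salary = salary_sum / total_rows
```

Aggregating per chunk keeps peak memory proportional to the chunk size rather than the file size.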

3.3 Data Cleaning & Missing Values

# --- Detect missing values ---
df.isnull().sum()                     # count NaN per column
df.isnull().mean() * 100              # % missing per column
df[df.isnull().any(axis=1)]           # rows with any NaN

# --- Drop ---
df.dropna()                           # drop rows with any NaN
df.dropna(subset=['age','salary'])    # only check these cols
df.dropna(thresh=3)                   # keep rows with >= 3 non-NaN
df.dropna(axis=1)                     # drop columns that contain any NaN

# --- Fill (each call returns a new object; assign the result back) ---
df.fillna(0)                          # fill all NaN with 0
df['age'].fillna(df['age'].mean())    # mean imputation
df['dept'].fillna('Unknown')          # categorical fill
df.ffill()                            # forward fill (fillna(method='ffill') is deprecated in pandas 2.x)
df.bfill()                            # backward fill

# --- Advanced imputation (ML-based) ---
from sklearn.impute import SimpleImputer, KNNImputer
imp = KNNImputer(n_neighbors=5)
num_cols = df.select_dtypes('number').columns   # KNNImputer works on numeric columns only
df_imputed = pd.DataFrame(imp.fit_transform(df[num_cols]), columns=num_cols)

# --- Duplicates ---
df.duplicated().sum()                       # count duplicates
df.drop_duplicates(inplace=True)            # remove
df.drop_duplicates(subset=['age','dept'])   # based on subset

# --- Data type issues ---
df['salary'] = pd.to_numeric(df['salary'], errors='coerce')   # str -> num, bad values become NaN
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
df['dept'] = df['dept'].astype('category')                    # save memory
df['age'] = df['age'].astype(np.int32)                        # fails if the column still contains NaN

# --- Outlier detection (IQR rule) ---
Q1 = df['salary'].quantile(0.25)
Q3 = df['salary'].quantile(0.75)
IQR = Q3 - Q1
lower, upper = Q1 - 1.5 * IQR, Q3 + 1.5 * IQR
outliers = df[(df['salary'] < lower) | (df['salary'] > upper)]
df_clean = df[df['salary'].between(lower, upper)]

# Z-score method
from scipy import stats
z_scores = np.abs(stats.zscore(df['salary']))
df_clean = df[z_scores < 3]
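The IQR rule above can be checked on a small example. The salary values below are invented for illustration; only `quantile` and boolean masking are used.

```python
import pandas as pd

salary = pd.Series([50_000, 52_000, 55_000, 58_000, 60_000, 61_000, 500_000])

q1, q3 = salary.quantile(0.25), salary.quantile(0.75)
iqr = q3 - q1
# Points further than 1.5 * IQR outside the quartiles are flagged.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

is_outlier = (salary < lower) | (salary > upper)
outliers = salary[is_outlier]
clean = salary[~is_outlier]
```

Keeping the boolean mask (rather than only the filtered frame) makes it easy to inspect what was removed before dropping it.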

3.4 Filtering, Selection & GroupBy

# --- Selection ---
df['age']                  # single column -> Series
df[['age','salary']]       # multi-column -> DataFrame

# loc: label-based (inclusive on both ends)
df.loc[0, 'age']                   # row 0, col 'age'
df.loc[0:2, ['age','dept']]        # rows 0-2 (inclusive), 2 cols
df.loc[df['age'] > 28]             # boolean filter with loc

# iloc: position-based (exclusive end, like Python slicing)
df.iloc[0, 1]                      # row 0, col index 1
df.iloc[1:3, :]                    # rows 1,2 all cols
df.iloc[:, -1]                     # last column

# --- Boolean filtering ---
df[df['age'] > 28]
df[(df['age'] > 25) & (df['salary'] > 60000)]       # AND
df[(df['dept'] == 'Eng') | (df['dept'] == 'HR')]    # OR
df[~(df['dept'] == 'HR')]                           # NOT
df[df['dept'].isin(['Eng','Mkt'])]
df[df['dept'].str.startswith('E')]                  # string method via the .str accessor

# query(): SQL-like string syntax (readable)
df.query('age > 25 and salary > 60000')
df.query('dept in ["Eng", "HR"]')

# --- GroupBy ---
# Split -> Apply -> Combine
grp = df.groupby('dept')
grp['salary'].mean()                            # mean salary per dept
grp['salary'].agg(['mean','std','min','max'])
grp.size()                                      # count per group

# Multiple aggregations (common in feature engineering)
result = df.groupby('dept').agg(
    avg_salary = ('salary', 'mean'),
    max_score  = ('score', 'max'),
    count      = ('age', 'count')
).reset_index()

# rank / transform return the same shape as the input (useful for features)
df['salary_rank_in_dept'] = df.groupby('dept')['salary'].rank()
df['dept_mean_salary'] = df.groupby('dept')['salary'].transform('mean')

# filter groups
big_depts = df.groupby('dept').filter(lambda g: len(g) >= 2)
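The difference between `agg` (one row per group) and `transform` (one value per original row) is easiest to see side by side. A minimal sketch, reusing the small `dept`/`salary` frame from section 3.1; the `salary_vs_dept` column is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    "dept": ["Eng", "Mkt", "Eng", "HR"],
    "salary": [50000, 80000, 120000, 60000],
})

# agg collapses each group to a single row.
per_dept = df.groupby("dept").agg(
    avg_salary=("salary", "mean"),
    count=("salary", "count"),
).reset_index()

# transform broadcasts the group statistic back onto every original row,
# so it can be assigned directly as a new feature column.
df["dept_mean_salary"] = df.groupby("dept")["salary"].transform("mean")
df["salary_vs_dept"] = df["salary"] - df["dept_mean_salary"]
```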

3.5 Merge, Join & Concat
left = pd.DataFrame({'id':[1,2,3], 'name':['A','B','C']})
right = pd.DataFrame({'id':[2,3,4], 'score':[90,85,78]})

# --- merge (like SQL JOIN) ---
pd.merge(left, right, on='id', how='inner')   # 2 rows
pd.merge(left, right, on='id', how='left')    # 3 rows (keep all left)
pd.merge(left, right, on='id', how='right')   # 3 rows (keep all right)
pd.merge(left, right, on='id', how='outer')   # 4 rows (union)

# merge on different column names (assumes the right frame calls its key 'emp_id')
pd.merge(left, right, left_on='id', right_on='emp_id')

# --- join (index-based) ---
left.set_index('id').join(right.set_index('id'), how='left')

# --- concat (stack DataFrames; df1 and df2 are any two frames) ---
pd.concat([df1, df2], axis=0, ignore_index=True)   # vertical stack
pd.concat([df1, df2], axis=1)                      # horizontal stack
pd.concat([df1, df2], keys=['train','val'])        # multi-index

Operation       Use Case                     SQL Equivalent     Key Param
inner merge     Records in both tables       INNER JOIN         how='inner'
left merge      Keep all left, match right   LEFT JOIN          how='left'
outer merge     Keep all records             FULL OUTER JOIN    how='outer'
concat axis=0   Stack new rows               UNION ALL          ignore_index=True
concat axis=1   Add columns side-by-side     Lateral join       axis=1
join            Index-based merge            JOIN on index      on parameter
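When debugging a merge it often helps to see where each row came from. The sketch below uses the `left` and `right` frames from this section together with the `indicator` and `validate` parameters of `pd.merge`; treat it as one possible check, not a required step.

```python
import pandas as pd

left = pd.DataFrame({"id": [1, 2, 3], "name": ["A", "B", "C"]})
right = pd.DataFrame({"id": [2, 3, 4], "score": [90, 85, 78]})

# indicator=True adds a _merge column: 'left_only', 'right_only' or 'both'.
outer = pd.merge(left, right, on="id", how="outer", indicator=True)
n_both = int((outer["_merge"] == "both").sum())
n_left_only = int((outer["_merge"] == "left_only").sum())

# validate raises MergeError if the keys are not unique on both sides,
# which catches accidental row duplication early.
inner = pd.merge(left, right, on="id", how="inner", validate="one_to_one")
```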

3.6 Time Series & Feature Engineering

# --- Time series ---
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date').sort_index()

# Resampling (change frequency)
df.resample('D').mean()     # daily mean
df.resample('W').sum()      # weekly sum
df.resample('ME').last()    # month-end last value ('ME' needs pandas >= 2.2; older versions use 'M')

# Rolling window (moving average, volatility); assumes a 'close' price column
df['ma_7']   = df['close'].rolling(7).mean()
df['std_30'] = df['close'].rolling(30).std()
df['ema_12'] = df['close'].ewm(span=12).mean()

# Datetime components (read from the DatetimeIndex set above)
df['year']       = df.index.year
df['month']      = df.index.month
df['weekday']    = df.index.dayofweek      # Monday=0 ... Sunday=6
df['is_weekend'] = df.index.dayofweek >= 5

# --- Feature Engineering ---
# Binning continuous -> categorical
df['age_group'] = pd.cut(df['age'], bins=[0,25,35,100],
                         labels=['young','mid','senior'])

# qcut: equal-frequency bins
df['salary_quartile'] = pd.qcut(df['salary'], q=4,
                                labels=['Q1','Q2','Q3','Q4'])

# Label encoding
df['dept_code'] = df['dept'].astype('category').cat.codes

# One-hot encoding
df_ohe = pd.get_dummies(df, columns=['dept'], drop_first=True)

# Lag features (important for time series ML); assumes a 'value' column
df['lag_1']  = df['value'].shift(1)
df['lag_7']  = df['value'].shift(7)
df['diff_1'] = df['value'].diff(1)

# Target encoding (mean of target per category; compute on training rows only to avoid leakage)
means = df.groupby('dept')['salary'].transform('mean')
df['dept_salary_mean'] = means
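The lag, difference, and rolling features above can be tried on a tiny daily series. The values and the `ts` name are made up for illustration; the date range starts on a Monday so the weekend flag is easy to verify.

```python
import pandas as pd

ts = pd.DataFrame(
    {"value": [10.0, 12.0, 11.0, 15.0, 14.0, 18.0]},
    index=pd.date_range("2024-01-01", periods=6, freq="D"),  # 2024-01-01 is a Monday
)

ts["lag_1"] = ts["value"].shift(1)            # yesterday's value
ts["diff_1"] = ts["value"].diff(1)            # change since yesterday
ts["ma_3"] = ts["value"].rolling(3).mean()    # 3-day moving average
ts["is_weekend"] = ts.index.dayofweek >= 5

# shift/rolling leave NaN in the first rows; drop them before modelling.
features = ts.dropna()
```

Every feature here only looks backwards in time, which is what keeps it safe to use as a model input for forecasting.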

3.7 Performance Tips & Role in ML Pipelines

# --- Memory reduction ---
def reduce_memory(df):
    for col in df.columns:
        if df[col].dtype == 'float64':
            df[col] = df[col].astype('float32')
        elif df[col].dtype == 'int64':
            mn, mx = df[col].min(), df[col].max()   # check both ends so negative values do not overflow
            if mn >= -128 and mx <= 127:
                df[col] = df[col].astype('int8')
            elif mn >= -32768 and mx <= 32767:
                df[col] = df[col].astype('int16')
            elif mn >= -2147483648 and mx <= 2147483647:
                df[col] = df[col].astype('int32')
        elif df[col].dtype == 'object' and df[col].nunique() / len(df) < 0.5:
            df[col] = df[col].astype('category')
    return df

# --- Fast operations ---
# Use vectorized string methods (not apply+lambda)
df['name'].str.lower()          # fast
df['name'].str.contains('AI')   # fast

# Avoid iterrows (very slow)
# BAD:    for idx, row in df.iterrows(): ...
# GOOD:   df['new_col'] = df['col1'] + df['col2']   (vectorized)
# OK:     df.apply(func, axis=1)   (still a Python-level loop, but better than iterrows)
# BEST:   use NumPy operations on .values / .to_numpy()

# eval() and query() can use numexpr when it is installed (faster on large frames)
df = df.eval('new_col = salary * score + age * 100')   # returns a new DataFrame

# --- pandas in ML pipeline ---
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Typical ML workflow
# 1. Load with pd.read_csv / read_parquet
# 2. Clean: dropna, fillna, fix dtypes
# 3. Feature engineer: new columns, encoding
# 4. Split: X = df[features], y = df[target]
# 5. Convert: X.values or X.to_numpy() -> pass to sklearn
X = df[['age','salary','score']].to_numpy()
y = df['dept_code'].to_numpy()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

pipe = Pipeline([('scaler', StandardScaler()),
                 ('clf', LogisticRegression())])
pipe.fit(X_train, y_train)
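As an alternative to hand-written range checks, the memory-reduction idea from this section can be sketched with `pd.to_numeric(..., downcast=...)`, which picks the smallest dtype that holds the data. The helper name `downcast_numeric` and the sample columns are assumptions for the example.

```python
import numpy as np
import pandas as pd

def downcast_numeric(df):
    """Return a copy with numeric columns stored in the smallest safe dtype."""
    out = df.copy()
    for col in out.select_dtypes(include=[np.integer]).columns:
        out[col] = pd.to_numeric(out[col], downcast="integer")
    for col in out.select_dtypes(include=[np.floating]).columns:
        out[col] = pd.to_numeric(out[col], downcast="float")
    return out

df = pd.DataFrame({
    "age": np.array([25, 30, 35, 28], dtype="int64"),
    "delta": np.array([-40000, 5, 7, 9], dtype="int64"),  # negative value rules out int16
    "score": [0.8, 0.9, 0.95, 0.7],
})
small = downcast_numeric(df)
```

Because the downcast inspects both the minimum and the maximum, negative values are handled without extra code.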
Section 4 — Matplotlib: Basic to Advanced

4.1 Core Plot Types

import matplotlib.pyplot as plt
import numpy as np

# --- Object-oriented API (preferred for complex plots) ---
fig, ax = plt.subplots(figsize=(8, 4))

# --- Line plot ---
x = np.linspace(0, 10, 200)
ax.plot(x, np.sin(x), color='royalblue', lw=2,
        linestyle='--', label='sin(x)')
ax.plot(x, np.cos(x), color='tomato', lw=2, label='cos(x)')
ax.set_xlabel('x'); ax.set_ylabel('y')
ax.set_title('Trigonometric Functions')
ax.legend(loc='upper right')
ax.grid(True, alpha=0.3)
plt.tight_layout()
fig.savefig('[Link]', dpi=150, bbox_inches='tight')

# --- Scatter plot (X: 2D feature array, y: class labels, defined elsewhere) ---
fig, ax = plt.subplots()
ax.scatter(X[:,0], X[:,1], c=y, cmap='viridis',
           s=30, alpha=0.7, edgecolors='none')
plt.colorbar(ax.collections[0], ax=ax, label='Class')

# --- Bar chart ---
categories = ['Accuracy','Precision','Recall','F1']
values = [0.92, 0.89, 0.94, 0.91]
colors = ['#2196F3','#4CAF50','#FF9800','#E91E63']
fig, ax = plt.subplots()
bars = ax.bar(categories, values, color=colors, width=0.5, edgecolor='white')
ax.bar_label(bars, fmt='%.2f', padding=3)   # value on top of bar
ax.set_ylim(0, 1.1)

# --- Horizontal bar ---
fig, ax = plt.subplots()
ax.barh(categories, values, color=colors)

# --- Histogram (data: 1D array of samples, defined elsewhere) ---
fig, ax = plt.subplots()
ax.hist(data, bins=30, color='steelblue', edgecolor='white',
        alpha=0.8, density=True)   # density=True normalizes the area to 1 (PDF estimate)
ax.set_xlabel('Value'); ax.set_ylabel('Density')

4.2 Customization & Subplots

# --- Subplots: the Swiss Army knife ---
fig, axes = plt.subplots(2, 3, figsize=(15, 8))
fig.suptitle('Model Analysis Dashboard', fontsize=16, fontweight='bold')

# Access subplots
ax = axes[0, 0]   # row 0, col 0
ax = axes[1, 2]   # row 1, col 2

# Flatten for easy iteration
for ax in axes.flatten()[n_used:]:   # n_used: number of panels actually drawn
    ax.set_visible(False)            # hide unused

# Shared axes
fig, (ax1, ax2) = plt.subplots(1, 2, sharey=True)

# gridspec: unequal subplot sizes
import matplotlib.gridspec as gridspec
fig = plt.figure(figsize=(12, 6))
gs = gridspec.GridSpec(2, 3, hspace=0.4, wspace=0.3)
ax_big = fig.add_subplot(gs[:, 0])   # spans both rows
ax_tr = fig.add_subplot(gs[0, 1])
ax_br = fig.add_subplot(gs[1, 1])

# --- Styling ---
plt.style.use('seaborn-v0_8-whitegrid')   # built-in theme (name used by matplotlib >= 3.6)
plt.style.use('ggplot')
plt.rcParams.update({'font.size': 12, 'figure.dpi': 120})

# --- Annotations (peak_x, peak_y, threshold, split_point, y_lower, y_upper defined elsewhere) ---
ax.annotate('Peak',
            xy=(peak_x, peak_y),
            xytext=(peak_x+0.5, peak_y+0.1),
            arrowprops=dict(arrowstyle='->', color='red'),
            fontsize=10, color='red')
ax.axhline(y=threshold, color='red', linestyle='--', label='Threshold')
ax.axvline(x=split_point, color='green', linestyle=':')
ax.fill_between(x, y_lower, y_upper, alpha=0.2, label='CI')

# --- Twin axes (two y scales) ---
ax2 = ax.twinx()
ax.plot(x, loss, 'b-', label='Loss')
ax2.plot(x, accuracy, 'r-', label='Accuracy')
ax.set_ylabel('Loss', color='blue')
ax2.set_ylabel('Accuracy', color='red')

4.3 Advanced ML Visualizations

import matplotlib.pyplot as plt
import numpy as np

# --- Confusion Matrix ---
from sklearn.metrics import confusion_matrix
import seaborn as sns

cm = confusion_matrix(y_true, y_pred)
fig, ax = plt.subplots(figsize=(6,5))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=class_names, yticklabels=class_names, ax=ax)
ax.set_xlabel('Predicted'); ax.set_ylabel('True')
ax.set_title('Confusion Matrix')

# --- ROC Curve ---
from sklearn.metrics import roc_curve, auc

fpr, tpr, _ = roc_curve(y_true, y_prob)   # y_prob: predicted probability of the positive class
roc_auc = auc(fpr, tpr)
fig, ax = plt.subplots()
ax.plot(fpr, tpr, lw=2, label=f'AUC = {roc_auc:.3f}')
ax.plot([0,1],[0,1],'k--', label='Random')   # chance line
ax.set_xlabel('False Positive Rate')
ax.set_ylabel('True Positive Rate')
ax.legend()

# --- Loss / Accuracy curves ---
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
ax1.plot(train_loss, label='Train Loss')
ax1.plot(val_loss, label='Val Loss', linestyle='--')
ax2.plot(train_acc, label='Train Acc')
ax2.plot(val_acc, label='Val Acc', linestyle='--')
for ax in (ax1, ax2):
    ax.legend(); ax.grid(True, alpha=0.3)

# --- Feature importance ---
importances = model.feature_importances_
sorted_idx = np.argsort(importances)[::-1][:15]   # indices of the 15 largest
fig, ax = plt.subplots()
ax.barh([feature_names[i] for i in sorted_idx[::-1]],   # reversed so the largest bar is on top
        importances[sorted_idx[::-1]])
ax.set_title('Top 15 Feature Importances')

# --- Correlation heatmap ---
corr = df.corr(numeric_only=True)   # skip non-numeric columns
mask = np.triu(np.ones_like(corr, dtype=bool))   # hide the redundant upper triangle
fig, ax = plt.subplots()
sns.heatmap(corr, mask=mask, annot=True, fmt='.2f',
            cmap='coolwarm', center=0, ax=ax)

# --- Learning curve ---
from sklearn.model_selection import learning_curve

sizes, train_sc, val_sc = learning_curve(
    model, X, y, cv=5, train_sizes=np.linspace(0.1,1,10))
fig, ax = plt.subplots()
ax.fill_between(sizes, train_sc.mean(1)-train_sc.std(1),
                train_sc.mean(1)+train_sc.std(1), alpha=0.1)
ax.plot(sizes, train_sc.mean(1), 'o-', label='Train')
ax.plot(sizes, val_sc.mean(1), 'o-', label='Val')
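The upper-triangle mask used for the correlation heatmap can be inspected without plotting anything. This sketch assumes a small random frame with one non-numeric column; the column names are invented for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(50, 3)), columns=["a", "b", "c"])
df["dept"] = "Eng"  # non-numeric column that corr() should skip

corr = df.corr(numeric_only=True)

# True marks the cells to hide: the diagonal and everything above it.
mask = np.triu(np.ones_like(corr, dtype=bool))

# DataFrame.mask blanks out the hidden cells, leaving only the lower triangle.
lower = corr.mask(mask)
```

For a 3x3 matrix, six cells are hidden and three unique pairwise correlations remain, which is exactly what a reader needs to see.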

4.4 Saving, Exporting & Best Practices

# --- Saving figures ---
fig.savefig('[Link]', dpi=300, bbox_inches='tight')
fig.savefig('[Link]', bbox_inches='tight')   # vector format
fig.savefig('[Link]', bbox_inches='tight')   # editable in Inkscape
fig.savefig('[Link]', bbox_inches='tight')   # LaTeX / journals

# --- In-memory save (for APIs / web) ---
from io import BytesIO

buf = BytesIO()
fig.savefig(buf, format='png', dpi=150, bbox_inches='tight')
buf.seek(0)
img_bytes = buf.read()   # send over HTTP

# --- Closing figures to free memory ---
plt.close(fig)     # close specific figure
plt.close('all')   # close all figures

# --- Best practices ---
# 1. Prefer fig, ax = plt.subplots() over bare plt.plot() calls
# 2. Set DPI >= 150 for reports, >= 300 for publications
# 3. Use tight_layout() or constrained_layout=True
# 4. Label everything: title, xlabel, ylabel, legend
# 5. Use colorblind-friendly palettes (viridis, plasma, tab10)
# 6. Avoid 3D plots unless truly necessary (they distort perception)
# 7. Close figures after saving in loops to prevent memory growth
# 8. For interactive plots: use plotly or bokeh instead
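The in-memory save and the close-after-save advice can be combined in one helper. A minimal sketch, assuming the non-interactive Agg backend (so no display is needed); `render_png` is a hypothetical helper name.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; safe on servers without a display
import matplotlib.pyplot as plt
import numpy as np
from io import BytesIO

def render_png(x, y, dpi=150):
    """Draw a line plot and return it as PNG bytes."""
    fig, ax = plt.subplots(figsize=(4, 3))
    ax.plot(x, y)
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    buf = BytesIO()
    fig.savefig(buf, format="png", dpi=dpi, bbox_inches="tight")
    plt.close(fig)  # release the figure so repeated calls do not accumulate memory
    return buf.getvalue()

x = np.linspace(0, 10, 200)
img_bytes = render_png(x, np.sin(x))
```

The returned bytes can be written to a file or sent as an HTTP response body with content type `image/png`.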

Section 5 — End-to-End Mini Project
We will walk through a complete ML workflow: loading the Titanic dataset, cleaning it, engineering features,
building a classifier, and visualizing results. Every step uses Python, NumPy, Pandas, and Matplotlib
together.

Step 1 — Data Loading & Inspection

import pandas as pd, numpy as np
import matplotlib.pyplot as plt
from pathlib import Path

# Load (use Titanic or any CSV)
df = pd.read_csv('[Link]')
print(df.shape)            # (891, 12)
print(df.dtypes)
print(df.isnull().sum())
# Age: 177 missing, Cabin: 687 missing, Embarked: 2 missing
print(df.describe())

Step 2 — Data Cleaning

# Drop high-missingness column
df.drop('Cabin', axis=1, inplace=True)

# Fill missing Age with median grouped by class
df['Age'] = df.groupby('Pclass')['Age'].transform(
    lambda x: x.fillna(x.median()))

# Fill missing Embarked with mode
# (assign back; inplace=True on a selected column is unreliable in recent pandas)
df['Embarked'] = df['Embarked'].fillna(df['Embarked'].mode()[0])

# Drop irrelevant columns
df.drop(['PassengerId','Name','Ticket'], axis=1, inplace=True)
print(df.isnull().sum())   # all zeros now
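The group-wise median fill used for Age can be verified on a toy frame. The six rows below are invented for illustration and only mimic the `Pclass` and `Age` columns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Pclass": [1, 1, 1, 3, 3, 3],
    "Age": [40.0, np.nan, 50.0, 20.0, 30.0, np.nan],
})

# transform applies the lambda per class and returns a Series aligned with df,
# so each missing Age is replaced by the median of its own class.
df["Age"] = df.groupby("Pclass")["Age"].transform(lambda s: s.fillna(s.median()))
```

First-class passengers get the first-class median (45) and third-class passengers the third-class median (25), instead of one global value.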

Step 3 — Feature Engineering
# Family size
df['FamilySize'] = df['SibSp'] + df['Parch'] + 1
df['IsAlone'] = (df['FamilySize'] == 1).astype(int)

# Title from Name (if we had kept it)
# df['Title'] = df['Name'].str.extract(r' ([A-Za-z]+)\.', expand=False)

# Age bins
df['AgeBin'] = pd.cut(df['Age'], bins=[0,12,18,35,60,100],
                      labels=[0,1,2,3,4]).astype(int)

# Fare log transform (reduce skew)
df['LogFare'] = np.log1p(df['Fare'])

# Encode categorical
df['Sex'] = (df['Sex'] == 'male').astype(int)
df = pd.get_dummies(df, columns=['Embarked'], drop_first=True)

print(df.head())
print(df.shape)   # (891, 13): 8 kept columns + 4 engineered + 2 Embarked dummies - Embarked

Step 4 — Train a Model

from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report

X = df.drop('Survived', axis=1).to_numpy(dtype=np.float32)
y = df['Survived'].to_numpy()

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

scaler = StandardScaler()
X_train_sc = scaler.fit_transform(X_train)   # fit on training data only
X_test_sc = scaler.transform(X_test)         # reuse the training statistics

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train_sc, y_train)
y_pred = clf.predict(X_test_sc)

print('Accuracy:', accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))

# Cross-validation
# (the scaler above saw every training fold; wrap scaler + model in a Pipeline for a strict estimate)
cv_scores = cross_val_score(clf, X_train_sc, y_train, cv=5)
print(f'CV Accuracy: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}')
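One way to keep preprocessing inside each cross-validation fold is to wrap the scaler and the classifier in a single `Pipeline`. The sketch below uses synthetic data from `make_classification` as a stand-in for the Titanic features; the sample sizes and hyperparameters are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the engineered feature matrix.
X, y = make_classification(n_samples=300, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", RandomForestClassifier(n_estimators=50, random_state=42)),
])

# cross_val_score refits the whole pipeline per fold, so the scaler
# never sees the validation fold it is evaluated on.
cv_scores = cross_val_score(pipe, X_train, y_train, cv=5)

pipe.fit(X_train, y_train)
test_acc = pipe.score(X_test, y_test)
```

Tree ensembles are insensitive to feature scaling, so the scaler matters little here; the same structure applies unchanged to scale-sensitive models such as logistic regression.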

Step 5 — Visualisation & Insights

feature_names = df.drop('Survived',axis=1).columns
importances = clf.feature_importances_
sorted_idx = np.argsort(importances)   # ascending, so the largest bar ends up on top

fig, axes = plt.subplots(1, 3, figsize=(18, 5))
fig.suptitle('Titanic ML Analysis', fontsize=15, fontweight='bold')

# --- Panel 1: Feature importance ---
axes[0].barh([feature_names[i] for i in sorted_idx],
             importances[sorted_idx], color='steelblue')
axes[0].set_title('Feature Importances')

# --- Panel 2: Survival by class & sex ---
pivot = df.groupby(['Pclass','Sex'])['Survived'].mean().unstack()
pivot.plot(kind='bar', ax=axes[1], color=['#EF5350','#42A5F5'])
axes[1].set_title('Survival Rate by Class & Sex')
axes[1].set_ylabel('Survival Rate')
axes[1].legend(['Female (0)','Male (1)'])

# --- Panel 3: Age distribution by survival ---
for survived, grp in df.groupby('Survived')['Age']:
    lbl = 'Survived' if survived else 'Did not survive'
    axes[2].hist(grp, bins=25, alpha=0.6, label=lbl, density=True)
axes[2].set_title('Age Distribution by Survival')
axes[2].set_xlabel('Age'); axes[2].legend()

plt.tight_layout()
fig.savefig('titanic_analysis.png', dpi=150, bbox_inches='tight')
plt.show()
Section 6 — Reference Tables & Interview Guide

6.1 Time & Space Complexity Cheat Sheet

Structure / Op   Access     Search   Insert     Delete     Notes
list (end)       O(1)       O(n)     O(1)*      O(1)       *amortized append; pop() from the end is O(1)
list (front)     O(1)       O(n)     O(n)       O(n)       Shifts all elements
dict             O(1)*      O(1)*    O(1)*      O(1)*      *average; worst O(n)
set              n/a        O(1)*    O(1)*      O(1)*      Hash table
tuple            O(1)       O(n)     n/a        n/a        Immutable
deque            O(n)       O(n)     O(1)       O(1)       Insert/delete at both ends O(1); middle access O(n)
heapq            O(1) min   O(n)     O(log n)   O(log n)   Priority queue
np.ndarray       O(1)       O(n)     O(n)       O(n)       Contiguous memory
pd.DataFrame     O(1) col   O(n)     O(n)       O(n)       Column-oriented

6.2 Common Pitfalls

■ Mutable default argument
Problem: def f(lst=[]): lst.append(1) (the default list is shared across calls)
Fix: Use def f(lst=None): if lst is None: lst = []

■ == None vs is None
Problem: arr == None compares element-wise on a NumPy array, so using it in an if raises 'truth value of an array ... is ambiguous'
Fix: Always use 'is None' or 'is not None'

■ Modifying list while iterating
Problem: for x in lst: lst.remove(x) (skips elements)
Fix: Iterate over a copy: for x in lst.copy()

■ Integer caching surprise
Problem: a=1000; b=1000; a is b can be False (CPython only caches small ints from -5 to 256)
Fix: Use == for value comparison, 'is' only for None/True/False

■ Broadcasting shape mismatch
Problem: np.array([1,2,3]) + np.array([[1],[2]]) silently gives shape (2,3), not an error!
Fix: Explicitly reshape arrays; always check shapes first

■ pandas chained indexing
Problem: df['col'][mask] = val (may not modify the original)
Fix: Use df.loc[mask, 'col'] = val

■ pandas SettingWithCopy
Problem: df2 = df[df.age>25]; df2['x'] = 1 triggers SettingWithCopyWarning
Fix: Take an explicit copy, df2 = df[df.age>25].copy(), or use loc-based assignment

■ float precision
Problem: 0.1 + 0.2 == 0.3 is False in Python
Fix: Use math.isclose(a, b, rel_tol=1e-9) for float comparison

■ Generator exhaustion
Problem: gen = (x for x in range(5)); sum(gen); sum(gen) (the second sum returns 0)
Fix: Generators can only be consumed once; recreate or use list()
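Several of the pitfalls above can be reproduced in a few lines. This is a minimal sketch; `append_one` is a hypothetical helper written only to show the `None`-default fix.

```python
import math
import numpy as np

# Broadcasting: shapes (3,) and (2, 1) combine to (2, 3) without any error.
a = np.array([1, 2, 3])
b = np.array([[1], [2]])
c = a + b

# Mutable default argument, fixed with a None sentinel.
def append_one(lst=None):
    if lst is None:
        lst = []  # a fresh list on every call
    lst.append(1)
    return lst

first, second = append_one(), append_one()

# Float precision: compare with a tolerance instead of ==.
float_equal = (0.1 + 0.2 == 0.3)
float_close = math.isclose(0.1 + 0.2, 0.3, rel_tol=1e-9)

# Generator exhaustion: the second pass sees no elements.
gen = (x for x in range(5))
totals = (sum(gen), sum(gen))
```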

6.3 Top Interview Questions & Answers

■ Q: What is GIL and how does it affect ML code?
The Global Interpreter Lock (GIL) prevents multiple native threads from executing Python bytecode
simultaneously. For CPU-bound ML code, use multiprocessing (bypasses GIL) or NumPy (releases GIL in
C extensions). I/O-bound code uses asyncio/threading effectively.

■ Q: Difference between deep copy and shallow copy?

Shallow copy (copy.copy) copies the outer object but not the objects nested inside it, so nested mutable objects are shared.
Deep copy (copy.deepcopy) recursively copies everything. For ML: df.copy() copies the DataFrame's data by default (deep=True);
a NumPy slice is a view that shares memory with the original array; .copy() on an array makes an independent copy.
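The copy semantics described in this answer can be demonstrated directly. The `config` dictionary and array values below are illustrative assumptions.

```python
import copy
import numpy as np

config = {"layers": [64, 32], "lr": 0.01}
shallow = copy.copy(config)      # new dict, but the same inner list object
deep = copy.deepcopy(config)     # new dict and a new inner list

config["layers"].append(16)      # visible through the shallow copy only

arr = np.arange(6)
view = arr[:3]                   # slice: a view sharing memory with arr
independent = arr[:3].copy()     # explicit copy: separate memory

arr[0] = 99                      # shows up in the view, not in the copy
```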

■ Q: When would you use a generator over a list?

When data is too large to fit in memory (e.g., streaming millions of rows), when you only need one pass, or
when you want to compose lazy pipelines. PyTorch DataLoader and tf.data follow the same lazy, batch-at-a-time iteration pattern.

■ Q: What is broadcasting and why does it matter in ML?

Broadcasting is NumPy's rule for performing element-wise operations on arrays with different but
compatible shapes without copying data. It makes batch operations (normalize features, add bias vectors)
efficient and concise — the backbone of forward/backward pass implementations.
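The two use cases named above, feature normalization and adding a bias vector, can be sketched with small arrays. The shapes and values are arbitrary and chosen only to make the broadcasting visible.

```python
import numpy as np

X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])             # (3 samples, 2 features)

# Per-feature statistics have shape (2,) and broadcast across the 3 rows.
mu = X.mean(axis=0)
sigma = X.std(axis=0)
X_norm = (X - mu) / sigma

W = np.array([[0.5, -1.0, 2.0],
              [1.0, 0.0, -0.5]])        # (2 features, 3 units)
bias = np.array([0.1, 0.2, 0.3])        # (3,) broadcasts across the batch

out = X_norm @ W + bias                 # (3 samples, 3 units)
```

No explicit loop or tiling of `mu`, `sigma`, or `bias` is needed; NumPy stretches the smaller operand along the missing axis.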
■ Q: Explain pandas .loc vs .iloc vs boolean indexing
.loc uses labels (inclusive end), .iloc uses integer positions (exclusive end, like Python slicing), boolean
indexing filters rows by a True/False mask. For ML: use .loc for feature selection by name, .iloc for
positional splits, boolean for data cleaning filters.
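The inclusive versus exclusive slicing difference is a common source of off-by-one errors, so a short check may help. The frame below reuses the `age` and `dept` columns from section 3.1.

```python
import pandas as pd

df = pd.DataFrame({
    "age": [25, 30, 35, 28],
    "dept": ["Eng", "Mkt", "Eng", "HR"],
})

by_label = df.loc[0:2, ["age"]]    # labels 0, 1, 2: the end label is included
by_pos = df.iloc[0:2, [0]]         # positions 0, 1: the end position is excluded
older = df[df["age"] > 28]         # boolean mask keeps rows where the condition is True
```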

■ Q: What are decorators and where are they used in ML frameworks?

Decorators are functions that wrap other functions to add behaviour (timing, logging, caching, validation). In
ML: @tf.function traces a Python function into a TensorFlow graph; torch.no_grad() (usable as a decorator or a context manager) disables
gradient tracking during inference; @property exposes computed attributes like model.num_parameters.

■ Q: How does Python manage memory? What is reference counting?

Python uses reference counting as its primary mechanism: each object tracks how many references point
to it. When the count drops to zero, the object is deallocated. A cyclic garbage collector handles reference
cycles. Large NumPy arrays can be released early with del arr (the memory is freed once no references remain);
gc.collect() is only needed to break reference cycles when memory is tight.

■ Q: Difference between map(), filter() and list comprehension?

All three transform sequences. List comprehensions are more Pythonic and readable, and are often slightly faster
than map() or filter() with a lambda because they avoid a function call per element. map() and filter() return lazy iterators. For ML use:
NumPy ufuncs and array expressions are preferred over all three for numerical data; np.vectorize is a convenience wrapper around a Python loop, not a speedup.

■ Q: What is vectorization and why is it critical in ML?

Vectorization replaces explicit Python loops with C-level array operations (NumPy ufuncs, matrix multiply).
A Python loop over 1M elements may take 100ms; the NumPy equivalent takes ~1ms. In deep learning, the
entire forward pass is a sequence of matrix multiplications — essentially vectorized operations.
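A minimal sketch of the same computation written as a Python loop and as a vectorized call. The function name `dot_loop` and the array size are assumptions; timings are deliberately left out because they depend on the machine.

```python
import numpy as np

def dot_loop(a, b):
    """Dot product with an explicit Python loop (one interpreter step per element)."""
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total

rng = np.random.default_rng(0)
a = rng.random(10_000)
b = rng.random(10_000)

loop_result = dot_loop(a, b)
vec_result = float(a @ b)   # single call into optimized compiled code
```

Both versions compute the same value up to floating-point rounding; wrapping each call with `time.perf_counter()` shows the speed gap on a given machine.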

■ Q: How would you detect and handle data leakage in a pandas pipeline?
Data leakage occurs when test-set information is used during training (e.g., fitting a scaler on all data). Fix:
always fit preprocessors (StandardScaler, imputers, encoders) only on training data, then transform both.
Use sklearn Pipeline to enforce this. Check for temporal leakage in time series by using time-based splits.
Quick-Reference: Python Ecosystem Interconnections
Concept           Python              NumPy                  Pandas                Matplotlib
Data container    list, dict          ndarray                DataFrame/Series      n/a
Iteration         for, generators     np.nditer, vectorize   iterrows (avoid)      n/a
Functional ops    map,filter,reduce   ufuncs,where           apply,map,transform   n/a
Math              math module         [Link], [Link]         describe(), corr()    n/a
I/O               open(), pathlib     np.save/load .npy      read_csv/to_csv       savefig
Missing data      None                np.nan                 NaN, isnull()         n/a
Type system       dynamic             dtype per array        dtype per column      n/a
Memory model      ref count + GC      contiguous buffer      column store          n/a
Performance tip   __slots__, deque    vectorize, einsum      eval(), query()       close(fig)
ML integration    Base language       Tensor math backbone   Data pipeline         Result visualization

Python & AI/ML Comprehensive Study Guide • All sections complete • Good luck! ■
