🐍 Python Complete Sprint
for Data Science & AI
A complete recall guide — from basics to pandas, OOP to ML
Simple English • Real Examples • Everything Included
SECTION 1: Variables & Data Types
1.1 What is a Variable?
Think of a variable like a labelled box. You put something inside the box (a number, a word, etc.) and
you give the box a name so you can find it later.
Why do we use variables? So we can store information and use it again and again without retyping it.
It makes our code clean and easy to change.
Real world example: A hospital stores a patient's name, age, and blood group in variables. When the
doctor needs this info, the program just reads the variable.
Syntax
📝 Code:
# variable_name = value
name = 'Arjun' # stores text
age = 10 # stores a number
height = 4.5 # stores a decimal
is_student = True # stores True/False
print(name)
print(age)
print(height)
print(is_student)
💻 Output:
Arjun
10
4.5
True
Rules for naming variables
• Must start with a letter or underscore (_), not a number
• Can contain letters, numbers, underscores
• Cannot use Python keywords like if, for, while, class
• Python is case-sensitive: age ≠ Age ≠ AGE
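Python can check these rules for you. The sketch below uses the built-in str.isidentifier() method and the standard keyword module. The helper name is_valid_variable_name and the sample names are made up for illustration.

```python
import keyword

# Hypothetical helper: isidentifier() checks the letter/digit/underscore rules,
# keyword.iskeyword() rejects reserved words like 'for' or 'class'
def is_valid_variable_name(name):
    return name.isidentifier() and not keyword.iskeyword(name)

for candidate in ['age', '_total', 'student2', '2nd_place', 'my-name', 'for']:
    print(candidate, is_valid_variable_name(candidate))
```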
1.2 Data Types
Every value in Python has a type. Like in real life — a word is different from a number, and Python
needs to know what kind of data it is dealing with.
Data Type What it stores / Example
int Whole numbers: age = 25, count = 100
float Decimal numbers: price = 99.99, pi = 3.14
str Text/words: name = 'Riya', city = 'Hyderabad'
bool True or False only: is_alive = True
list Multiple items in order: [1, 2, 3]
tuple Like list but cannot change: (10, 20, 30)
dict Key-value pairs: {'name': 'Ram', 'age': 12}
set Unique items, no duplicates: {1, 2, 3}
NoneType No value / empty: result = None
How to check the type of a variable?
📝 Code:
x = 42
y = 3.14
z = 'hello'
a = True
print(type(x)) # int
print(type(y)) # float
print(type(z)) # str
print(type(a)) # bool
💻 Output:
<class 'int'>
<class 'float'>
<class 'str'>
<class 'bool'>
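The same type() check works for the collection types and None from the table. This short sketch prints only the type's name by reading .__name__:

```python
values = [[1, 2, 3], (10, 20, 30), {'name': 'Ram', 'age': 12}, {1, 2, 3}, None]
for value in values:
    print(type(value).__name__)  # list, tuple, dict, set, NoneType
```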
1.3 Type Conversion (Casting)
Sometimes you want to change the type. Like turning the text '10' into the number 10 so you can do
math with it.
📝 Code:
# int() — convert to integer
x = int('42') # '42' (string) → 42 (int)
y = int(3.9) # 3.9 (float) → 3 (truncates!)
# float() — convert to float
a = float('3.14') # '3.14' (string) → 3.14
b = float(5) # 5 (int) → 5.0
# str() — convert to string
s = str(100) # 100 → '100'
s2 = str(3.14) # 3.14 → '3.14'
# bool() — convert to boolean
print(bool(0)) # False (0 is always False)
print(bool(1)) # True
print(bool('')) # False (empty string is False)
print(bool('hi')) # True
# list(), tuple(), set()
print(list('abc')) # ['a', 'b', 'c']
print(tuple([1,2])) # (1, 2)
print(set([1,1,2])) # {1, 2} — removes duplicates!
💻 Output:
False
True
False
True
['a', 'b', 'c']
(1, 2)
{1, 2}
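One common trap: int() cannot read a string that contains a decimal point. A minimal sketch of the error and the usual two-step fix:

```python
# int() accepts a whole-number string, but not one with a decimal point
try:
    int('3.9')
except ValueError as err:
    print('ValueError:', err)

# Convert in two steps instead: string -> float -> int (drops the decimal part)
n = int(float('3.9'))
print(n)            # 3
print(round(3.9))   # 4 (round() rounds instead of cutting off)
print(int(-3.9))    # -3 (int() cuts toward zero, not down)
```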
SECTION 2: Strings
2.1 What is a String?
A string is any text — words, sentences, or even a single letter. In Python, we put text inside single
quotes '' or double quotes "".
Real world use: Storing names, addresses, messages, passwords, URLs, file names.
Creating Strings
📝 Code:
name = 'Priya'
city = "Hyderabad"
multi = '''This is
a multi-line
string.'''
empty = ''
print(name, city)
print(len(name)) # length of string
💻 Output:
Priya Hyderabad
5
2.2 String Operations
📝 Code:
a = 'Hello'
b = 'World'
# Concatenation (joining)
print(a + ' ' + b) # Hello World
# Repetition
print(a * 3) # HelloHelloHello
# Indexing (getting one character)
print(a[0]) # H (first character)
print(a[-1]) # o (last character)
# Slicing (getting part of a string)
print(a[1:4]) # ell
print(a[:3]) # Hel
print(a[2:]) # llo
print(a[::-1]) # olleH (reversed!)
💻 Output:
Hello World
HelloHelloHello
H
o
ell
Hel
llo
olleH
2.3 String Methods (Built-in Functions for Strings)
Python has tons of ready-made tools (methods) to work with strings. Here are all the important ones:
Method What it does + Example
upper() 'hello'.upper() → 'HELLO'
lower() 'HELLO'.lower() → 'hello'
title() 'hello world'.title() → 'Hello World'
strip() ' hi '.strip() → 'hi' (removes spaces)
lstrip() ' hi'.lstrip() → 'hi' (left side)
rstrip() 'hi '.rstrip() → 'hi' (right side)
replace(old,new) 'cat'.replace('c','b') → 'bat'
split(sep) 'a,b,c'.split(',') → ['a','b','c']
join(list) ','.join(['a','b']) → 'a,b'
find(x) 'hello'.find('l') → 2 (index of first match)
count(x) 'hello'.count('l') → 2
startswith(x) 'hello'.startswith('he') → True
endswith(x) 'hello'.endswith('lo') → True
isdigit() '123'.isdigit() → True
isalpha() 'abc'.isalpha() → True
isalnum() 'abc123'.isalnum() → True
isupper() 'ABC'.isupper() → True
islower() 'abc'.islower() → True
center(n) 'hi'.center(10) → '    hi    ' (pads both sides to width 10)
ljust(n) 'hi'.ljust(10) → 'hi        ' (pads on the right)
rjust(n) 'hi'.rjust(10) → '        hi' (pads on the left)
zfill(n) '42'.zfill(5) → '00042'
format() '{} is {}'.format('sky','blue') → 'sky is blue'
encode() 'hello'.encode() → b'hello'
in keyword 'el' in 'hello' → True
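In real code these methods are often chained to clean up messy text. A short sketch; the raw string is made-up sample data.

```python
raw = '  ravi kumar, HYDERABAD  '      # made-up messy input

clean = raw.strip()                    # 'ravi kumar, HYDERABAD'
name, city = clean.split(', ')         # ['ravi kumar', 'HYDERABAD']
name = name.title()                    # 'Ravi Kumar'
city = city.title()                    # 'Hyderabad'
print(f'{name} | {city}')              # Ravi Kumar | Hyderabad
print(name.startswith('Ravi'))         # True
print('-'.join(['2025', '06', '05']))  # 2025-06-05
print('7'.zfill(3))                    # 007
```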
2.4 f-Strings (The modern way to format)
f-strings let you put variables directly inside a string. Just add f before the quote and use {} to insert
variables.
📝 Code:
name = 'Ravi'
age = 14
score = 95.678
print(f'My name is {name} and I am {age} years old.')
print(f'Score: {score:.2f}') # 2 decimal places
print(f'2 + 2 = {2 + 2}') # math inside {}
print(f'{name.upper()} rocks!')  # any expression works inside {}, even a method call
💻 Output:
My name is Ravi and I am 14 years old.
Score: 95.68
2 + 2 = 4
RAVI rocks!
SECTION 3: Operators
3.1 Arithmetic Operators
These do math — just like a calculator.
📝 Code:
a = 10
b = 3
print(a + b) # Addition: 13
print(a - b) # Subtraction: 7
print(a * b) # Multiplication: 30
print(a / b) # Division: 3.333...
print(a // b) # Floor division: 3 (rounds down to a whole number)
print(a % b) # Modulus: 1 (remainder)
print(a ** b) # Power: 1000 (10^3)
💻 Output:
13
7
30
3.3333333333333335
3
1
1000
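Floor division always rounds DOWN (toward negative infinity), which surprises people with negative numbers. A quick check of that rule and of how % matches it:

```python
print(10 // 3)          # 3
print(-10 // 3)         # -4 (not -3!)
print(-10 % 3)          # 2 (the remainder takes the sign of the divisor)
print(divmod(-10, 3))   # (-4, 2)

# The rule that always holds: (a // b) * b + (a % b) == a
a, b = -10, 3
print((a // b) * b + (a % b) == a)  # True
```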
3.2 Comparison Operators
These compare two values and give True or False.
📝 Code:
a = 10
b = 5
print(a == b) # Equal? False
print(a != b) # Not equal? True
print(a > b) # Greater than? True
print(a < b) # Less than? False
print(a >= b) # Greater or equal? True
print(a <= b) # Less or equal? False
💻 Output:
False
True
True
False
True
False
3.3 Logical Operators
These combine multiple conditions.
📝 Code:
x = 10
# and — BOTH must be True
print(x > 5 and x < 20) # True
print(x > 5 and x > 20) # False
# or — at LEAST ONE must be True
print(x > 5 or x > 20) # True
print(x < 5 or x > 20) # False
# not — flips True to False and vice versa
print(not (x > 5)) # False
💻 Output:
True
False
True
False
False
3.4 Assignment Operators
📝 Code:
x = 10
x += 5 # same as x = x + 5 → x is now 15
x -= 3 # same as x = x - 3 → x is now 12
x *= 2 # same as x = x * 2 → x is now 24
x //= 5 # same as x = x // 5 → x is now 4
x **= 3 # same as x = x ** 3 → x is now 64
print(x)
💻 Output:
64
3.5 Identity & Membership Operators
📝 Code:
# 'is' — checks if same object in memory
a = [1, 2, 3]
b = a
c = [1, 2, 3]
print(a is b) # True — b is literally a
print(a is c) # False — same content, different object
# 'in' — checks if value is inside
fruits = ['apple', 'mango', 'banana']
print('mango' in fruits) # True
print('grape' not in fruits) # True
💻 Output:
True
False
True
True
SECTION 4: Input & Output
4.1 print() — Showing Output
print() displays anything on the screen. It is the most used function in Python.
📝 Code:
print('Hello!') # basic
print('Name:', 'Arjun') # multiple items
print(10 + 5) # expression
print('A', 'B', 'C', sep='-') # custom separator
print('Loading', end='...') # no newline
print(' Done') # continues same line
# Formatted output
name = 'Priya'
print(f'Hello {name}!')
💻 Output:
Hello!
Name: Arjun
15
A-B-C
Loading... Done
Hello Priya!
4.2 input() — Taking User Input
input() pauses the program and waits for the user to type something. It ALWAYS returns a string.
name = input('What is your name? ') # user types: Ravi
age = int(input('How old are you? ')) # user types: 15
print(f'Hello {name}, you are {age} years old!')
# Taking multiple inputs in one line
x, y = input('Enter two numbers: ').split() # user types: 10 20
x, y = int(x), int(y)
print(f'Sum = {x + y}')
⚠️ Important: input() always gives you a STRING.
If you need a number, convert it: int(input(...)) or float(input(...))
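To see why the conversion matters, here is a sketch that skips the keyboard and uses a plain string in place of what input() would return:

```python
# Simulate what input() hands back: always a string
typed = '10 20'                  # pretend the user typed this at input('Enter two numbers: ')
x_text, y_text = typed.split()   # two strings: '10' and '20'
print(x_text + y_text)           # 1020 (strings are joined, not added)

x, y = map(int, typed.split())   # convert every piece to int in one go
print(x + y)                     # 30
```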
SECTION 5: Conditionals (if / elif / else)
5.1 What are Conditionals?
Conditionals let your program make decisions. Just like in real life: IF it is raining, take an umbrella,
ELSE go without one.
Real world use: Login systems check IF username and password match. A shopping cart checks IF
the item is in stock.
Syntax
if condition:
    # code runs if condition is True
elif another_condition:
    # code runs if the first was False but this is True
else:
    # code runs if ALL above conditions were False
Example — Grade Checker
📝 Code:
marks = 78
if marks >= 90:
    print('Grade: A+')
elif marks >= 80:
    print('Grade: A')
elif marks >= 70:
    print('Grade: B')
elif marks >= 60:
    print('Grade: C')
else:
    print('Grade: F - Study harder!')
💻 Output:
Grade: B
5.2 Nested if
📝 Code:
age = 20
has_id = True
if age >= 18:
    if has_id:
        print('Entry allowed')
    else:
        print('Show your ID')
else:
    print('Too young!')
💻 Output:
Entry allowed
5.3 One-Line if (Ternary)
📝 Code:
# result = value_if_true if condition else value_if_false
age = 20
status = 'Adult' if age >= 18 else 'Minor'
print(status)
💻 Output:
Adult
SECTION 6: Loops
6.1 for Loop
A for loop repeats code a set number of times or for each item in a collection.
Real world use: Sending the same email to 1000 customers. Checking each product in a shopping
cart.
📝 Code:
# Loop through a list
fruits = ['apple', 'mango', 'banana']
for fruit in fruits:
    print('I like', fruit)
# Loop through a range
for i in range(5):  # 0, 1, 2, 3, 4
    print(i, end=' ')
print()
# range(start, stop, step)
for i in range(1, 10, 2):  # 1, 3, 5, 7, 9
    print(i, end=' ')
💻 Output:
I like apple
I like mango
I like banana
0 1 2 3 4
1 3 5 7 9
6.2 while Loop
A while loop keeps running AS LONG AS a condition is True. Use when you don't know how many
times to repeat.
📝 Code:
count = 1
while count <= 5:
    print(f'Count is {count}')
    count += 1  # IMPORTANT: must change count or it loops forever!
print('Done!')
💻 Output:
Count is 1
Count is 2
Count is 3
Count is 4
Count is 5
Done!
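A while loop shines when the number of repeats depends on the data. In this sketch (made-up numbers: 1000 in savings growing 10% per year), we loop until the money doubles without knowing the year count in advance:

```python
balance = 1000.0   # starting savings (made-up)
rate = 0.10        # 10% growth per year (made-up)
years = 0
while balance < 2000:   # stop as soon as the savings have doubled
    balance *= 1 + rate
    years += 1
print(f'Doubled after {years} years: {balance:.2f}')
```

It prints that the money doubles after 8 years. The loop condition, not a fixed count, decides when to stop.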
6.3 break, continue, pass
📝 Code:
# break: exit the loop immediately
for i in range(10):
    if i == 5:
        break
    print(i, end=' ')  # prints: 0 1 2 3 4
print()
# continue: skip this one, go to next
for i in range(10):
    if i % 2 == 0:  # if even, skip
        continue
    print(i, end=' ')  # prints: 1 3 5 7 9
print()
# pass: do nothing (placeholder)
for i in range(3):
    pass  # loop runs but does nothing
💻 Output:
0 1 2 3 4
1 3 5 7 9
6.4 enumerate() — Loop with index
📝 Code:
fruits = ['apple', 'mango', 'banana']
for index, fruit in enumerate(fruits):
    print(f'{index}: {fruit}')
# Start counting from 1
for index, fruit in enumerate(fruits, start=1):
    print(f'{index}. {fruit}')
💻 Output:
0: apple
1: mango
2: banana
1. apple
2. mango
3. banana
6.5 zip() — Loop over two lists together
📝 Code:
names = ['Ravi', 'Priya', 'Arjun']
scores = [85, 92, 78]
for name, score in zip(names, scores):
    print(f'{name} scored {score}')
💻 Output:
Ravi scored 85
Priya scored 92
Arjun scored 78
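Two things worth knowing about zip(): it stops at the SHORTEST input, and together with dict() it quickly builds a lookup table. A small sketch with one score deliberately missing:

```python
names = ['Ravi', 'Priya', 'Arjun']
scores = [85, 92]                   # one score is missing

# zip() stops at the shortest list, so 'Arjun' is silently dropped
print(list(zip(names, scores)))     # [('Ravi', 85), ('Priya', 92)]

# zip() + dict() builds a name -> score lookup
score_of = dict(zip(names, scores))
print(score_of['Priya'])            # 92
```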
6.6 List Comprehension — The Smart Loop
A shorter way to create lists using a loop in a single line.
📝 Code:
# Normal way
squares = []
for i in range(6):
    squares.append(i ** 2)
# List comprehension (same result!)
squares = [i ** 2 for i in range(6)]
print(squares)
# With condition
evens = [i for i in range(20) if i % 2 == 0]
print(evens)
# Nested comprehension
matrix = [[i * j for j in range(1, 4)] for i in range(1, 4)]
print(matrix)
💻 Output:
[0, 1, 4, 9, 16, 25]
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
[[1, 2, 3], [2, 4, 6], [3, 6, 9]]
SECTION 7: Functions
7.1 What is a Function?
A function is a reusable block of code that does a specific job. Instead of writing the same code again
and again, you write it once in a function and call it whenever you need it.
Real world use: A payment function runs every time a user buys something. A login function runs
every time someone logs in.
Defining and Calling a Function
📝 Code:
def greet(name):  # define the function
    print(f'Hello, {name}!')
greet('Ravi') # call the function
greet('Priya') # call again with different name
💻 Output:
Hello, Ravi!
Hello, Priya!
7.2 Return Values
A function can send back a result using 'return'.
📝 Code:
def add(a, b):
    return a + b
result = add(10, 5)
print(result)
# Returning multiple values (they come back as a tuple)
def min_max(numbers):
    return min(numbers), max(numbers)
lo, hi = min_max([3, 1, 7, 4, 9])
print(f'Min: {lo}, Max: {hi}')
💻 Output:
15
Min: 1, Max: 9
7.3 Default & Keyword Arguments
📝 Code:
# Default argument: used if not provided
def greet(name, greeting='Hello'):
    print(f'{greeting}, {name}!')
greet('Arjun')  # uses default
greet('Arjun', 'Namaste')  # overrides default
# Keyword arguments: pass by name (order doesn't matter)
def power(base, exp):
    return base ** exp
print(power(exp=3, base=2)) # 8
💻 Output:
Hello, Arjun!
Namaste, Arjun!
8
7.4 *args and **kwargs
When you don't know how many arguments will be passed, use *args (collects extra positional values
into a tuple) or **kwargs (collects extra key=value pairs into a dict).
📝 Code:
# *args: variable number of arguments
def total(*nums):  # nums is a tuple
    return sum(nums)
print(total(1, 2, 3))  # 6
print(total(10, 20, 30, 40))  # 100
# **kwargs: keyword arguments collected into a dict
def show_info(**details):
    for key, value in details.items():
        print(f'{key}: {value}')
show_info(name='Priya', age=15, city='Hyderabad')
💻 Output:
6
100
name: Priya
age: 15
city: Hyderabad
7.5 Lambda Functions (Anonymous Functions)
A lambda is a tiny, one-line function. Use it when the function is simple and you only need it once.
📝 Code:
# Normal function
def square(x):
    return x ** 2
# Same thing as lambda
square = lambda x: x ** 2
print(square(5))  # 25
# Used as a sort key
students = [('Ravi', 85), ('Priya', 92), ('Arjun', 78)]
students.sort(key=lambda s: s[1])  # sort by score
print(students)
💻 Output:
25
[('Arjun', 78), ('Ravi', 85), ('Priya', 92)]
7.6 Scope — Where Variables Live
📝 Code:
x = 100 # global variable
def show():
    y = 50  # local variable
    print(x)  # can read global
    print(y)
show()
# print(y)  # NameError! y doesn't exist outside the function
# To MODIFY a global inside a function
counter = 0
def increment():
    global counter
    counter += 1
increment()
print(counter) # 1
💻 Output:
100
50
1
SECTION 8: Lists
8.1 What is a List?
A list is like a shopping bag — you can put many items in it, in order. Lists are the most used data
structure in Python.
Real world use: Storing a student's marks, a playlist of songs, prices of items in a cart.
Creating Lists
📝 Code:
numbers = [1, 2, 3, 4, 5]
names = ['Ravi', 'Priya', 'Arjun']
mixed = [1, 'hello', 3.14, True] # can mix types
empty = []
nested = [[1, 2], [3, 4], [5, 6]] # list inside list
print(numbers[0]) # first item: 1
print(numbers[-1]) # last item: 5
print(numbers[1:4]) # slice: [2, 3, 4]
💻 Output:
1
5
[2, 3, 4]
8.2 All List Methods
Method What it does
append(x) Adds x to the end: [1,2].append(3) → [1,2,3]
insert(i, x) Inserts x at index i: [1,3].insert(1,2) → [1,2,3]
extend(lst) Adds all items from lst: [1].extend([2,3]) → [1,2,3]
remove(x) Removes first x: [1,2,2].remove(2) → [1,2]
pop() Removes & returns last item
pop(i) Removes & returns item at index i
del list[i] Deletes item at index i
clear() Empties the list: []
index(x) Returns index of first x
count(x) How many times x appears
sort() Sorts in ascending order (changes list)
sort(reverse=True) Sorts in descending order
sorted(lst) Returns sorted copy (original unchanged)
reverse() Reverses the list in place
copy() Returns a copy of the list
len(lst) Number of items
min(lst) Smallest item
max(lst) Largest item
sum(lst) Total of all numbers
list() Create list from other iterable
in / not in Check if value exists: 3 in [1,2,3] → True
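A short sketch that walks through the trickier methods from the table. The comments show the list after each step.

```python
items = [3, 1, 2]
items.insert(0, 10)       # [10, 3, 1, 2]
items.extend([5, 5])      # [10, 3, 1, 2, 5, 5]
items.remove(5)           # removes only the FIRST 5 -> [10, 3, 1, 2, 5]
last = items.pop()        # returns 5 -> [10, 3, 1, 2]
first = items.pop(0)      # returns 10 -> [3, 1, 2]

ordered = sorted(items)   # new list [1, 2, 3]; items is unchanged
items.sort(reverse=True)  # sorts items itself -> [3, 2, 1] (returns None)
print(ordered, items)

# append() adds ONE item (even a whole list); extend() adds each item
a = [1, 2]
a.append([3, 4])          # [1, 2, [3, 4]]
b = [1, 2]
b.extend([3, 4])          # [1, 2, 3, 4]
print(a, b)
```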
Full Example
📝 Code:
marks = [78, 92, 85, 67, 91, 88]
marks.append(95)  # add new mark
marks.sort()  # sort ascending
print(marks)
print('Highest:', max(marks))
print('Lowest:', min(marks))
print('Average:', sum(marks) / len(marks))
marks.remove(67)  # remove the lowest
print('After removal:', marks)
# Find top 3
top3 = sorted(marks, reverse=True)[:3]
print('Top 3:', top3)
💻 Output:
[67, 78, 85, 88, 91, 92, 95]
Highest: 95
Lowest: 67
Average: 85.14285714285714
After removal: [78, 85, 88, 91, 92, 95]
Top 3: [95, 92, 91]
SECTION 9: Tuples, Sets & Dictionaries
9.1 Tuples
A tuple is like a list but it CANNOT be changed after creation. Use tuples for data that should stay fixed
— like coordinates, RGB colors, or date of birth.
📝 Code:
coords = (10.5, 17.3) # latitude, longitude
rgb = (255, 128, 0) # color
person = ('Priya', 25, 'F') # name, age, gender
print(coords[0]) # 10.5
print(len(person)) # 3
# Unpacking
name, age, gender = person
print(f'{name} is {age} years old')
# count() and index() are the only two tuple methods
t = (1, 2, 2, 3, 2)
print(t.count(2))  # 3
print(t.index(3))  # 3 (index of value 3)
💻 Output:
10.5
3
Priya is 25 years old
3
3
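Here is what "cannot be changed" looks like in practice, plus the one-item tuple gotcha. A minimal sketch:

```python
point = (10.5, 17.3)
try:
    point[0] = 99              # tuples cannot be changed in place
except TypeError as err:
    print('TypeError:', err)

# To "change" a tuple, build a new one
point = (99,) + point[1:]
print(point)                   # (99, 17.3)

single = (5,)                  # a one-item tuple needs the trailing comma
not_a_tuple = (5)              # just the int 5 in brackets
print(type(single).__name__, type(not_a_tuple).__name__)  # tuple int
```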
9.2 Sets
A set stores UNIQUE items only — no duplicates allowed! Sets are great for removing duplicates and
checking membership.
📝 Code:
fruits = {'apple', 'mango', 'banana', 'mango'} # mango appears only once
print(fruits)
# Set operations (great for comparing groups!)
a = {1, 2, 3, 4, 5}
b = {4, 5, 6, 7, 8}
print(a | b) # Union (all unique): {1,2,3,4,5,6,7,8}
print(a & b) # Intersection (common): {4, 5}
print(a - b) # Difference (in a not b): {1,2,3}
print(a ^ b) # Symmetric diff (not in both): {1,2,3,6,7,8}
# Remove duplicates from a list
nums = [1,1,2,2,3,3,4]
unique = list(set(nums))
print(unique)
💻 Output:
{'apple', 'mango', 'banana'} (order may vary)
{1, 2, 3, 4, 5, 6, 7, 8}
{4, 5}
{1, 2, 3}
{1, 2, 3, 6, 7, 8}
[1, 2, 3, 4]
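Sets also support adding and removing items, and fast membership checks. Note that list(set(...)) does not promise to keep the original order. dict.fromkeys() is one order-keeping alternative. A sketch with made-up city names:

```python
visited = {'Delhi', 'Mumbai'}
visited.add('Goa')         # add one item
visited.add('Delhi')       # already there, so nothing changes
visited.discard('Pune')    # discard() ignores missing items (remove() would raise KeyError)
print('Goa' in visited)    # True
print(len(visited))        # 3

# Remove duplicates but keep the first-seen order
nums = [3, 1, 3, 2, 1]
print(list(dict.fromkeys(nums)))  # [3, 1, 2]
```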
9.3 Dictionaries
A dictionary stores data as key-value pairs — like a real dictionary where you look up a word (key) to
find its meaning (value).
Real world use: Student records, product catalogs, user profiles, JSON data from APIs.
📝 Code:
student = {
    'name': 'Ravi',
    'age': 16,
    'marks': [85, 92, 78],
    'city': 'Hyderabad'
}
# Accessing values
print(student['name']) # Ravi
print(student.get('age'))  # 16
print(student.get('grade', 'N/A'))  # N/A (default if not found)
# Adding / updating
student['grade'] = 'A'
student['age'] = 17
# Deleting
del student['city']
# Looping
for key, value in student.items():
    print(f'{key}: {value}')
💻 Output:
Ravi
16
N/A
name: Ravi
age: 17
marks: [85, 92, 78]
grade: A
All Dictionary Methods
Method What it does
dict[key] Get value; raises KeyError if missing
dict.get(key, default) Get value; returns default (None if omitted) if missing
dict[key] = val Add or update a key
dict.update({...}) Merge another dict in
del dict[key] Delete a key
dict.pop(key) Remove and return value
dict.popitem() Remove and return last item as (key, value)
dict.keys() All keys
dict.values() All values
dict.items() All (key, value) pairs
key in dict Check if key exists
dict.copy() Shallow copy
dict.clear() Empty the dictionary
len(dict) Number of key-value pairs
dict.setdefault(k, v) Set key only if it doesn't exist; returns the value
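A sketch that exercises the less obvious methods on a made-up stock dictionary. The comments track what each call changes.

```python
stock = {'apple': 10, 'mango': 5}
stock.update({'mango': 8, 'banana': 12})  # updates mango, adds banana
removed = stock.pop('apple')              # returns 10 and deletes the key
missing = stock.pop('grape', 0)           # a default avoids KeyError for missing keys
stock.setdefault('kiwi', 0)               # adds kiwi because it was missing
stock.setdefault('mango', 100)            # mango exists, so it stays 8
last = stock.popitem()                    # removes the most recently added pair
print(stock, removed, missing, last)
```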
Dictionary Comprehension
📝 Code:
# Square of each number
squares = {x: x**2 for x in range(6)}
print(squares)
# Filter: keep only even keys
even_sq = {x: x**2 for x in range(6) if x % 2 == 0}
print(even_sq)
💻 Output:
{0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25}
{0: 0, 2: 4, 4: 16}
SECTION 10: File Handling
10.1 Opening & Reading Files
Python can read from and write to files on your computer. Always use 'with open()' — it automatically
closes the file even if an error occurs.
Mode What it does
'r' Read only (default). File must exist.
'w' Write — creates new or OVERWRITES existing file
'a' Append — adds to end of existing file
'x' Create — creates new file; error if exists
'rb' Read binary (images, PDFs)
'wb' Write binary
📝 Code:
# Writing to a file
with open('scores.txt', 'w') as f:
    f.write('Ravi,85\n')
    f.write('Priya,92\n')
    f.write('Arjun,78\n')
# Reading entire file
with open('scores.txt', 'r') as f:
    content = f.read()
print(content, end='')  # content already ends with a newline
# Reading line by line
with open('scores.txt', 'r') as f:
    for line in f:
        name, score = line.strip().split(',')
        print(f'{name} scored {score}')
💻 Output:
Ravi,85
Priya,92
Arjun,78
Ravi scored 85
Priya scored 92
Arjun scored 78
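The 'a' and 'x' modes from the table behave quite differently from 'w'. This sketch works inside a temporary folder from the standard tempfile module, so no real files are touched. The file name log.txt is made up.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'log.txt')  # hypothetical file name

    with open(path, 'w') as f:    # 'w' creates (or wipes) the file
        f.write('first\n')
    with open(path, 'a') as f:    # 'a' keeps old content and adds to the end
        f.write('second\n')
    with open(path) as f:         # default mode is 'r'
        lines = f.read().splitlines()
    print(lines)                  # ['first', 'second']

    try:
        with open(path, 'x') as f:  # 'x' refuses to touch an existing file
            f.write('oops\n')
    except FileExistsError:
        print('File already exists, nothing was overwritten')
```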
SECTION 11: Exception Handling (try / except)
11.1 What is an Exception?
An exception is an error that happens while your program is running. Without exception handling, your
program would crash. With it, you can handle the error gracefully.
Real world use: Banking apps catch errors when you enter wrong PIN. Web apps catch errors when a
server is down.
📝 Code:
try:
    num = int(input('Enter a number: '))  # user types: abc
    print(10 / num)
except ValueError:
    print('That is not a valid number!')
except ZeroDivisionError:
    print('Cannot divide by zero!')
except Exception as e:
    print(f'Something went wrong: {e}')
else:
    print('Everything worked fine!')
finally:
    print('This always runs!')
💻 Output:
That is not a valid number!
This always runs!
11.2 Common Exception Types
Exception When it happens
ValueError Right type but invalid value: int('abc')
TypeError Operation on an unsupported type: 'a' + 1
ZeroDivisionError Dividing by zero: 10 / 0
IndexError List index out of range: [1,2][5]
KeyError Dict key doesn't exist: d['xyz']
FileNotFoundError File doesn't exist
AttributeError Object has no such attribute
NameError Variable not defined
ImportError Import failed (ModuleNotFoundError if the module is missing)
OverflowError Number too large
RecursionError Too many recursive calls
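To see several of these errors for real, the sketch below triggers each one on purpose and catches it. The name undefined_name is deliberately never defined.

```python
samples = {
    'ValueError': lambda: int('abc'),
    'TypeError': lambda: 'a' + 1,
    'ZeroDivisionError': lambda: 10 / 0,
    'IndexError': lambda: [1, 2][5],
    'KeyError': lambda: {}['xyz'],
    'NameError': lambda: undefined_name,  # never defined, on purpose
}

caught = []
for expected, action in samples.items():
    try:
        action()
    except Exception as err:
        caught.append(type(err).__name__)
        print(f'{expected:<18} -> {type(err).__name__}: {err}')
```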
11.3 Raising Your Own Exceptions
📝 Code:
def check_age(age):
    if age < 0:
        raise ValueError('Age cannot be negative!')
    if age > 150:
        raise ValueError('That age is not realistic!')
    return f'Age {age} is valid.'
try:
    print(check_age(-5))
except ValueError as e:
    print(f'Error: {e}')
💻 Output:
Error: Age cannot be negative!
SECTION 12: Object-Oriented Programming (OOP)
OOP is a way of writing code that groups related data and functions together into 'objects'.
Think of it like building blocks. Each block (object) knows what it is and what it can do.
OOP has 4 main pillars: Encapsulation, Inheritance, Polymorphism, Abstraction.
12.1 Classes and Objects
A CLASS is like a blueprint. An OBJECT is the actual thing built from that blueprint.
Real world: Class = Car blueprint. Objects = your actual car, your friend's car.
📝 Code:
class Dog:  # define the class
    # __init__ is the constructor: runs when an object is created
    def __init__(self, name, breed, age):
        self.name = name  # self refers to THIS dog
        self.breed = breed
        self.age = age
    def bark(self):
        print(f'{self.name} says: Woof!')
    def info(self):
        print(f'{self.name} is a {self.breed}, {self.age} years old.')
# Creating objects
dog1 = Dog('Bruno', 'Labrador', 3)
dog2 = Dog('Max', 'Pug', 5)
dog1.bark()
dog1.info()
dog2.bark()
print(dog2.name)  # access attribute
💻 Output:
Bruno says: Woof!
Bruno is a Labrador, 3 years old.
Max says: Woof!
Max
12.2 Encapsulation — Keeping Data Safe
Encapsulation means hiding the internal details and only showing what is necessary. In Python, we use
underscores to signal privacy.
Real world: ATM — you press a button and get money. You cannot see the internal code that
processes it.
📝 Code:
class BankAccount:
    def __init__(self, owner, balance):
        self.owner = owner
        self.__balance = balance  # __ triggers name mangling (treated as private)
    def deposit(self, amount):
        if amount > 0:
            self.__balance += amount
            print(f'Deposited ₹{amount}')
    def withdraw(self, amount):
        if amount > self.__balance:
            print('Insufficient funds!')
        else:
            self.__balance -= amount
            print(f'Withdrawn ₹{amount}')
    def get_balance(self):
        return self.__balance  # controlled access
acc = BankAccount('Ravi', 5000)
acc.deposit(2000)
acc.withdraw(8000)
print(f'Balance: ₹{acc.get_balance()}')
# acc.__balance  # AttributeError! Cannot access directly from outside
💻 Output:
Deposited ₹2000
Insufficient funds!
Balance: ₹7000
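The double underscore does not truly lock the data. Python renames (mangles) the attribute to _ClassName__attribute. A trimmed-down copy of the class above shows this:

```python
class BankAccount:
    def __init__(self, owner, balance):
        self.owner = owner
        self.__balance = balance  # actually stored as _BankAccount__balance

    def get_balance(self):
        return self.__balance

acc = BankAccount('Ravi', 5000)
try:
    print(acc.__balance)          # outside the class there is no attribute with this name
except AttributeError:
    print('AttributeError: use get_balance() instead')

print(acc._BankAccount__balance)  # 5000: reachable, but bad practice
```

So the double underscore is a strong "keep out" signal rather than real protection.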
12.3 Inheritance — Reusing Code
Inheritance lets a child class automatically get all the features of a parent class. No need to rewrite
everything!
Real world: Animal → Dog, Cat, Bird. Vehicle → Car, Bus, Truck.
📝 Code:
class Animal: # parent class
    def __init__(self, name):
        self.name = name
    def eat(self):
        print(f'{self.name} is eating.')
    def speak(self):
        print('...')
class Dog(Animal):  # child class inherits Animal
    def speak(self):  # overrides parent method
        print(f'{self.name} says: Woof!')
class Cat(Animal):  # another child class
    def speak(self):
        print(f'{self.name} says: Meow!')
    def purr(self):  # new method only for Cat
        print(f'{self.name} purrs...')
d = Dog('Bruno')
c = Cat('Whiskers')
d.eat()  # inherited from Animal
d.speak()  # Dog's own version
c.eat()  # inherited from Animal
c.speak()  # Cat's own version
c.purr()  # only for cats
💻 Output:
Bruno is eating.
Bruno says: Woof!
Whiskers is eating.
Whiskers says: Meow!
Whiskers purrs...
super() — Calling the Parent's __init__
📝 Code:
class Animal:
    def __init__(self, name, sound):
        self.name = name
        self.sound = sound
class Dog(Animal):
    def __init__(self, name, breed):
        super().__init__(name, 'Woof')  # calls Animal's __init__
        self.breed = breed
    def info(self):
        print(f'{self.name} ({self.breed}) says {self.sound}')
d = Dog('Bruno', 'Labrador')
d.info()
💻 Output:
Bruno (Labrador) says Woof
12.4 Polymorphism — Same Name, Different Behaviour
Polymorphism means 'many forms'. The same method name does different things in different classes.
Real world: A 'draw()' method on a Circle draws a circle; on a Square it draws a square. Same name,
different result.
📝 Code:
class Shape:
    def area(self):
        return 0
class Circle(Shape):
    def __init__(self, r):
        self.r = r
    def area(self):
        return 3.14 * self.r ** 2
class Rectangle(Shape):
    def __init__(self, w, h):
        self.w = w
        self.h = h
    def area(self):
        return self.w * self.h
shapes = [Circle(7), Rectangle(4, 5), Circle(3)]
for shape in shapes:
    print(f'{shape.__class__.__name__}: area = {shape.area()}')
💻 Output:
Circle: area = 153.86
Rectangle: area = 20
Circle: area = 28.26
12.5 Abstraction — Hiding Complexity
Abstraction means showing only what is necessary and hiding the inner details. We use abstract
classes to force child classes to implement certain methods.
📝 Code:
from abc import ABC, abstractmethod
class Vehicle(ABC):  # abstract class
    @abstractmethod
    def start(self):  # must be implemented by child
        pass
    def stop(self):
        print('Stopping...')
class Car(Vehicle):
    def start(self):
        print('Car started with key')
class Bike(Vehicle):
    def start(self):
        print('Bike started with kick')
c = Car()
c.start()
c.stop()
b = Bike()
b.start()
# v = Vehicle()  # TypeError! Cannot create an object of an abstract class
💻 Output:
Car started with key
Stopping...
Bike started with kick
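Abstract classes enforce the rule at object-creation time. Creating the abstract class itself fails with TypeError, and so does creating a child that forgot to implement start(). Scooter below is a hypothetical example of such a child.

```python
from abc import ABC, abstractmethod

class Vehicle(ABC):
    @abstractmethod
    def start(self):
        pass

class Scooter(Vehicle):  # hypothetical child that forgot to implement start()
    pass

failed = []
for cls in (Vehicle, Scooter):
    try:
        cls()
    except TypeError:
        failed.append(cls.__name__)
print(failed)  # ['Vehicle', 'Scooter']
```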
12.6 Special (Dunder) Methods
These are special methods with double underscores. They let your objects work with Python's built-in
operations like print(), len(), +, etc.
📝 Code:
class Student:
    def __init__(self, name, marks):
        self.name = name
        self.marks = marks
    def __str__(self):  # called by print()
        return f'Student: {self.name}'
    def __repr__(self):  # developer representation
        return f'Student({self.name!r}, {self.marks})'
    def __len__(self):  # called by len()
        return len(self.marks)
    def __add__(self, other):  # called by +
        combined = self.marks + other.marks
        return Student(self.name + '+' + other.name, combined)
    def __lt__(self, other):  # called by <
        return sum(self.marks) < sum(other.marks)
s1 = Student('Ravi', [85, 92, 78])
s2 = Student('Priya', [90, 88, 95])
print(s1)  # calls __str__
print(len(s1))  # calls __len__
print(s1 < s2)  # calls __lt__
merged = s1 + s2  # calls __add__
print(merged.name)
💻 Output:
Student: Ravi
3
True
Ravi+Priya
12.7 Class Methods & Static Methods
📝 Code:
class MathHelper:
    pi = 3.14159  # class variable (shared)
    def __init__(self, value):
        self.value = value
    # classmethod: receives the class (cls), not an instance
    @classmethod
    def circle_area(cls, r):
        return cls.pi * r ** 2
    # staticmethod: receives neither self nor cls
    @staticmethod
    def is_even(n):
        return n % 2 == 0
print(MathHelper.circle_area(5))
print(MathHelper.is_even(10))
print(MathHelper.is_even(7))
💻 Output:
78.53975
True
False
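A very common real use of @classmethod is an "alternative constructor": a second way to build objects, for example from a line of text. A sketch where from_csv_line and the pass mark of 35 are hypothetical choices for illustration:

```python
class Student:
    def __init__(self, name, marks):
        self.name = name
        self.marks = marks

    @classmethod
    def from_csv_line(cls, line):
        """Hypothetical alternative constructor: 'Ravi,85,92,78' -> Student."""
        name, *marks = line.strip().split(',')
        return cls(name, [int(m) for m in marks])  # cls() also works for subclasses

    @staticmethod
    def is_pass(mark):
        return mark >= 35  # hypothetical pass mark

s = Student.from_csv_line('Ravi,85,92,78\n')
print(s.name, s.marks)                           # Ravi [85, 92, 78]
print(all(Student.is_pass(m) for m in s.marks))  # True
```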
SECTION 13: Built-in Functions & Important Modules
13.1 All Important Built-in Functions
Function What it does + Example
print(x) Displays x on screen
input(prompt) Takes text input from user
len(x) Length of string/list/dict: len([1,2,3]) → 3
type(x) Type of x: type(3.14) → <class 'float'>
int(x) Convert to integer: int('5') → 5
float(x) Convert to float: float(3) → 3.0
str(x) Convert to string: str(42) → '42'
bool(x) Convert to bool: bool(0) → False
list(x) Convert to list: list('abc') → ['a','b','c']
tuple(x) Convert to tuple: tuple([1,2]) → (1,2)
set(x) Convert to set (removes duplicates)
dict() Create empty dict or from pairs
range(n) Creates range 0 to n-1
range(a,b,step) Range from a to b (step size)
enumerate(lst) Returns (index, value) pairs
zip(a, b) Pairs items from two iterables
map(fn, lst) Apply function to each item
filter(fn, lst) Keep items where function returns True
sorted(x) Returns sorted copy
reversed(x) Returns reversed iterator
sum(lst) Sum of numbers: sum([1,2,3]) → 6
min(lst) Smallest item: min([3,1,4]) → 1
max(lst) Largest item: max([3,1,4]) → 4
abs(x) Absolute value: abs(-5) → 5
round(x, n) Round to n decimal places: round(3.14159, 2) → 3.14
pow(x, y) Power: pow(2, 10) → 1024
divmod(a, b) Returns (quotient, remainder): divmod(10,3) → (3,1)
hash(x) Hash value of object
id(x) Unique identity of object (its memory address in CPython)
isinstance(x, T) True if x is of type T
issubclass(A, B) True if A is subclass of B
hasattr(obj, 'name') True if obj has attribute 'name'
getattr(obj,'name') Get attribute by name string
setattr(obj,'n',v) Set attribute by name string
delattr(obj,'name') Delete attribute
dir(x) List all attributes/methods of x
vars(obj) Returns object's __dict__
help(x) Show documentation for x
callable(x) True if x can be called as function
open(file, mode) Open a file
chr(n) Character for a Unicode code point: chr(65) → 'A'
ord(c) Unicode code point of a character: ord('A') → 65
hex(n) Hex string: hex(255) → '0xff'
oct(n) Octal string: oct(8) → '0o10'
bin(n) Binary string: bin(5) → '0b101'
format(v, spec) Format a value: format(3.14, 'f') → '3.140000'
eval(str) Evaluate a string as a Python expression (never on untrusted input!)
exec(str) Execute a string of Python statements
globals() Returns global variable dict
locals() Returns local variable dict
any(lst) True if at least one item is True
all(lst) True if ALL items are True
iter(x) Creates an iterator from x
next(it) Gets the next item from iterator
slice(a,b) Creates a slice object
object() Base class of all Python classes
super() Call parent class methods
staticmethod(fn) Decorator for static method
classmethod(fn) Decorator for class method
property(fn) Makes a method act like an attribute
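A sketch that tries several of these built-ins on one small list, plus property(). The Box class is a made-up example.

```python
nums = [4, -2, 7, 0, -5]

print(list(map(abs, nums)))                 # [4, 2, 7, 0, 5]
print(list(filter(lambda n: n > 0, nums)))  # [4, 7]
print(any(n < 0 for n in nums))             # True: at least one negative
print(all(n != 0 for n in nums))            # False: there is a 0
print(sorted(nums, key=abs))                # [0, -2, 4, -5, 7]
print(divmod(17, 5))                        # (3, 2)
print(isinstance(3.5, (int, float)))        # True: checks against several types

class Box:  # hypothetical class for the property/getattr demo
    def __init__(self, width):
        self._width = width

    @property
    def width(self):  # read like an attribute: box.width
        return self._width

box = Box(3)
print(getattr(box, 'width'), hasattr(box, 'height'))  # 3 False
```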
13.2 math Module
The math module gives you advanced mathematical functions.
📝 Code:
import math
print(math.sqrt(144))  # 12.0 (square root)
print(math.pi)  # 3.14159...
print(math.ceil(4.1))  # 5 (round up)
print(math.floor(4.9))  # 4 (round down)
print(math.factorial(5))  # 120
print(math.log(100, 10))  # 2.0 (log base 10)
print(math.log(math.e))  # 1.0 (natural log)
print(math.sin(math.pi / 2))  # 1.0
print(math.cos(0))  # 1.0
print(math.gcd(12, 18))  # 6
print(math.inf)  # infinity
print(math.isnan(float('nan')))  # True
💻 Output:
12.0
3.141592653589793
5
4
120
2.0
1.0
1.0
1.0
6
inf
True
13.3 random Module
📝 Code:
import random
print(random.random())  # float from 0.0 up to (not including) 1.0
print(random.randint(1, 100))  # random integer 1-100 (both ends included)
print(random.choice(['a', 'b', 'c']))  # random item from list
print(random.choices([1, 2, 3], k=5))  # 5 random picks (with replacement)
cards = ['A', 'K', 'Q', 'J', '10']
random.shuffle(cards)  # shuffle in place
print(cards)
print(random.sample(range(100), 5))  # 5 unique random numbers
print(random.uniform(1.5, 10.5))  # random float in range
💻 Output:
0.7234... (random)
42 (random)
b (random)
[1,1,3,2,1] (random)
shuffled...
[23,67,4,89,31] (random)
6.23 (random)
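In data science you often want "random" results that repeat exactly, for example when splitting data or re-running an experiment. random.seed() does this. A minimal sketch (the seed value 42 is arbitrary):

```python
import random

random.seed(42)
first_run = [random.randint(1, 100) for _ in range(5)]

random.seed(42)  # same seed -> same sequence
second_run = [random.randint(1, 100) for _ in range(5)]

print(first_run == second_run)  # True
```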
13.4 datetime Module
📝 Code:
from datetime import datetime, date, timedelta
now = datetime.now()
print(now)  # current date and time
print(now.year, now.month, now.day)
print(now.strftime('%d-%m-%Y'))  # format: 05-06-2025
# Date arithmetic
today = date.today()
future = today + timedelta(days=30)
print(f'30 days from now: {future}')
# Parse a date string
birthday = datetime.strptime('15-08-2000', '%d-%m-%Y')
print(birthday)
💻 Output:
2025-06-05 14:23:11.123456
2025 6 5
05-06-2025
30 days from now: 2025-07-05
2000-08-15 00:00:00
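Subtracting one date from another gives a timedelta, which is handy for "days until" questions. A sketch with fixed, made-up dates so the output does not depend on today:

```python
from datetime import date, timedelta

start = date(2025, 6, 5)
exam = date(2025, 12, 25)           # made-up dates for illustration

gap = exam - start                  # date - date -> timedelta
print(gap.days)                     # 203
print(start + timedelta(weeks=2))   # 2025-06-19
print(exam.weekday())               # 3 (Monday is 0, so 3 is Thursday)
```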
13.5 os & sys Modules
📝 Code:
import os
import sys
print(os.getcwd()) # current directory
print(os.listdir('.')) # list files in directory
os.mkdir('new_folder') # create a folder (raises FileExistsError if it already exists)
os.rename('old.txt', 'new.txt') # rename file (example file names)
os.remove('temp.txt') # delete file (example file name)
print(os.path.exists('new.txt')) # check if file exists
print(os.path.join('data', 'sales.csv')) # safe path joining (example file name)
print(sys.version) # Python version
print(sys.argv) # command line arguments
💻 Output:
/home/user/project
['data', ...]
True
data/sales.csv (or data\sales.csv on Windows)
3.11.0 ...
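Running the os calls above in a real folder can rename or delete real files. A safer way to practise is inside a temporary folder from the tempfile module, which Python deletes automatically afterwards. The folder and file names in this sketch (new_folder, old.txt, new.txt) are hypothetical examples.

```python
import os
import tempfile

# Work inside a throwaway folder so no real files are touched
with tempfile.TemporaryDirectory() as base:
    folder = os.path.join(base, 'new_folder')
    os.makedirs(folder, exist_ok=True)          # no error if it already exists

    old_path = os.path.join(folder, 'old.txt')  # hypothetical file names
    new_path = os.path.join(folder, 'new.txt')
    with open(old_path, 'w') as f:
        f.write('hello')

    os.rename(old_path, new_path)
    renamed_ok = os.path.exists(new_path) and not os.path.exists(old_path)

    os.remove(new_path)
    removed_ok = not os.path.exists(new_path)
    listing = os.listdir(folder)                # empty now

print(renamed_ok, removed_ok, listing)          # True True []
```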
13.6 collections Module
📝 Code:
from collections import Counter, defaultdict, OrderedDict, deque
# Counter — count occurrences
words = ['apple', 'mango', 'apple', 'banana', 'mango', 'apple']
c = Counter(words)
print(c) # Counter({'apple': 3, ...})
print(c.most_common(2)) # top 2 items
# defaultdict — no KeyError for missing keys
dd = defaultdict(list)
dd['fruits'].append('mango') # works without dd['fruits'] = []
print(dd)
# deque — fast append/remove from both ends
dq = deque([1, 2, 3])
dq.appendleft(0) # add to the left end
dq.append(4) # add to the right end
dq.popleft() # remove from the left end
print(dq)
💻 Output:
Counter({'apple': 3, 'mango': 2, 'banana': 1})
[('apple', 3), ('mango', 2)]
defaultdict(<class 'list'>, {'fruits': ['mango']})
deque([1, 2, 3, 4])
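A common data-science use of deque is a moving window over a stream of values. This sketch assumes a window size of 3, which is an arbitrary choice. When maxlen is set, appending to a full deque silently drops the oldest item. The running average therefore only ever looks at the last three values.

```python
from collections import deque

# A deque with maxlen drops the oldest item automatically
window = deque(maxlen=3)
averages = []
for value in [10, 20, 30, 40, 50]:
    window.append(value)
    averages.append(sum(window) / len(window))

print(window)     # deque([30, 40, 50], maxlen=3)
print(averages)   # [10.0, 15.0, 20.0, 30.0, 40.0]
```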
13.7 itertools Module
📝 Code:
import itertools
# chain — combine iterables
print(list(itertools.chain([1,2], [3,4], [5])))
# combinations
print(list(itertools.combinations('ABC', 2)))
# permutations
print(list(itertools.permutations([1,2,3], 2)))
# product — cartesian product
print(list(itertools.product([1,2], ['a','b'])))
# groupby (groups consecutive items with the same key, so sort data by that key first)
data = [('A',1),('A',2),('B',3),('B',4)]
for key, group in itertools.groupby(data, key=lambda x: x[0]):
    print(key, list(group))
💻 Output:
[1, 2, 3, 4, 5]
[('A', 'B'), ('A', 'C'), ('B', 'C')]
[(1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2)]
[(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]
A [('A', 1), ('A', 2)]
B [('B', 3), ('B', 4)]
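groupby only groups items that sit next to each other, so unsorted data produces repeated keys. The sketch below shows the problem and the usual fix, which is to sort by the same key first. Here operator.itemgetter(0) does the same job as lambda x: x[0].

```python
import itertools
from operator import itemgetter

data = [('B', 3), ('A', 1), ('B', 4), ('A', 2)]   # not sorted by key

# groupby only merges *consecutive* items with the same key
unsorted_keys = [k for k, _ in itertools.groupby(data, key=itemgetter(0))]
print(unsorted_keys)   # ['B', 'A', 'B', 'A']

# Sort by the same key first to get one group per key
sorted_data = sorted(data, key=itemgetter(0))
groups = {k: [v for _, v in g]
          for k, g in itertools.groupby(sorted_data, key=itemgetter(0))}
print(groups)          # {'A': [1, 2], 'B': [3, 4]}
```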
13.8 functools Module
📝 Code:
from functools import reduce, partial, lru_cache
# reduce — apply function cumulatively
nums = [1, 2, 3, 4, 5]
product = reduce(lambda a, b: a * b, nums)
print(product) # 120 (1*2*3*4*5)
# partial — fix some arguments of a function
def multiply(x, y):
    return x * y
double = partial(multiply, 2) # x is fixed as 2
print(double(5)) # 10
# lru_cache — cache results to speed up repetitive calls
@lru_cache(maxsize=None)
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n-1) + fibonacci(n-2)
print(fibonacci(40)) # fast, cached!
💻 Output:
120
10
102334155
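You can check that lru_cache is really working, because every cached function gets cache_info() and cache_clear() methods. This sketch uses fibonacci(10) instead of 40 so the numbers stay small. Each n from 0 to 10 is computed once and then reused from the cache.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

print(fibonacci(10))            # 55
info = fibonacci.cache_info()   # named tuple: hits, misses, maxsize, currsize
print(info)
# Each n from 0 to 10 is computed once, so 11 results are cached
print(info.currsize)            # 11
fibonacci.cache_clear()         # empty the cache
```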
SECTION 14: NumPy — Numbers at Speed
NumPy (Numerical Python) is the foundation of most data science in Python.
It works with arrays (like lists, but much faster for math). Instead of looping in Python, NumPy does the
math on the whole array at once. pandas, scikit-learn and TensorFlow all build on NumPy arrays or accept them as input.
14.1 Creating Arrays
📝 Code:
import numpy as np
# From list
a = np.array([1, 2, 3, 4, 5])
print(a) # [1 2 3 4 5]
print(a.shape) # (5,) — shape
print(a.dtype) # int64 (int32 on Windows with NumPy < 2.0)
# Special arrays
print(np.zeros((2, 3))) # 2x3 of zeros
print(np.ones((2, 3))) # 2x3 of ones
print(np.eye(3)) # 3x3 identity matrix
print(np.arange(0, 10, 2)) # [0 2 4 6 8]
print(np.linspace(0, 1, 5)) # 5 evenly spaced from 0 to 1
print(np.random.rand(3, 2)) # 3x2 random floats in [0, 1)
print(np.full((2, 2), 7)) # 2x2 filled with 7
💻 Output:
[1 2 3 4 5]
(5,)
int64
[[0. 0. 0.] [0. 0. 0.]]
[[1. 1. 1.] [1. 1. 1.]]
...
[0 2 4 6 8]
[0. 0.25 0.5 0.75 1. ]
...
[[7 7] [7 7]]
14.2 Array Operations (Vectorized — No Loop Needed!)
📝 Code:
a = np.array([1, 2, 3, 4, 5])
b = np.array([10, 20, 30, 40, 50])
print(a + b) # [11 22 33 44 55]
print(a * 2) # [ 2 4 6 8 10]
print(b / a) # [10. 10. 10. 10. 10.]
print(a ** 2) # [ 1 4 9 16 25]
print(np.sqrt(a)) # [1. 1.41 1.73 2. 2.24]
# Boolean operations
print(a > 2) # [False False True True True]
print(a[a > 2]) # [3 4 5] — filter!
💻 Output:
[11 22 33 44 55]
[ 2 4 6 8 10]
[10. 10. 10. 10. 10.]
[ 1 4 9 16 25]
[1.         1.41421356 1.73205081 2.         2.23606798]
[False False True True True]
[3 4 5]
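Vectorized operations also work between arrays of different shapes, a feature called broadcasting. This sketch uses toy scores and assumes rows are students and columns are subjects. A (3,) array of column means is subtracted from every row of a (2, 3) array without a loop. The loop version is included only to confirm that both give the same result.

```python
import numpy as np

scores = np.array([[80, 90, 70],
                   [60, 70, 50]])       # 2 students x 3 subjects (toy data)

col_means = scores.mean(axis=0)         # shape (3,): mean per subject
centered = scores - col_means           # (2, 3) - (3,) broadcasts across rows
print(col_means)                        # [70. 80. 60.]
print(centered)

# Same result with an explicit Python loop (slower on big arrays)
looped = np.array([[v - col_means[j] for j, v in enumerate(row)] for row in scores])
print(np.array_equal(centered, looped)) # True
```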
14.3 2D Arrays (Matrices)
📝 Code:
m = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
print(m.shape) # (3, 3)
print(m[0]) # [1 2 3] — first row
print(m[:, 1]) # [2 5 8] — second column
print(m[1, 2]) # 6 — row 1, col 2 (counting from 0)
print(m[0:2, 0:2]) # top-left 2x2 submatrix
# Matrix math
print(m.T) # transpose
print(m.reshape(1, 9)) # reshape to 1x9
print(np.dot(m, m)) # matrix multiplication (same as m @ m)
💻 Output:
(3, 3)
[1 2 3]
[2 5 8]
6
[[1 2] [4 5]]
transpose...
[[1 2 3 4 5 6 7 8 9]]
matrix product...
14.4 NumPy Statistics
📝 Code:
data = np.array([15, 22, 8, 35, 42, 18, 27, 31])
print(np.mean(data)) # average
print(np.median(data)) # middle value
print(np.std(data)) # standard deviation (population, ddof=0)
print(np.var(data)) # variance (population, ddof=0)
print(np.min(data))
print(np.max(data))
print(np.sum(data))
print(np.argmin(data)) # index of min
print(np.argmax(data)) # index of max
print(np.percentile(data, 75)) # 75th percentile (linear interpolation)
print(np.cumsum(data)) # cumulative sum
💻 Output:
24.75
24.5
10.46...
109.4375
8
42
198
2
4
32.0
[ 15  37  45  80 122 140 167 198]
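The variance and standard deviation above divide by n, which is NumPy's default (ddof=0). Many statistics courses divide by n - 1 instead (ddof=1), and so do pandas' .var() and .std(). This sketch computes both versions on the same data so you can see the difference. It also confirms the 75th percentile.

```python
import numpy as np

data = np.array([15, 22, 8, 35, 42, 18, 27, 31])

pop_var = np.var(data)              # divides by n      (ddof=0, NumPy default)
sample_var = np.var(data, ddof=1)   # divides by n - 1  (pandas' default)
print(pop_var)                      # 109.4375
print(sample_var)                   # about 125.07
print(np.std(data, ddof=1))         # sample standard deviation

p75 = np.percentile(data, 75)       # between 31 and 35 in the sorted data
print(p75)                          # 32.0
```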
14.5 Useful NumPy Functions for Data Science
Function What it does
np.unique(a) Unique values
np.sort(a) Sorted array
np.argsort(a) Indices that would sort the array
np.clip(a, lo, hi) Cap values between lo and hi
np.where(cond, x, y) Like if-else for arrays
np.concatenate([a,b]) Join arrays
np.vstack([a,b]) Stack arrays vertically (row-wise)
np.hstack([a,b]) Stack arrays horizontally (col-wise)
np.split(a, n) Split into n equal parts
a.flatten() Convert any-D to 1D (returns a copy)
np.ravel(a) Flatten to 1D, avoiding a copy when possible
np.nan Represents missing/Not a Number
np.isnan(a) True where values are NaN
np.nanmean(a) Mean ignoring NaN values
np.log(a) Natural logarithm
np.exp(a) e raised to power
np.abs(a) Absolute values
np.round(a, n) Round to n decimals
np.random.seed(n) Set random seed (reproducible)
np.random.randn(r,c) Standard normal distribution
np.random.randint(a,b,n) n random integers from a to b-1 (b excluded)
np.linalg.norm(a) Magnitude of vector
np.linalg.inv(m) Inverse of matrix
np.linalg.det(m) Determinant
np.linalg.eig(m) Eigenvalues & eigenvectors
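Here are a few of the functions from the table working together. The sketch uses a small made-up array of temperatures with one missing value (np.nan). The thresholds 0, 40 and 30 are arbitrary.

```python
import numpy as np

temps = np.array([-5.0, 12.0, np.nan, 41.0, 25.0])   # made-up readings

print(np.isnan(temps))                  # marks the missing value
print(np.nanmean(temps))                # 18.25 (NaN ignored)
clipped = np.clip(temps, 0, 40)         # below 0 -> 0, above 40 -> 40, NaN stays NaN
labels = np.where(temps > 30, 'hot', 'ok')   # NaN > 30 is False, so it becomes 'ok'
print(clipped)
print(labels)
```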
SECTION 15: Pandas — Working with Tables of Data
Pandas is the most widely used library for data analysis in Python. It works like a super-powered Excel in code.
The two main structures are Series (one column) and DataFrame (a whole table with rows and columns).
It is used in almost every data science workflow: cleaning data, exploring data and preparing it for machine
learning.
15.1 Series — One Column of Data
📝 Code:
import pandas as pd
# Create a Series
marks = pd.Series([85, 92, 78, 95, 88],
                  index=['Ravi','Priya','Arjun','Sita','Ram'])
print(marks)
print('Mean:', marks.mean())
print('Ravi:', marks['Ravi'])
print('Top scorers:')
print(marks[marks > 88])
💻 Output:
Ravi 85
Priya 92
Arjun 78
Sita 95
Ram 88
dtype: int64
Mean: 87.6
Ravi: 85
Top scorers:
Priya 92
Sita 95
15.2 DataFrame — The Full Table
📝 Code:
data = {
'Name': ['Ravi', 'Priya', 'Arjun', 'Sita', 'Ram'],
'Age': [16, 15, 17, 16, 15],
'Score': [85, 92, 78, 95, 88],
'Grade': ['B', 'A', 'C', 'A+', 'A']
}
df = pd.DataFrame(data)
print(df)
💻 Output:
Name Age Score Grade
0 Ravi 16 85 B
1 Priya 15 92 A
2 Arjun 17 78 C
3 Sita 16 95 A+
4 Ram 15 88 A
15.3 Exploring a DataFrame
📝 Code:
print(df.shape) # (5, 4) — 5 rows, 4 columns
print(df.dtypes) # data types of each column
print(df.head(3)) # first 3 rows
print(df.tail(2)) # last 2 rows
df.info() # summary info (prints by itself; print(df.info()) would also print None)
print(df.describe()) # stats for numeric columns
print(df.columns) # column names
print(df.index) # row indices
print(df.isnull().sum()) # count missing values per column
print(df.nunique()) # unique values per column
print(df.sample(2)) # 2 random rows
💻 Output:
(5, 4)
Name: object, Age: int64 ...
first 3 rows...
last 2 rows...
info summary...
count mean std min 25% ...
Index(['Name', 'Age', 'Score', 'Grade'], dtype='object')
RangeIndex(start=0, stop=5, step=1)
0 missing values ...
15.4 Selecting Data
📝 Code:
# Select a column
print(df['Name'])
print(df[['Name', 'Score']]) # multiple columns
# Select rows by index number — iloc
print(df.iloc[0]) # first row
print(df.iloc[1:3]) # rows 1 and 2 (end position excluded)
print(df.iloc[0, 2]) # row 0, column 2 (Score)
# Select rows by label — loc
print(df.loc[0, 'Name']) # row 0, Name column
print(df.loc[0:2, ['Name','Score']]) # rows 0 to 2 (end label INCLUDED), specific cols
# Filtering rows
print(df[df['Score'] > 85]) # students with score > 85
print(df[df['Age'] == 16]) # age 16 only
print(df[(df['Score'] > 80) & (df['Age'] == 15)]) # multiple conditions
💻 Output:
0 Ravi ...
(Name, Score columns)
Name Ravi Age 16 ...
rows 1-2...
85
Ravi
(rows 0-2, Name+Score)
rows with score > 85...
rows where age=16...
filtered rows...
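A common slicing surprise in pandas is that .iloc and .loc treat the end of a slice differently. .iloc excludes the end position, like normal Python slicing, while .loc includes the end label. This sketch uses a cut-down version of the students table to show both.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Ravi', 'Priya', 'Arjun', 'Sita', 'Ram'],
                   'Score': [85, 92, 78, 95, 88]})

by_position = df.iloc[0:2]   # positions 0 and 1 (end excluded, like Python slices)
by_label = df.loc[0:2]       # labels 0, 1 and 2 (end included)
print(len(by_position), len(by_label))   # 2 3

# .loc also accepts a boolean mask plus a column name
high = df.loc[df['Score'] > 85, 'Name']
print(high.tolist())         # ['Priya', 'Sita', 'Ram']
```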
15.5 Modifying DataFrames
📝 Code:
# Update values first (fix Arjun's score so the new columns below use it)
df.loc[2, 'Score'] = 82
# Add new columns
df['Passed'] = df['Score'] >= 80
df['Score_x2'] = df['Score'] * 2
# Rename columns
df.rename(columns={'Score': 'Marks'}, inplace=True)
# Drop column
df.drop(columns=['Score_x2'], inplace=True)
# Drop row
df.drop(index=4, inplace=True)
# Reset index
df.reset_index(drop=True, inplace=True)
print(df)
💻 Output:
updated DataFrame printed...
15.6 Handling Missing Data
📝 Code:
import numpy as np
df2 = pd.DataFrame({
    'Name': ['Ravi', 'Priya', None, 'Sita'],
    'Score': [85, np.nan, 78, 95],
    'City': ['Hyd', 'Hyd', None, 'Pune']
})
print(df2.isnull()) # True where value is missing
print(df2.isnull().sum()) # count missing per column
# Fill missing values: assign the result back
# (df2['Score'].fillna(..., inplace=True) warns in pandas 2.x and stops working under Copy-on-Write)
df2['Score'] = df2['Score'].fillna(df2['Score'].mean()) # fill with mean
df2['City'] = df2['City'].fillna('Unknown') # fill with text
# Drop rows with ANY missing value
df3 = df2.dropna()
# Drop rows where ALL values are missing
df4 = df2.dropna(how='all')
print(df2)
💻 Output:
NaN → filled or dropped...
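Instead of filling one column at a time, fillna also accepts a dict that maps each column to its fill value, and it returns a new DataFrame. This is a minimal sketch on the same small table. Columns left out of the dict (Name here) keep their missing values.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({
    'Name': ['Ravi', 'Priya', None, 'Sita'],
    'Score': [85, np.nan, 78, 95],
    'City': ['Hyd', 'Hyd', None, 'Pune'],
})

# One call, one fill value per column; the original df2 is unchanged
filled = df2.fillna({'Score': df2['Score'].mean(), 'City': 'Unknown'})
print(filled)
print(filled[['Score', 'City']].isna().sum().sum())   # 0
```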
15.7 Groupby & Aggregation
GroupBy is like putting students in groups by grade and then calculating stats for each group.
📝 Code:
data = {
'City': ['Hyd','Hyd','Pune','Pune','Delhi'],
'Sales': [1000, 1500, 800, 1200, 900],
'Year': [2023, 2024, 2023, 2024, 2023]
}
df = pd.DataFrame(data)
# Total sales by city
print(df.groupby('City')['Sales'].sum())
# Multiple aggregations
print(df.groupby('City')['Sales'].agg(['mean','sum','min','max']))
# Group by multiple columns
print(df.groupby(['City','Year'])['Sales'].sum())
💻 Output:
City
Delhi 900
Hyd 2500
Pune 2000
(mean, sum, min, max per city)
(city+year grouped sums)
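If you want readable column names in the result, groupby(...).agg() also accepts named aggregations in the form output_name=(column, function). This feature has been available since pandas 0.25. The sketch below uses the sales table above with the Year column left out.

```python
import pandas as pd

df = pd.DataFrame({
    'City': ['Hyd', 'Hyd', 'Pune', 'Pune', 'Delhi'],
    'Sales': [1000, 1500, 800, 1200, 900],
})

summary = df.groupby('City').agg(
    total=('Sales', 'sum'),
    average=('Sales', 'mean'),
    orders=('Sales', 'count'),
)
print(summary)
```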
15.8 Sorting & Ranking
📝 Code:
df = pd.DataFrame({'Name':['C','A','B'],'Score':[85,92,78]})
# Sort by column
print(df.sort_values('Score')) # ascending
print(df.sort_values('Score', ascending=False)) # descending
print(df.sort_values(['Name','Score'])) # multiple columns
# Rank
df['Rank'] = df['Score'].rank(ascending=False).astype(int)
print(df)
💻 Output:
sorted by score...
descending...
multi-sorted...
with rank column...
15.9 Merging & Joining DataFrames
📝 Code:
students = pd.DataFrame({'ID':[1,2,3], 'Name':['Ravi','Priya','Arjun']})
scores = pd.DataFrame({'ID':[1,2,4], 'Score':[85,92,78]})
# Inner join — only matching IDs
print(pd.merge(students, scores, on='ID', how='inner'))
# Left join — all students, NaN where no score
print(pd.merge(students, scores, on='ID', how='left'))
# Outer join — everything
print(pd.merge(students, scores, on='ID', how='outer'))
# Concat — stack DataFrames
df1 = pd.DataFrame({'A':[1,2],'B':[3,4]})
df2 = pd.DataFrame({'A':[5,6],'B':[7,8]})
print(pd.concat([df1, df2], ignore_index=True))
💻 Output:
inner...
left...
outer...
concatenated...
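To see which rows matched in a join, pd.merge accepts indicator=True. This adds a _merge column whose values are both, left_only or right_only. The sketch below uses the students and scores tables from above.

```python
import pandas as pd

students = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Ravi', 'Priya', 'Arjun']})
scores = pd.DataFrame({'ID': [1, 2, 4], 'Score': [85, 92, 78]})

merged = pd.merge(students, scores, on='ID', how='outer', indicator=True)
print(merged)

# IDs that appear in only one of the two tables
unmatched = merged.loc[merged['_merge'] != 'both', 'ID'].tolist()
print(unmatched)   # [3, 4]
```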
15.10 Apply & Map — Custom Transformations
📝 Code:
df = pd.DataFrame({'Name':['Ravi','Priya'],'Score':[85,92]})
# apply — run a function on each value of a column
df['Grade'] = df['Score'].apply(lambda x: 'A' if x >= 90 else 'B')
print(df)
# map — replace values using a dict
grade_map = {'A': 'Excellent', 'B': 'Good'}
df['Level'] = df['Grade'].map(grade_map)
print(df)
# DataFrame.map — apply to every cell (pandas 2.1+; older versions use applymap)
nums = pd.DataFrame({'a':[1,2],'b':[3,4]})
print(nums.map(lambda x: x*2))
💻 Output:
with Grade column...
with Level column...
doubled values...
15.11 Reading & Writing Files
📝 Code:
# CSV
# (file names below are examples)
df = pd.read_csv('data.csv')
df = pd.read_csv('data.csv', index_col=0, parse_dates=['date'])
df.to_csv('output.csv', index=False)
# Excel (.xlsx files need the openpyxl package installed)
df = pd.read_excel('data.xlsx', sheet_name='Sheet1')
df.to_excel('output.xlsx', index=False)
# JSON
df = pd.read_json('data.json')
df.to_json('output.json', orient='records')
# Useful read_csv options
df = pd.read_csv('data.csv',
sep=',', # separator
header=0, # row to use as header
usecols=['A','B'], # load only these columns
nrows=100, # load only first 100 rows
na_values=['NA','?'] # treat these as NaN
)
💻 Output:
DataFrame loaded...
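You can try read_csv options without creating a file, because io.StringIO wraps a string so pandas can read it like a file. The CSV text below is made up for illustration. The sketch shows parse_dates and na_values in action, then writes the result back out with to_csv.

```python
import io
import pandas as pd

csv_text = """date,city,sales
2024-01-01,Hyd,1000
2024-01-02,Pune,?
"""

# io.StringIO lets read_csv treat a string like a file
df = pd.read_csv(io.StringIO(csv_text), parse_dates=['date'], na_values=['?'])
print(df.dtypes)          # date is a datetime column, sales is float (because of NaN)

buffer = io.StringIO()
df.to_csv(buffer, index=False)   # write back without the index column
print(buffer.getvalue())
```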
15.12 All Important Pandas Functions Reference
Function/Method What it does
pd.read_csv() Load CSV file into DataFrame
pd.read_excel() Load Excel file
df.to_csv() Save to CSV
df.head(n) First n rows (default 5)
df.tail(n) Last n rows
df.info() Column types and non-null counts
df.describe() Statistics for numeric cols
df.shape Rows x Columns
df.dtypes Data type of each column
df.columns Column names
df.index Row indices
df.isnull() True/False for missing values
df.dropna() Remove rows with missing values
df.fillna(v) Fill missing with value v
df.duplicated() Mark duplicate rows
df.drop_duplicates() Remove duplicate rows
df.rename() Rename columns or index
df.astype() Convert column to type
df.sort_values(col) Sort by column
df.groupby(col) Group data by column
df.agg(fn) Aggregate with function
df.pivot_table() Create pivot table
df.melt() Wide to long format
df.merge(df2) SQL-style join
pd.concat([a,b]) Stack DataFrames
df.apply(fn) Apply function to col/row
df.map(fn) Apply function elementwise (pandas 2.1+; older versions: applymap)
df.value_counts() Count unique values
df.corr() Correlation matrix
df.cov() Covariance matrix
df.sample(n) Random n rows
df.reset_index() Reset to integer index
df.set_index(col) Set column as index
df.copy() Deep copy of DataFrame
df.T Transpose (swap rows/cols)
df.query('col>5') Filter using string query
df.loc[row, col] Access by label
df.iloc[row, col] Access by integer index
df.at[row, col] Single value by label
df.iat[row, col] Single value by integer
df.clip(lo, hi) Limit values to range
df.round(n) Round to n decimal places
df.sum() Column sums
df.mean() Column means
df.std() Standard deviation (sample, ddof=1)
df.var() Variance (sample, ddof=1)
df.min()/max() Min/max of each column
df.cumsum() Cumulative sum
df.diff() Difference between rows
df.shift(n) Shift values by n rows
df.pct_change() Percentage change from previous
df.rolling(n) Rolling window operations
df.expanding() Expanding window operations
df['col'].str.contains() String matching in column
df['col'].str.split() Split string column
df['col'].str.strip() Strip whitespace
df['col'].str.lower()/upper() Case conversion
pd.get_dummies(df) One-hot encoding
df['col'].rank() Rank values in column
SECTION 16: Matplotlib & Seaborn — Data Visualization
16.1 Matplotlib Basics
Matplotlib is the base plotting library. Think of it as the pencil and paper for drawing graphs.
📝 Code:
import matplotlib.pyplot as plt
import numpy as np
x = [1, 2, 3, 4, 5]
y = [10, 25, 15, 30, 20]
plt.figure(figsize=(8, 5)) # set size
plt.plot(x, y, color='blue', marker='o', linewidth=2, label='Sales')
plt.title('Monthly Sales')
plt.xlabel('Month')
plt.ylabel('Sales (₹ thousands)')
plt.legend()
plt.grid(True)
plt.savefig('sales_chart.png', dpi=150) # save to file (example file name)
plt.show()
💻 Output:
[chart saved as sales_chart.png]
16.2 Types of Charts
Chart Type Code + When to use
Line chart plt.plot(x,y) — trends over time
Bar chart plt.bar(x,y) — comparing categories
Horizontal bar plt.barh(x,y) — long category names
Histogram plt.hist(data,bins=20) — distribution
Scatter plot plt.scatter(x,y) — relationship between 2 vars
Pie chart plt.pie(sizes, labels=...) — parts of whole
Box plot plt.boxplot(data) — spread & outliers
Multiple plots plt.subplot(1,2,1) then plt.subplot(1,2,2)
Heatmap (seaborn) sns.heatmap(data, annot=True) — matrix viz
Pair plot sns.pairplot(df) — all variable pairs
Distribution sns.histplot(data, kde=True)
16.3 Seaborn — Beautiful Plots Easily
📝 Code:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
# Load sample dataset (downloaded from the seaborn-data repository on first use, so it needs internet)
tips = sns.load_dataset('tips')
# Distribution plot
sns.histplot(tips['total_bill'], kde=True)
plt.show()
# Scatter with color grouping
sns.scatterplot(data=tips, x='total_bill', y='tip', hue='sex')
plt.show()
# Box plot
sns.boxplot(data=tips, x='day', y='total_bill', hue='sex')
plt.show()
# Correlation heatmap (numeric columns only)
corr = tips.corr(numeric_only=True)
sns.heatmap(corr, annot=True, cmap='coolwarm', fmt='.2f')
plt.show()
💻 Output:
[charts displayed]
SECTION 17: Scikit-Learn — Machine Learning Basics
Scikit-learn is the go-to library for classic (non-deep-learning) machine learning in Python. It has ready-made algorithms for:
• Classification (predict which category: spam/not spam, disease/healthy)
• Regression (predict a number: house price, salary)
• Clustering (group similar data: customer segments)
Every model follows the same pattern: Import → Create → Fit → Predict → Evaluate
17.1 The Standard ML Workflow
📝 Code:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
# Step 1: Prepare data
# X = features (input), y = target (output)
# 'feature1'...'feature3' and 'target' are placeholder column names for your own df
X = df[['feature1', 'feature2', 'feature3']]
y = df['target']
# Step 2: Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
) # 80% train, 20% test
# Step 3: Scale features (important for many algorithms)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train) # learn mean/std from training data only
X_test = scaler.transform(X_test) # reuse the training mean/std on test data
# Step 4: Create and train model
model = LogisticRegression()
model.fit(X_train, y_train)
# Step 5: Predict
y_pred = model.predict(X_test)
# Step 6: Evaluate
print('Accuracy:', accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
💻 Output:
Accuracy: 0.95...
classification report...
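Writing out Step 3 by hand makes the "fit on train, transform test" rule easier to see. This is a simplified NumPy sketch on random toy data, not scikit-learn's actual implementation. It uses the population standard deviation (ddof=0), which is what StandardScaler uses. The mean and std are learned from the training rows only and then applied unchanged to the test rows.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=50, scale=10, size=(100, 2))   # toy data, 2 features
X_train, X_test = X[:80], X[80:]                  # simple 80/20 split (no shuffling)

# "fit": learn mean and std from the TRAINING rows only
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)                         # ddof=0

# "transform": apply those same numbers to both sets
X_train_scaled = (X_train - mean) / std
X_test_scaled = (X_test - mean) / std

print(X_train_scaled.mean(axis=0).round(6))   # close to [0, 0]
print(X_train_scaled.std(axis=0).round(6))    # close to [1, 1]
```

The test set will not have a mean of exactly 0 or a std of exactly 1. That is expected, because it was scaled with numbers it never contributed to.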
17.2 Preprocessing Functions
Function What it does
StandardScaler Scale to mean=0, std=1 (z-score)
MinMaxScaler Scale to 0-1 range
RobustScaler Scales using median (robust to outliers)
LabelEncoder Convert text labels to numbers: cat→0, dog→1
OneHotEncoder Convert categories to binary columns
pd.get_dummies() Same as OneHotEncoder but simpler
SimpleImputer Fill missing values (mean/median/constant)
PolynomialFeatures Add polynomial features (x², x*y)
train_test_split Split data into training and test sets
KFold K-fold cross validation splits
cross_val_score Run cross validation and return scores
17.3 Common ML Algorithms
Algorithm When to use / Import
LinearRegression Predict continuous number. sklearn.linear_model
LogisticRegression Classification, often binary (yes/no). sklearn.linear_model
DecisionTreeClassifier Easy to understand tree-based decisions. sklearn.tree
RandomForestClassifier Many trees, often very accurate. sklearn.ensemble
GradientBoostingClassifier Boosted trees, often top accuracy. sklearn.ensemble
KNeighborsClassifier Classify by nearest neighbors. sklearn.neighbors
SVC Support Vector Machine for classification. sklearn.svm
KMeans Unsupervised clustering. sklearn.cluster
PCA Reduce dimensions. sklearn.decomposition
Ridge / Lasso Regularized regression (prevent overfitting). sklearn.linear_model
XGBClassifier Extreme Gradient Boosting (xgboost library)
GaussianNB / MultinomialNB Naive Bayes; MultinomialNB is fast for text classification. sklearn.naive_bayes
17.4 Evaluation Metrics
Metric Use case + Function
accuracy_score Fraction of correct predictions (0 to 1)
confusion_matrix Table of TP, TN, FP, FN
classification_report Precision, Recall, F1 per class
roc_auc_score Area under ROC curve (binary classification)
mean_squared_error For regression: average squared error
mean_absolute_error For regression: average absolute error
r2_score Regression: how much variance explained
silhouette_score For clustering quality
cross_val_score Evaluate with cross-validation
SECTION 18: Advanced Python Concepts
18.1 Generators — Lazy Data Processing
A generator is like a list, but it creates items ONE AT A TIME instead of all at once. This saves memory
when working with huge datasets.
📝 Code:
# Generator function — uses 'yield' instead of 'return'
def count_up(n):
    for i in range(n):
        yield i # pause here, give value, resume later
gen = count_up(5)
print(next(gen)) # 0
print(next(gen)) # 1
for val in count_up(5):
    print(val, end=' ')
print()
# Generator expression (like list comprehension but lazy)
squares = (x**2 for x in range(1_000_000)) # barely uses memory!
print(next(squares)) # 0
print(next(squares)) # 1
💻 Output:
0
1
0 1 2 3 4
0
1
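Generators can be chained into a pipeline in which each item flows through every step before the next item is read. In this sketch the list raw stands in for the lines of a large file. read_numbers and evens are hypothetical helper names.

```python
def read_numbers(lines):
    """Yield each non-blank line as an int (hypothetical input format)."""
    for line in lines:
        line = line.strip()
        if line:
            yield int(line)

def evens(numbers):
    """Yield only the even numbers."""
    for n in numbers:
        if n % 2 == 0:
            yield n

raw = ['1\n', '2\n', '\n', '3\n', '4\n', '10\n']   # stands in for a large file
pipeline = evens(read_numbers(raw))                 # nothing is computed yet
total = sum(x * x for x in pipeline)                # items flow through one at a time
print(total)   # 120  (2*2 + 4*4 + 10*10)
```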
18.2 Decorators — Wrapping Functions
A decorator adds extra behaviour to a function without changing its code. Like wrapping a gift — the gift
is the same, but it has a wrapper.
📝 Code:
import time
def timer(func): # decorator function
    def wrapper(*args, **kwargs):
        start = time.perf_counter() # high-resolution clock for timing
        result = func(*args, **kwargs)
        end = time.perf_counter()
        print(f'{func.__name__} took {end-start:.4f}s')
        return result
    return wrapper
@timer # apply decorator
def slow_function():
    time.sleep(0.1)
    return 'done'
result = slow_function() # automatically timed!
print(result)
💻 Output:
slow_function took 0.1003s
done
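The timer decorator above has one side effect: the decorated function's name and docstring are replaced by those of wrapper. The standard fix is functools.wraps, shown in this sketch with a small example function called add.

```python
import functools
import time

def timer(func):
    @functools.wraps(func)          # copy name and docstring from func to wrapper
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        print(f'{func.__name__} took {time.perf_counter() - start:.4f}s')
        return result
    return wrapper

@timer
def add(a, b):
    """Return a + b."""
    return a + b

print(add(2, 3))        # prints a timing line, then 5
print(add.__name__)     # add (it would be 'wrapper' without functools.wraps)
print(add.__doc__)      # Return a + b.
```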
18.3 Context Managers — with Statement
Context managers make sure resources (files, database connections) are properly cleaned up, even if
an error occurs.
📝 Code:
import time
# Custom context manager using class
class Timer:
    def __enter__(self):
        self.start = time.perf_counter()
        return self
    def __exit__(self, *args):
        elapsed = time.perf_counter() - self.start
        print(f'Time: {elapsed:.4f}s')
        # returning None (falsy) lets any exception from the with-block propagate
with Timer():
    total = sum(range(1_000_000))
    print(total)
💻 Output:
499999500000
Time: 0.0341s
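The same kind of timer can be written without a class by using contextlib.contextmanager. The code before yield plays the role of __enter__, and the finally block plays the role of __exit__. This is a minimal sketch; the timings list exists only so the result can be checked.

```python
import time
from contextlib import contextmanager

timings = []

@contextmanager
def timer(label):
    start = time.perf_counter()
    try:
        yield                      # the body of the with-block runs here
    finally:                       # runs even if the body raises an error
        elapsed = time.perf_counter() - start
        timings.append(elapsed)
        print(f'{label}: {elapsed:.4f}s')

with timer('sum'):
    total = sum(range(1_000_000))
print(total)   # 499999500000
```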
18.4 Regular Expressions
Regular expressions (regex) are patterns for searching and manipulating text. Very useful for data
cleaning.
📝 Code:
import re
text = 'Call us: 9876543210 or 040-23456789 for info@example.com' # example address
# Find 10-digit phone numbers
phones = re.findall(r'\d{10}', text)
print(phones)
# Check if email is valid (a simple pattern, not a full standards check)
email = 'test@example.com' # example address
pattern = r'^[\w.-]+@[\w.-]+\.\w+$'
print(bool(re.match(pattern, email)))
# Replace runs of whitespace with a single space
cleaned = re.sub(r'\s+', ' ', 'too    many     spaces')
print(cleaned)
# Split by multiple delimiters
parts = re.split(r'[,;|]', 'a,b;c|d')
print(parts)
💻 Output:
['9876543210']
True
too many spaces
['a', 'b', 'c', 'd']
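The \d{10} pattern above misses the landline number. The sketch below catches both formats that appear in this example text: 3 digits, a dash and 8 digits, or 10 digits in a row. It does this with alternation (|) and then uses named groups to pull out the parts. These patterns only fit the example and are not a general phone-number validator.

```python
import re

text = 'Call us: 9876543210 or 040-23456789'

# Alternation: 3 digits + dash + 8 digits, OR 10 digits in a row
phone_re = re.compile(r'\d{3}-\d{8}|\d{10}')
found = phone_re.findall(text)
print(found)                                # ['9876543210', '040-23456789']

# Named groups make the parts of a match easy to read
m = re.search(r'(?P<area>\d{3})-(?P<number>\d{8})', text)
print(m.group('area'), m.group('number'))   # 040 23456789
```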
18.5 map(), filter(), zip() in Depth
📝 Code:
nums = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
# map — transform each element
squared = list(map(lambda x: x**2, nums))
print(squared)
# filter — keep only matching elements
evens = list(filter(lambda x: x % 2 == 0, nums))
print(evens)
# Combine map and filter
even_squares = list(map(lambda x: x**2, filter(lambda x: x%2==0, nums)))
print(even_squares)
# zip — combine multiple lists
names = ['Ravi', 'Priya', 'Arjun']
scores = [85, 92, 78]
cities = ['Hyd', 'Pune', 'Delhi']
combined = list(zip(names, scores, cities))
print(combined)
💻 Output:
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
[2, 4, 6, 8, 10]
[4, 16, 36, 64, 100]
[('Ravi', 85, 'Hyd'), ('Priya', 92, 'Pune'), ('Arjun', 78, 'Delhi')]
18.6 Useful Python Tricks for Data Science
📝 Code:
# Unpacking
a, *middle, z = [1, 2, 3, 4, 5]
print(a, middle, z) # 1 [2, 3, 4] 5
# Walrus operator := (Python 3.8+)
data = [1, 2, 3, 4, 5]
if (n := len(data)) > 3:
    print(f'Long list with {n} items')
# Dictionary unpacking
d1 = {'a': 1, 'b': 2}
d2 = {'c': 3}
merged = {**d1, **d2} # merge dicts
print(merged)
# Conditional expression in list
data = [1, -2, 3, -4, 5]
positive = [x if x > 0 else 0 for x in data]
print(positive)
💻 Output:
1 [2, 3, 4] 5
Long list with 5 items
{'a': 1, 'b': 2, 'c': 3}
[1, 0, 3, 0, 5]
SECTION 19: Quick Reference Cheatsheet
String Formatting Specifiers
Specifier Example
{:.2f} f'{3.14159:.2f}' → '3.14' (2 decimal places)
{:d} f'{42:d}' → '42' (integer)
{:05d} f'{42:05d}' → '00042' (zero padded)
{:>10} f'{"hi":>10}' → '        hi' (right align)
{:<10} f'{"hi":<10}' → 'hi        ' (left align)
{:^10} f'{"hi":^10}' → '    hi    ' (center)
{:,} f'{1234567:,}' → '1,234,567' (commas)
{:.2%} f'{0.856:.2%}' → '85.60%'
{:.2e} f'{1234567:.2e}' → '1.23e+06' (scientific)
{:b} f'{10:b}' → '1010' (binary)
{:x} f'{255:x}' → 'ff' (hexadecimal)
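Specifiers can be combined, and they work the same way in str.format() and format(). This is a short sketch in which each result is stored in a variable so it can be checked.

```python
value = 1234567.891

money = f'{value:,.2f}'          # thousands separator + 2 decimals
padded = f'{42:05d}'             # zero-padded to width 5
right = f'{"hi":>6}'             # right-aligned in width 6
percent = f'{0.856:.1%}'         # multiply by 100 and add %
hex_prefixed = f'{255:#x}'       # '#' adds the 0x prefix
via_format = '{:.2f}'.format(3.14159)   # same mini-language in str.format()

print(money, padded, repr(right), percent, hex_prefixed, via_format)
# 1,234,567.89 00042 '    hi' 85.6% 0xff 3.14
```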
Common Pandas One-Liners for Data Science
📝 Code:
df['col'].value_counts() # count unique values
df['col'].value_counts(normalize=True) # as proportions (multiply by 100 for %)
df.select_dtypes(include='number') # numeric columns only
df.select_dtypes(include='object') # text columns only
df.corr(numeric_only=True)['target'].sort_values() # correlation with target
df.nlargest(5, 'col') # top 5 rows by column
df.nsmallest(5, 'col') # bottom 5 rows
df['col'].str.contains('keyword') # string search
pd.cut(df['col'], bins=5) # bin into 5 equal-width groups
pd.qcut(df['col'], q=4) # quartile bins
df.to_dict('records') # DataFrame to list of dicts
pd.DataFrame(list_of_dicts) # list of dicts to DataFrame
💻 Output:
useful outputs...
The Complete Python Data Science Import Block
# Data manipulation
import numpy as np
import pandas as pd
# Visualization
import matplotlib.pyplot as plt
import seaborn as sns
# Jupyter/IPython only (not valid in a plain .py file): show plots inside the notebook
%matplotlib inline
# Machine Learning
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler, LabelEncoder, OneHotEncoder
from sklearn.impute import SimpleImputer
from sklearn.linear_model import (LinearRegression, LogisticRegression,
                                  Ridge, Lasso)
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import (RandomForestClassifier,
                              GradientBoostingClassifier)
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             classification_report)
from sklearn.metrics import mean_squared_error, r2_score
# Utilities
import os, sys, re
from datetime import datetime
from collections import Counter, defaultdict
import warnings
warnings.filterwarnings('ignore')
# Display settings
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 100)
np.set_printoptions(precision=4, suppress=True)
🎉 You now have a complete Python sprint for Data Science & AI!
Variables → Strings → Loops → Functions → OOP → NumPy → Pandas → ML