Python Comprehensive Notes — Harvard CS 50P Page
PYTHON
COMPREHENSIVE LECTURE NOTES
From Fundamentals to Advanced Mastery
Department of Computer Science
Harvard University
CS 50P — Complete Python Mastery
Covering 15 Modules
900+ Pages of Detailed Content • 300+ Code Examples • Real-World Applications
MODULE 1
Introduction to Python
1. Introduction to Python
1.1 History and Philosophy
Python was created by Guido van Rossum and first released in 1991. It is named after Monty Python's Flying
Circus, not the snake. Van Rossum designed Python around the idea that code should be readable, beautiful, and
explicit rather than implicit. This philosophy is codified in PEP 20, "The Zen of Python."
The Zen of Python (PEP 20)
Tim Peters distilled the guiding principles of Python's design. To read it, type import this in any Python
interpreter:
import this
# Beautiful is better than ugly.
# Explicit is better than implicit.
# Simple is better than complex.
# Complex is better than complicated.
# Flat is better than nested.
# Sparse is better than dense.
# Readability counts.
# Special cases aren't special enough to break the rules.
# Although practicality beats purity.
# Errors should never pass silently.
# Unless explicitly silenced.
# In the face of ambiguity, refuse the temptation to guess.
# There should be one-- and preferably only one --obvious way to do it.
# Now is better than never.
1.2 Python Versions
Python 2 reached end-of-life on January 1, 2020. All modern development uses Python 3. Key differences
include print() as a function (not a statement), / performing true division (with // for floor division),
Unicode strings by default, and range() returning a lazy sequence object instead of a list. Prefer Python 3.10+
for new projects.
# Check Python version
import sys
print(sys.version) # e.g., a string starting with "3.12.0"
print(sys.version_info) # sys.version_info(major=3, minor=12, ...)
print(sys.version_info >= (3, 10)) # True
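Two of the Python 3 behaviours listed above can be checked directly. The following is a small sketch with arbitrary example values, not an exhaustive comparison with Python 2:

```python
# True division always returns a float; floor division rounds down.
true_div = 7 / 2       # 3.5
floor_div = 7 // 2     # 3

# range() is a lazy sequence: it stores start/stop/step, not every element.
r = range(0, 10**12, 2)
r_len = len(r)                        # computed arithmetically, no iteration
has_value = 999_999_999_998 in r      # arithmetic membership test for ints

print(true_div, floor_div, r_len, has_value)
```

Because range() never materializes its elements, the huge range above costs only a few bytes of memory.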
1.3 Installation and Setup
Installing Python
• Download from python.org (official) or use pyenv for version management
• On macOS: brew install python3
• On Ubuntu/Debian: sudo apt-get install python3
• Verify: python3 --version
Virtual Environments (CRITICAL PRACTICE)
A virtual environment is an isolated Python environment allowing separate dependencies per project. Always
use virtual environments — never install packages globally.
# Create a virtual environment
python3 -m venv myenv
# Activate (macOS/Linux)
source myenv/bin/activate
# Activate (Windows)
myenv\Scripts\activate.bat
# Install a package inside the venv
pip install requests
# Freeze requirements
pip freeze > requirements.txt
# Install from requirements
pip install -r requirements.txt
# Deactivate
deactivate
💡 TIP: Use pyenv + pyenv-virtualenv for managing multiple Python versions alongside multiple virtual
environments.
1.4 Python Execution Model
Python is an interpreted, dynamically typed language. When you run a .py file, CPython (the reference
implementation) compiles the source to bytecode, and the Python Virtual Machine (PVM) then executes that
bytecode. Bytecode for imported modules is cached on disk as .pyc files. This happens automatically and
transparently.
# hello.py
print("Hello, Harvard!")
# Run it:
# $ python3 hello.py
# Hello, Harvard!
# CPython caches the bytecode of imported modules as .pyc files in __pycache__/
# These cached files are safe to ignore or delete
📘 NOTE: Python also supports interactive mode (REPL — Read-Eval-Print Loop) via python3. Use ipython for
an enhanced interactive experience with syntax highlighting and auto-completion.
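The compile-then-execute model described above can be observed from inside Python with the standard dis module. This is a minimal sketch; the source string and the "<example>" filename are made up for illustration, and the exact instructions printed differ between CPython versions.

```python
import dis

# Compile a small source string to a code object (the in-memory form of bytecode).
source = "total = 1 + 2\nprint(total)"
code_obj = compile(source, "<example>", "exec")

print(type(code_obj).__name__)      # code
print(len(code_obj.co_code) > 0)    # raw bytecode bytes are present

# Human-readable listing of the instructions the virtual machine will run.
dis.dis(code_obj)
```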
MODULE 2
Variables, Data Types & Operators
2. Variables, Data Types & Operators
2.1 Variables and Assignment
Python variables are dynamic references to objects. A variable is not a box containing a value — it is a label
pointing to an object in memory. Multiple variables can point to the same object.
# Simple assignment
x = 42
name = "Alice"
pi = 3.14159
# Multiple assignment
a = b = c = 0 # All point to same object 0
# Tuple unpacking (very Pythonic)
x, y, z = 1, 2, 3
first, *rest = [1, 2, 3, 4, 5] # first=1, rest=[2,3,4,5]
a, _, b = (10, 20, 30) # _ is convention for "don't care"
# Swap without temp variable (Python idiom)
x, y = y, x
# Augmented assignment
x += 5 # x = x + 5
x -= 2 # x = x - 2
x *= 3 # x = x * 3
x //= 2 # x = x // 2 (floor division)
x **= 2 # x = x ** 2 (power)
📘 NOTE: Python uses dynamic typing — you don't declare types. The variable's type is determined at
runtime by the object it references. Use type() or isinstance() to check types.
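The "label, not box" model matters most with mutable objects. A short sketch, using arbitrary example lists, of what happens when two names refer to one object:

```python
a = [1, 2, 3]
b = a                 # b is a second label for the same list object
b.append(4)

same_object = a is b  # True: one object, two names
a_after = list(a)     # snapshot of a: the append through b is visible here

c = list(a)           # a shallow copy creates a new, independent list
c.append(5)
copy_is_separate = c is not a

print(same_object, a_after, c, copy_is_separate)
```

Rebinding a name (for example b = [9]) would not affect a; only mutating the shared object does.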
2.2 Built-in Data Types
Numeric Types
Python has three distinct numeric types: integers (int), floating-point numbers (float), and complex numbers
(complex). Python integers have arbitrary precision — they can be as large as memory allows.
# int — arbitrary precision integers
x = 42
big = 99999999999999999999999999999 # No overflow!
binary = 0b1010 # Binary literal = 10
octal = 0o17 # Octal literal = 15
hexa = 0xFF # Hex literal = 255
# float — 64-bit IEEE 754 double precision
pi = 3.14159265358979
sci = 1.5e-10 # Scientific notation
inf = float('inf') # Positive infinity
nan = float('nan') # Not a Number
# complex — real + imaginary
z = 3 + 4j
print(z.real) # 3.0
print(z.imag) # 4.0
print(abs(z)) # 5.0 (magnitude = sqrt(3²+4²))
# Type conversions
int(3.9) # 3 (truncates, does NOT round)
float(7) # 7.0
round(3.567, 2) # 3.57
⚠️WARNING: Floating-point arithmetic is not exact due to IEEE 754 binary representation. 0.1 + 0.2 != 0.3
in Python. Use the decimal module for exact decimal arithmetic (finance, science).
from decimal import Decimal, getcontext
getcontext().prec = 50 # 50 significant digits
a = Decimal("0.1")
b = Decimal("0.2")
print(a + b) # 0.3 (exact!)
# Also: fractions for exact rational arithmetic
from fractions import Fraction
f = Fraction(1, 3) # Exactly 1/3
print(f + Fraction(1, 6)) # 1/2
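When plain floats are acceptable and only the comparison is the problem, a tolerance-based check is a common alternative to exact equality. A minimal sketch using math.isclose with its default tolerance:

```python
import math

total = 0.1 + 0.2
exact_equal = (total == 0.3)              # False: binary rounding error
close_enough = math.isclose(total, 0.3)   # True with the default rel_tol=1e-09

print(total, exact_equal, close_enough)
```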
Strings (str)
Strings in Python 3 are immutable sequences of Unicode code points. UTF-8 is the default encoding for source
files and for str.encode(), not the in-memory representation. Strings support a rich API for text manipulation.
# String literals — 4 ways
s1 = 'single quotes'
s2 = "double quotes"
s3 = '''triple single — spans
multiple lines'''
s4 = """triple double — same idea"""
# Raw strings (backslash not treated as escape)
path = r"C:\Users\Alice\Documents" # Useful for regex, Windows paths
# f-strings (Python 3.6+) — preferred for formatting
name = "Alice"
gpa = 3.9876
msg = f"Student {name} has GPA {gpa:.2f}" # "Student Alice has GPA 3.99"
# f-string debug mode (Python 3.8+)
x = 42
print(f"{x=}") # x=42 (prints name AND value)
# Old-style formatting (still common in legacy code)
"%s has %.2f" % (name, gpa)
# str.format()
"{} has {:.2f}".format(name, gpa)
# String operations
s = "Hello, World!"
len(s) # 13
s.upper() # "HELLO, WORLD!"
s.lower() # "hello, world!"
s.strip() # removes leading/trailing whitespace
s.lstrip("H") # "ello, World!"
s.replace("World", "Python") # "Hello, Python!"
s.split(", ") # ["Hello", "World!"]
", ".join(["a","b","c"]) # "a, b, c"
s.startswith("He") # True
s.endswith("!") # True
s.find("World") # 7 (index, -1 if not found)
s.count("l") # 3
s.isdigit() # False
" ".isspace() # True
String Slicing — Deep Dive
String slicing uses the syntax s[start:stop:step]. Start is inclusive, stop is exclusive. Negative indices count from
the end. This is one of Python's most powerful features.
s = "Python Programming"
# 0123456789...
s[0] # 'P'
s[-1] # 'g' (last character)
s[0:6] # 'Python'
s[7:] # 'Programming'
s[:6] # 'Python'
s[::2] # every 2nd char: 'Pto rgamn'
s[::-1] # reverse: 'gnimmargorP nohtyP'
s[7:11] # 'Prog'
# Slicing never raises IndexError — it clips silently
s[100:] # '' (empty string, no error)
Booleans (bool)
bool is a subclass of int. True equals 1 and False equals 0. This means True + True == 2, which is valid but
usually a code smell.
True == 1 # True
False == 0 # True
bool(0) # False
bool(1) # True
bool("") # False (empty string is falsy)
bool("hi") # True
bool([]) # False (empty list is falsy)
bool([0]) # True (list with one element, even 0)
bool(None) # False
# Falsy values in Python:
# False, 0, 0.0, 0j, "", [], (), {}, set(), None, range(0)
# Everything else is truthy
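One place where the int behaviour of bool is commonly relied on is counting how many items satisfy a condition. A small sketch with hypothetical sample scores:

```python
scores = [95, 72, 88, 91, 60]   # hypothetical sample data

# Each comparison yields True (1) or False (0), so sum() counts the matches.
passed = sum(score >= 80 for score in scores)

# any() and all() rely on truthiness rather than arithmetic.
any_perfect = any(score == 100 for score in scores)
all_positive = all(score > 0 for score in scores)

print(passed, any_perfect, all_positive)
```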
None
None is Python's null value. It is the sole instance of NoneType. Used to represent absence of a value,
uninitialized variables, or default function returns. Always compare with 'is None', not '== None'.
x = None
print(x is None) # True (CORRECT)
print(x == None) # True (works but discouraged)
# Functions return None implicitly
def greet(name):
    print(f"Hello, {name}")
    # No explicit return, so the function returns None
2.3 Operators
Arithmetic Operators
10 + 3 # 13 (addition)
10 - 3 # 7 (subtraction)
10 * 3 # 30 (multiplication)
10 / 3 # 3.333... (true division — always float)
10 // 3 # 3 (floor division — rounds toward -infinity)
10 % 3 # 1 (modulo — remainder)
10 ** 3 # 1000 (exponentiation)
# Floor division with negatives (important!)
-7 // 2 # -4 (rounds DOWN, not toward zero)
-7 % 2 # 1 (consistent with floor division)
Comparison and Logical Operators
# Comparison (return bool)
5 == 5 # True (equality)
5 != 4 # True (not equal)
5 > 4 # True
5 >= 5 # True
5 < 6 # True
# IMPORTANT: Chained comparisons (uncommon in other languages)
0 < x < 10 # True if x is between 0 and 10, exclusive
1 <= age <= 120 # same as (1 <= age) and (age <= 120), with age evaluated once
# Identity vs Equality
a = [1, 2, 3]
b = [1, 2, 3]
a == b # True (equal values)
a is b # False (different objects)
a is not b # True
# Logical operators (short-circuit)
True and False # False (evaluates right only if left is True)
True or False # True (evaluates right only if left is False)
not True # False
# Short-circuit evaluation with side effects
def expensive(): print("called!"); return True
False and expensive() # "called!" is NOT printed
True or expensive() # "called!" is NOT printed
Bitwise Operators
a = 0b1100 # 12
b = 0b1010 # 10
a & b # 0b1000 = 8 (AND)
a | b # 0b1110 = 14 (OR)
a ^ b # 0b0110 = 6 (XOR)
~a # -13 (NOT — inverts all bits)
a << 2 # 0b110000=48 (left shift by 2)
a >> 1 # 0b0110 = 6 (right shift by 1)
# Practical use: checking if a number is even/odd
(n & 1) == 0 # True if even; the parentheses are required because == binds tighter than &
MODULE 3
Data Structures
3. Core Data Structures
3.1 Lists
A list is a mutable, ordered sequence of objects. It can contain elements of mixed types. Lists are backed by a
dynamic array and support O(1) indexing, O(1) amortized append, but O(n) insert/delete in the middle.
# Creation
empty = []
nums = [1, 2, 3, 4, 5]
mixed = [1, "hello", 3.14, True, None]
nested = [[1,2],[3,4],[5,6]]
# Indexing and slicing (same as strings)
nums[0] # 1
nums[-1] # 5
nums[1:4] # [2, 3, 4]
nums[::2] # [1, 3, 5]
# Mutability — lists can be changed
nums[0] = 99
nums.append(6) # Add to end: [99,2,3,4,5,6]
nums.insert(0, 0) # Insert at index 0
nums.extend([7, 8]) # Extend with iterable
nums.pop() # Remove and return last: 8
nums.pop(0) # Remove and return index 0: 0
nums.remove(99) # Remove first occurrence of 99
del nums[0] # Delete by index
nums.clear() # Empty the list
# Searching and sorting
nums = [3, 1, 4, 1, 5, 9, 2, 6]
nums.index(4) # 2 (index of first 4)
nums.count(1) # 2 (count of 1s)
nums.sort() # In-place sort: [1,1,2,3,4,5,6,9]
nums.sort(reverse=True) # Descending
nums.reverse() # Reverse in place
sorted_copy = sorted(nums) # Returns new sorted list
sorted_custom = sorted(nums, key=abs) # Sort by absolute value
# List concatenation and repetition
a = [1, 2] + [3, 4] # [1, 2, 3, 4]
b = [0] * 5 # [0, 0, 0, 0, 0]
# Membership test
4 in nums # True
10 in nums # False
# Unpacking
first, *middle, last = [1, 2, 3, 4, 5]
# first=1, middle=[2,3,4], last=5
💡 TIP: Use list.sort() for in-place sorting (mutates the list). Use sorted() to get a new sorted list without
modifying the original. Both accept a key= function.
3.2 Tuples
A tuple is an immutable, ordered sequence. Once created, elements cannot be added, removed, or changed.
Tuples are hashable (if all elements are hashable) and can be used as dictionary keys. They are slightly faster
than lists and signal immutability of data.
# Creation
empty = ()
single = (42,) # Trailing comma is REQUIRED for single element!
coords = (3, 4)
triple = ("Alice", 30, "Harvard")
# Parentheses are actually optional — commas make a tuple
x = 1, 2, 3 # Same as (1, 2, 3)
# Indexing (same as list)
coords[0] # 3
coords[-1] # 4
# Unpacking (elegant!)
name, age, school = triple
lat, lng = 42.3601, -71.0589 # Boston coords
# Named tuples — give fields names
from collections import namedtuple
Point = namedtuple('Point', ['x', 'y', 'z'])
p = Point(1, 2, 3)
print(p.x, p.y, p.z) # Access by name
print(p[0], p[1], p[2]) # Also by index
# Python 3.6+ typing.NamedTuple (cleaner)
from typing import NamedTuple
class Student(NamedTuple):
    name: str
    gpa: float
    year: int = 1 # Default value
alice = Student("Alice", 3.9)
print(alice.name) # "Alice"
3.3 Dictionaries
A dictionary (dict) is a mutable, ordered (Python 3.7+) mapping of key-value pairs. Implemented as a hash
table, it provides O(1) average-case lookup, insertion, and deletion. Keys must be hashable, which in practice
means immutable values such as str, int, or tuples of hashable items.
# Creation
empty = {}
student = {"name": "Alice", "gpa": 3.9, "year": 2}
d = dict(name="Bob", gpa=3.7)
d = dict([("a", 1), ("b", 2)]) # From iterable of pairs
# Access
student["name"] # "Alice"
student.get("major") # None (safe: no KeyError)
student.get("major", "Undeclared") # "Undeclared"
# Modification
student["year"] = 3 # Update
student["major"] = "CS" # Add new key
del student["year"] # Delete
popped = student.pop("gpa") # Remove and return
# Iteration
for key in student: # Keys
    print(key)
for value in student.values(): # Values
    print(value)
for key, value in student.items(): # Key-value pairs
    print(f"{key}: {value}")
# Merging dicts (Python 3.9+)
d1 = {"a": 1, "b": 2}
d2 = {"b": 3, "c": 4}
merged = d1 | d2 # {"a":1, "b":3, "c":4} (d2 wins)
d1 |= d2 # Update d1 in-place
# [Link]() (works in all versions)
d1.update(d2)
# Dictionary comprehension
squares = {x: x**2 for x in range(1, 6)}
# {1:1, 2:4, 3:9, 4:16, 5:25}
# Nested dict
university = {
"CS": {"students": 500, "faculty": 40},
"Math": {"students": 300, "faculty": 25},
}
university["CS"]["students"] # 500
# Counting with get(): supply a default of 0 for unseen keys
# (setdefault() is similar, but it also stores the default in the dict)
counts = {}
for char in "mississippi":
    counts[char] = counts.get(char, 0) + 1
# Better: collections.Counter
📘 NOTE: In Python 3.7+, dicts maintain insertion order. This is part of the language specification, not just
CPython implementation detail.
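The hashability requirement on keys is easy to see in practice. A brief sketch, using a hypothetical grid dictionary, of one key type that works and one that does not:

```python
grid = {}
grid[(0, 1)] = "tuple keys work"          # a tuple of hashables is hashable
grid[frozenset({"a", "b"})] = "so do frozensets"

error = None
try:
    grid[[0, 1]] = "lists do not"         # lists are mutable, hence unhashable
except TypeError as exc:
    error = str(exc)

print(len(grid), error)
```

The exact wording of the TypeError message varies between Python versions, so the sketch only captures it rather than relying on its text.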
3.4 Sets
A set is a mutable, unordered collection of unique, hashable elements. Implemented as a hash table. Provides
O(1) average membership testing, O(min(a,b)) intersection, and O(a+b) union.
# Creation
empty = set() # NOT {} — that's an empty dict!
fruits = {"apple", "banana", "cherry"}
from_list = set([1, 2, 2, 3, 3, 3]) # {1, 2, 3}
# Membership (O(1)) — much faster than list
"apple" in fruits # True
# Add and remove
fruits.add("date")
fruits.discard("banana") # No error if not present
fruits.remove("cherry") # KeyError if not present
# Set operations (mathematical set theory)
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}
a | b # Union: {1,2,3,4,5,6}
a & b # Intersection: {3,4}
a - b # Difference: {1,2} (in a but not b)
a ^ b # Symmetric: {1,2,5,6} (in one but not both)
a.union(b)
a.intersection(b)
a.difference(b)
a.symmetric_difference(b)
# Subset / superset
{1,2}.issubset({1,2,3}) # True
{1,2,3}.issuperset({1,2}) # True
# frozenset — immutable set, can be dict key
fs = frozenset([1, 2, 3])
3.5 Collections Module — Advanced Data Structures
from collections import (
Counter, defaultdict, OrderedDict,
deque, ChainMap, UserList
)
# Counter — count occurrences
from collections import Counter
words = "the cat sat on the mat the cat".split()
c = Counter(words)
# Counter({'the': 3, 'cat': 2, 'sat': 1, 'on': 1, 'mat': 1})
c.most_common(2) # [('the', 3), ('cat', 2)]
c["the"] # 3
c["elephant"] # 0 (no KeyError!)
c + Counter(["the", "dog"]) # Combine counters
# defaultdict — auto-create missing keys
from collections import defaultdict
dd = defaultdict(list)
dd["CS"].append("Alice") # No KeyError for new key
dd["Math"].append("Bob")
word_groups = defaultdict(lambda: "unknown")
word_groups["Python"] # "unknown"
# deque — double-ended queue, O(1) append/pop from both ends
from collections import deque
dq = deque([1, 2, 3], maxlen=5)
dq.appendleft(0) # O(1): [0, 1, 2, 3]
dq.append(4) # O(1): [0, 1, 2, 3, 4]
dq.popleft() # O(1): returns 0
dq.rotate(1) # Rotate right by 1
# OrderedDict: maintains insertion order (historical; since Python 3.7 dict does too)
# OrderedDict still adds move_to_end() and order-sensitive equality comparisons
od = OrderedDict()
od["first"] = 1
od["second"] = 2
od.move_to_end("first") # Move to end
od.move_to_end("first", last=False) # Move to front
MODULE 4
Control Flow
4. Control Flow
4.1 Conditional Statements
# if / elif / else
age = 20
if age < 13:
    print("Child")
elif age < 18:
    print("Teenager")
elif age < 65:
    print("Adult")
else:
    print("Senior")
# Ternary (conditional expression): single line
label = "Adult" if age >= 18 else "Minor"
# Nested ternary (use sparingly!)
score = 85 # example value so the next line runs
grade = "A" if score >= 90 else ("B" if score >= 80 else "C")
# Match statement (Python 3.10+): structural pattern matching
command = "quit"
match command:
    case "quit" | "exit":
        print("Goodbye!")
    case "hello" | "hi":
        print("Hello!")
    case str(msg) if len(msg) > 50: # Guard
        print(f"Long message: {msg}")
    case _: # Wildcard (default)
        print("Unknown command")
# Match with data classes
from dataclasses import dataclass
@dataclass
class Point:
    x: float
    y: float
def describe(shape):
    match shape:
        case Point(x=0, y=0):
            return "Origin"
        case Point(x=0, y=y):
            return f"Y-axis at {y}"
        case Point(x=x, y=0):
            return f"X-axis at {x}"
        case Point(x=x, y=y):
            return f"Point at ({x}, {y})"
        case _:
            return "Not a point"
4.2 Loops
for Loops
Python's for loop iterates over any iterable — not just ranges. It is implemented by calling iter() on the iterable,
then repeatedly calling next().
# Iterate over a list
fruits = ["apple", "banana", "cherry"]
for fruit in fruits:
    print(fruit)
# range(): generates integers
for i in range(5): # 0, 1, 2, 3, 4
    print(i)
for i in range(2, 10, 2): # 2, 4, 6, 8 (start, stop, step)
    print(i)
for i in range(10, 0, -1): # 10, 9, ..., 1 (countdown)
    print(i)
# enumerate(): index + value (THE PYTHONIC WAY)
for i, fruit in enumerate(fruits):
    print(f"{i}: {fruit}")
for i, fruit in enumerate(fruits, start=1): # Start counting from 1
    print(f"{i}. {fruit}")
# zip(): iterate multiple iterables simultaneously
names = ["Alice", "Bob", "Carol"]
scores = [95, 87, 92]
for name, score in zip(names, scores):
    print(f"{name}: {score}")
# zip stops at the shortest input; use zip_longest to continue to the longest
from itertools import zip_longest
for name, score in zip_longest(names, scores, fillvalue=0):
    pass
# Iterate dict
d = {"a": 1, "b": 2, "c": 3}
for key in d: # or d.keys()
    print(key)
for val in d.values():
    print(val)
for k, v in d.items():
    print(k, v)
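The iter()/next() protocol behind the for loop can be written out by hand. This sketch is roughly what a for loop does for you; the seen list exists only to record the order of items:

```python
fruits = ["apple", "banana", "cherry"]

iterator = iter(fruits)        # ask the iterable for an iterator
seen = []
while True:
    try:
        fruit = next(iterator) # fetch the next item
    except StopIteration:      # raised when the iterator is exhausted
        break
    seen.append(fruit)

print(seen)
```

Any object that implements this protocol, not just lists and ranges, can be used in a for loop.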
while Loops
# while: runs as long as the condition is True
n = 1
while n < 100:
    n *= 2
print(n) # 128
# while with else (runs when the condition becomes False, NOT after a break)
n = 10
while n > 0:
    n -= 3
else:
    print(f"Loop ended normally, n = {n}")
# Infinite loop with break
import random
while True:
    num = random.randint(1, 10)
    if num == 7:
        print("Got 7!")
        break
Loop Control: break, continue, else
# break: exit loop immediately
for i in range(10):
    if i == 5:
        break
    print(i) # 0, 1, 2, 3, 4
# continue: skip current iteration
for i in range(10):
    if i % 2 == 0:
        continue
    print(i) # 1, 3, 5, 7, 9
# for-else: else runs if loop completed without break
# Used to search for an item
def find_prime_factor(n):
    for i in range(2, n):
        if n % i == 0:
            print(f"{n} is divisible by {i}")
            break
    else:
        print(f"{n} is prime!")
find_prime_factor(17) # "17 is prime!"
find_prime_factor(18) # "18 is divisible by 2"
💡 TIP: The for-else and while-else constructs are unusual among mainstream languages. The else block runs
only when the loop exhausted the iterable (for) or the condition became False (while), NOT when a break occurred.
MODULE 5
Functions — Deep Dive
5. Functions — Complete Reference
5.1 Defining and Calling Functions
# Basic function
def greet(name):
    """Return a greeting string. (This is a docstring)"""
    return f"Hello, {name}!"
result = greet("Alice") # "Hello, Alice!"
# Multiple return values (actually returns a tuple)
def min_max(numbers):
    return min(numbers), max(numbers)
lo, hi = min_max([3, 1, 4, 1, 5, 9])
# lo = 1, hi = 9
# Function with no return statement returns None
def say_hi():
    print("Hi!")
result = say_hi() # prints "Hi!", result is None
5.2 Parameters and Arguments
# Default arguments
def power(base, exponent=2):
    return base ** exponent
power(3) # 9 (exponent defaults to 2)
power(3, 3) # 27
# Keyword arguments: can be passed in any order
def register(name, age, course):
    print(f"{name}, {age}, {course}")
register(age=20, course="CS", name="Alice") # Order doesn't matter
# *args: variable positional arguments (tuple)
def total(*args):
    return sum(args)
total(1, 2, 3, 4, 5) # 15
# **kwargs: variable keyword arguments (dict)
def profile(**kwargs):
    for key, val in kwargs.items():
        print(f" {key}: {val}")
profile(name="Alice", gpa=3.9, year=2)
# Combining all types. ORDER MATTERS:
# positional, *args, keyword-only, **kwargs
def complex_func(a, b, *args, option=False, **kwargs):
    print(f"a={a}, b={b}")
    print(f"args={args}")
    print(f"option={option}")
    print(f"kwargs={kwargs}")
complex_func(1, 2, 3, 4, 5, option=True, x=10, y=20)
# Positional-only parameters (Python 3.8+, using /)
def strictly_positional(a, b, /, c, d):
    pass # a, b must be positional; c, d can be either
# Keyword-only parameters (after *)
def keyword_only(a, b, *, force=False):
    pass # force must always be keyword arg
⚠️WARNING: Never use mutable default arguments! In def func(data=[]): data.append(1), the list is created
ONCE, when the function is defined, and shared across all calls. Use None as the default and create the list
inside the function.
# WRONG!
def append_item(item, lst=[]):
    lst.append(item)
    return lst
append_item(1) # [1]
append_item(2) # [1, 2] (BUG: same list!)
# CORRECT
def append_item(item, lst=None):
    if lst is None:
        lst = []
    lst.append(item)
    return lst
5.3 Scope and LEGB Rule
Python resolves variable names using the LEGB rule: Local → Enclosing → Global → Built-in. Understanding
scope is critical for writing correct code.
x = "global"
def outer():
    x = "enclosing"
    def inner():
        x = "local"
        print(x) # "local" (L)
    inner()
    print(x) # "enclosing" (E)
outer()
print(x) # "global" (G)
# global keyword: modify a global from inside a function
count = 0
def increment():
    global count # Declare intent to modify global
    count += 1
# nonlocal keyword: modify enclosing scope
def make_counter():
    count = 0
    def counter():
        nonlocal count # Modify enclosing count
        count += 1
        return count
    return counter
c = make_counter()
c() # 1
c() # 2
c() # 3
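The global and nonlocal declarations matter because assignment inside a function makes a name local by default. A minimal sketch of what happens when the declaration is left out (broken_increment is a hypothetical name):

```python
count = 0

def broken_increment():
    # The assignment makes 'count' local to this function, so the read on
    # the right-hand side happens before the local name has a value.
    count += 1

error_name = None
try:
    broken_increment()
except UnboundLocalError as exc:
    error_name = type(exc).__name__

print(error_name, count)   # the global count is untouched
```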
5.4 Lambda Functions
Lambda creates an anonymous single-expression function. Syntactically limited — no statements, no
assignments. Best used as short callbacks passed to sorted(), map(), filter().
# Lambda syntax: lambda parameters: expression
square = lambda x: x ** 2
add = lambda x, y: x + y
noop = lambda: None
# Primary use: as key functions
students = [("Alice", 3.9), ("Bob", 3.7), ("Carol", 4.0)]
sorted_by_gpa = sorted(students, key=lambda s: s[1])
# Sort descending
sorted_desc = sorted(students, key=lambda s: s[1], reverse=True)
# With map() and filter()
nums = [1, 2, 3, 4, 5, 6]
squares = list(map(lambda x: x**2, nums)) # [1,4,9,16,25,36]
evens = list(filter(lambda x: x%2==0, nums)) # [2,4,6]
# Tip: comprehensions are usually cleaner
squares = [x**2 for x in nums]
evens = [x for x in nums if x % 2 == 0]
5.5 Closures
A closure is a function that captures variables from its enclosing scope. The captured variables are stored in the
function's __closure__ attribute. Used extensively for decorators, callbacks, and factory functions.
def multiplier(factor):
    """Factory function returning a multiplier closure."""
    def multiply(n):
        return n * factor # 'factor' is captured from enclosing scope
    return multiply
double = multiplier(2)
triple = multiplier(3)
double(5) # 10
triple(5) # 15
# Inspecting closure
print(double.__closure__) # (<cell at 0x...>,)
print(double.__closure__[0].cell_contents) # 2
# Practical: memoization via closure
def make_memoized(func):
    cache = {}
    def memoized(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]
    return memoized
@make_memoized
def fib(n):
    return n if n < 2 else fib(n-1) + fib(n-2)
print(fib(100)) # Fast!
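Closures capture variables, not the values those variables held when the function was created. The sketch below shows the resulting late-binding surprise and one common workaround (binding the current value through a default argument):

```python
# Every lambda closes over the same variable i, which ends at 2.
late = [lambda: i for i in range(3)]
late_results = [f() for f in late]

# A default argument is evaluated at definition time, freezing each value.
bound = [lambda i=i: i for i in range(3)]
bound_results = [f() for f in bound]

print(late_results, bound_results)
```

A factory function such as multiplier() above avoids the problem as well, because each call creates a fresh enclosing scope.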
5.6 Decorators
A decorator is a higher-order function that wraps another function to extend its behavior without modifying
the original. Uses @syntax which is syntactic sugar.
import functools
import time
# Basic decorator
def timer(func):
    @functools.wraps(func) # Preserves function metadata
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        end = time.perf_counter()
        print(f"{func.__name__} took {end-start:.4f}s")
        return result
    return wrapper
@timer
def slow_add(a, b):
    time.sleep(0.1)
    return a + b
slow_add(3, 4)
# slow_add took 0.1001s (exact timing varies)
# equivalent to: slow_add = timer(slow_add)
# Decorator with arguments
def repeat(times):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for _ in range(times):
                result = func(*args, **kwargs)
            return result
        return wrapper
    return decorator
@repeat(3)
def say(msg):
    print(msg)
say("Hello!") # prints "Hello!" 3 times
# Stacking decorators (applied bottom-up)
@timer
@repeat(2)
def greet(name):
    print(f"Hi {name}")
# Applied as: greet = timer(repeat(2)(greet))
# Class-based decorator
import random
class Retry:
    def __init__(self, max_attempts=3):
        self.max_attempts = max_attempts
    def __call__(self, func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(self.max_attempts):
                try:
                    return func(*args, **kwargs)
                except Exception as e:
                    if attempt == self.max_attempts - 1:
                        raise # Out of attempts: re-raise the last error
                    print(f"Attempt {attempt+1} failed: {e}")
        return wrapper
@Retry(max_attempts=3)
def unreliable_function():
    if random.random() < 0.7:
        raise ValueError("Random failure!")
    return "success"
5.7 Recursion
# Factorial: classic recursion
def factorial(n):
    """n! = n * (n-1) * ... * 1"""
    if n <= 1: # Base case
        return 1
    return n * factorial(n - 1) # Recursive case
factorial(5) # 120
# Fibonacci: naive recursion (O(2^n), avoid!)
def fib_naive(n):
    if n < 2: return n
    return fib_naive(n-1) + fib_naive(n-2)
# Fibonacci: memoization with lru_cache
from functools import lru_cache
@lru_cache(maxsize=None)
def fib(n):
    if n < 2: return n
    return fib(n-1) + fib(n-2)
fib(100) # Instant! (a cold-cache fib(1000) would exceed the default recursion limit)
# Fibonacci: dynamic programming (iterative, O(n) time, O(1) space)
def fib_dp(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
# Tree traversal: natural recursive structure
def sum_nested(lst):
    total = 0
    for item in lst:
        if isinstance(item, list):
            total += sum_nested(item) # Recurse
        else:
            total += item
    return total
sum_nested([1, [2, [3, 4]], 5]) # 15
# Python's default recursion limit
import sys
sys.getrecursionlimit() # 1000 by default
sys.setrecursionlimit(5000) # Increase if needed
⚠️WARNING: Python does NOT optimize tail recursion. Deep recursion will hit RecursionError. For deep
recursion, convert to iteration or use sys.setrecursionlimit() cautiously.
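One way to convert recursion to iteration is to manage an explicit stack. The following sketch rewrites the nested-list sum from this section in that style; sum_nested_iterative and the nesting depth of 5000 are illustrative choices, not part of the original notes.

```python
def sum_nested_iterative(lst):
    """Sum all numbers in an arbitrarily nested list without recursion."""
    total = 0
    stack = [lst]                      # lists still waiting to be processed
    while stack:
        current = stack.pop()
        for item in current:
            if isinstance(item, list):
                stack.append(item)     # defer nested lists instead of recursing
            else:
                total += item
    return total

# Build a list nested far deeper than the default recursion limit.
deep = [1]
for _ in range(5000):
    deep = [1, deep]

small_total = sum_nested_iterative([1, [2, [3, 4]], 5])
deep_total = sum_nested_iterative(deep)
print(small_total, deep_total)
```

The explicit stack lives on the heap, so its depth is limited by available memory rather than by the interpreter's recursion limit.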
MODULE 6
Object-Oriented Programming
6. Object-Oriented Programming
6.1 Classes and Objects
Python is a fully object-oriented language — everything is an object, including functions, classes, and modules.
A class is a blueprint; an object (instance) is a concrete realization of that blueprint.
class Student:
    """Represents a university student."""
    # Class variable: shared by ALL instances
    university = "Harvard"
    _student_count = 0 # Leading _ = private by convention
    def __init__(self, name: str, gpa: float, year: int = 1):
        """Initializer: called when creating an instance."""
        self.name = name # Instance variable
        self.gpa = gpa
        self.year = year
        Student._student_count += 1
    def __repr__(self) -> str:
        """Unambiguous representation for developers."""
        return f"Student(name={self.name!r}, gpa={self.gpa}, year={self.year})"
    def __str__(self) -> str:
        """Readable representation for users."""
        return f"{self.name} (Year {self.year}, GPA {self.gpa:.2f})"
    def __eq__(self, other) -> bool:
        if not isinstance(other, Student):
            return NotImplemented
        return self.name == other.name and self.gpa == other.gpa
    def __lt__(self, other) -> bool:
        return self.gpa < other.gpa
    def __hash__(self):
        return hash((self.name, self.gpa))
    # Instance method
    def honor_roll(self) -> bool:
        return self.gpa >= 3.7
    # Class method: receives class, not instance
    @classmethod
    def from_dict(cls, data: dict) -> "Student":
        return cls(data["name"], data["gpa"], data.get("year", 1))
    @classmethod
    def get_count(cls) -> int:
        return cls._student_count
    # Static method: no access to class or instance
    @staticmethod
    def is_valid_gpa(gpa: float) -> bool:
        return 0.0 <= gpa <= 4.0
# Usage
alice = Student("Alice", 3.9, 2)
bob = Student.from_dict({"name": "Bob", "gpa": 3.7})
print(alice) # Alice (Year 2, GPA 3.90)
print(repr(alice)) # Student(name='Alice', gpa=3.9, year=2)
alice.honor_roll() # True
Student.get_count() # 2
Student.is_valid_gpa(4.5) # False
# Accessing class vs instance variables
Student.university # "Harvard" (from class)
alice.university # "Harvard"
# Sort list of students (uses __lt__)
students = [Student("C", 3.5), Student("A", 3.9), Student("B", 3.7)]
sorted(students) # Sorted by GPA ascending
6.2 Inheritance
class Person:
    def __init__(self, name: str, age: int):
        self.name = name
        self.age = age
    def greet(self) -> str:
        return f"Hi, I'm {self.name}"
    def __repr__(self):
        return f"{type(self).__name__}(name={self.name!r})"
class Student(Person):
    def __init__(self, name: str, age: int, gpa: float):
        super().__init__(name, age) # MUST call super().__init__()
        self.gpa = gpa
    def greet(self) -> str: # Override
        base = super().greet() # Call parent's greet
        return f"{base}, student with GPA {self.gpa}"
class GradStudent(Student):
    def __init__(self, name, age, gpa, thesis):
        super().__init__(name, age, gpa)
        self.thesis = thesis
# isinstance checks inheritance chain
alice = GradStudent("Alice", 28, 3.95, "ML in Healthcare")
isinstance(alice, GradStudent) # True
isinstance(alice, Student) # True
isinstance(alice, Person) # True
# issubclass
issubclass(GradStudent, Student) # True
issubclass(GradStudent, Person) # True
# Method Resolution Order (MRO): C3 linearization
print(GradStudent.__mro__)
# (<class GradStudent>, <class Student>, <class Person>, <class object>)
6.3 Multiple Inheritance and Mixins
import json
class Flyable:
    def fly(self):
        return f"{self.__class__.__name__} is flying"
class Swimmable:
    def swim(self):
        return f"{self.__class__.__name__} is swimming"
class Duck(Flyable, Swimmable):
    pass
donald = Duck()
donald.fly() # "Duck is flying"
donald.swim() # "Duck is swimming"
# Mixin pattern: reusable behavior modules
class JSONMixin:
    """Adds JSON serialization to any class."""
    def to_json(self):
        return json.dumps(self.__dict__, default=str)
    @classmethod
    def from_json(cls, json_str):
        return cls(**json.loads(json_str))
class LogMixin:
    """Adds logging to any class."""
    def log(self, msg):
        print(f"[{type(self).__name__}] {msg}")
class SmartStudent(JSONMixin, LogMixin, Student):
    pass
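With multiple inheritance, super() follows the MRO rather than simply calling "the parent". A small sketch of a diamond hierarchy, using hypothetical class names Base, Left, Right, and Child, makes the order visible:

```python
class Base:
    def describe(self):
        return ["Base"]

class Left(Base):
    def describe(self):
        return ["Left"] + super().describe()

class Right(Base):
    def describe(self):
        return ["Right"] + super().describe()

class Child(Left, Right):
    def describe(self):
        return ["Child"] + super().describe()

mro_names = [cls.__name__ for cls in Child.__mro__]
call_order = Child().describe()

print(mro_names)    # linearized class order
print(call_order)   # each super() call moves one step along that order
```

Note that Left's super() call reaches Right, not Base, because Right comes next in Child's MRO; Base runs exactly once.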
6.4 Abstract Classes and Interfaces
from abc import ABC, abstractmethod
import math
class Shape(ABC):
    """Abstract base class: cannot be instantiated directly."""
    @abstractmethod
    def area(self) -> float:
        """Subclasses MUST implement this."""
        pass
    @abstractmethod
    def perimeter(self) -> float:
        pass
    def describe(self) -> str: # Concrete method
        return (f"{type(self).__name__}: "
                f"area={self.area():.2f}, "
                f"perimeter={self.perimeter():.2f}")
class Circle(Shape):
    def __init__(self, radius: float):
        self.radius = radius
    def area(self) -> float:
        return math.pi * self.radius ** 2
    def perimeter(self) -> float:
        return 2 * math.pi * self.radius
class Rectangle(Shape):
    def __init__(self, w: float, h: float):
        self.w = w
        self.h = h
    def area(self): return self.w * self.h
    def perimeter(self): return 2 * (self.w + self.h)
# Shape() # TypeError: Can't instantiate abstract class
c = Circle(5)
r = Rectangle(4, 6)
print(c.describe()) # Circle: area=78.54, perimeter=31.42
6.5 Properties and Descriptors
class Temperature:
    def __init__(self, celsius: float = 0):
        self._celsius = celsius # Leading _ = internal storage
    @property
    def celsius(self) -> float:
        return self._celsius
    @celsius.setter
    def celsius(self, value: float):
        if value < -273.15:
            raise ValueError(f"Temperature {value}°C below absolute zero!")
        self._celsius = value
    @celsius.deleter
    def celsius(self):
        del self._celsius
    @property
    def fahrenheit(self) -> float:
        return self._celsius * 9/5 + 32
    @fahrenheit.setter
    def fahrenheit(self, f: float):
        self.celsius = (f - 32) * 5/9 # Validates via celsius setter
t = Temperature(100)
t.celsius # 100
t.fahrenheit # 212.0
t.fahrenheit = 32
t.celsius # 0.0
t.celsius = -300 # ValueError!
6.6 Dataclasses (Python 3.7+)
from dataclasses import dataclass, field, asdict, astuple
from typing import ClassVar
@dataclass(order=True, frozen=False)
class Student:
    # Fields with default values MUST come after fields without
    name: str
    gpa: float
    year: int = 1
    courses: list = field(default_factory=list) # Mutable default
    # Class variable (not a dataclass field)
    university: ClassVar[str] = "Harvard"
    # Post-init processing
    def __post_init__(self):
        if not (0.0 <= self.gpa <= 4.0):
            raise ValueError(f"Invalid GPA: {self.gpa}")
        self.name = self.name.strip().title()
alice = Student("alice", 3.9, 2, ["CS50", "Math55"])
print(alice)
# Student(name='Alice', gpa=3.9, year=2, courses=['CS50', 'Math55'])
# Auto-generated __repr__, __eq__ (and __lt__,__le__,... with order=True)
bob = Student("Bob", 3.7)
alice > bob # False: order=True compares fields as a tuple in definition order,
            # so name decides first ("Alice" < "Bob"), not gpa
asdict(alice) # {'name': 'Alice', 'gpa': 3.9, ...}
astuple(alice) # ('Alice', 3.9, 2, ['CS50', 'Math55'])
# Frozen dataclass (immutable, like namedtuple with type hints)
@dataclass(frozen=True)
class Point:
    x: float
    y: float
# frozen=True (with the default eq=True) makes it hashable, so it can be a dict key
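Because order=True compares fields in definition order, ordering by GPA needs the GPA to be the first (or only) compared field. A minimal sketch, assuming a hypothetical RankedStudent class that is not part of the original notes:

```python
from dataclasses import dataclass, field

@dataclass(order=True)
class RankedStudent:
    # Hypothetical class for illustration: only gpa takes part in comparisons.
    gpa: float
    name: str = field(compare=False)  # excluded from __eq__ and ordering

alice = RankedStudent(3.9, "Alice")
bob = RankedStudent(3.7, "Bob")

# Sorting now ranks by gpa alone, highest first.
ranked = sorted([bob, alice], reverse=True)
top = ranked[0].name
```

Note that compare=False also removes name from equality checks, so two students with the same GPA compare equal here. An alternative is to leave the dataclass unordered and sort with key=lambda s: s.gpa.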
MODULE 7
Comprehensions & Functional Programming
7. Comprehensions & Functional Programming
7.1 List Comprehensions
List comprehensions provide a concise, readable way to create lists. They are often faster than an equivalent
for loop that calls append, because CPython builds the list with a dedicated bytecode instruction instead of a
method call per element.
# Syntax: [expression for variable in iterable if condition]
# Basic
squares = [x**2 for x in range(10)]
# [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
# With filter
evens = [x for x in range(20) if x % 2 == 0]
# Transformation + filter
result = [x.upper() for x in ["hello","world"] if len(x) > 4]
# Nested (matrix flattening)
matrix = [[1,2,3],[4,5,6],[7,8,9]]
flat = [num for row in matrix for num in row]
# [1, 2, 3, 4, 5, 6, 7, 8, 9]
# Conditional expression (ternary in comprehension)
labels = ["even" if x%2==0 else "odd" for x in range(6)]
# Nested comprehension (matrix transpose)
T = [[row[i] for row in matrix] for i in range(3)]
# Comprehension vs map/filter
# These are equivalent:
result1 = list(map(lambda x: x**2, filter(lambda x: x%2==0, range(10))))
result2 = [x**2 for x in range(10) if x % 2 == 0]
# result2 is more readable
7.2 Dict & Set Comprehensions
# Dict comprehension
word_lengths = {word: len(word) for word in ["hello", "python", "world"]}
# {'hello': 5, 'python': 6, 'world': 5}
# Invert a dictionary
d = {"a": 1, "b": 2, "c": 3}
inverted = {v: k for k, v in d.items()}
# {1: 'a', 2: 'b', 3: 'c'}
# Filter dict entries
gpas = {"Alice": 3.9, "Bob": 3.5} # Example data (assumed for illustration)
high_gpa = {name: gpa for name, gpa in gpas.items() if gpa >= 3.7}
# Set comprehension
unique_squares = {x**2 for x in range(-5, 6)}
# {0, 1, 4, 9, 16, 25} — no duplicates
7.3 Generator Expressions
Generator expressions are like list comprehensions but produce values lazily, one at a time. They use () instead
of [] and save memory for large datasets.
# Generator expression — lazy evaluation
gen = (x**2 for x in range(1_000_000)) # No memory allocated for 1M numbers
next(gen) # 0 — compute only what's needed
next(gen) # 1
sum(x**2 for x in range(100)) # Sum without building a list
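# Compare memory usage (exact byte counts vary by Python version and platform)
import sys
list_comp = [x**2 for x in range(1000)]
gen_expr = (x**2 for x in range(1000))
sys.getsizeof(list_comp) # Roughly 8-9 KB on 64-bit CPython (list object only)
sys.getsizeof(gen_expr) # A couple hundred bytes at most (just the generator object!)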
# Generators as arguments (extra parens not needed)
total = sum(x**2 for x in range(100))
maxi = max(len(word) for word in ["hello", "python", "world"])
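One consequence of lazy evaluation is that a generator expression can be consumed only once. The short sketch below illustrates this; the variable names are chosen here for illustration.

```python
squares = (x * x for x in range(5))

first_pass = list(squares)   # consumes every value the generator produces
second_pass = list(squares)  # the generator is now exhausted

# A list comprehension, by contrast, can be traversed repeatedly.
squares_list = [x * x for x in range(5)]
again = list(squares_list)
```

If the values are needed more than once, build a list or recreate the generator expression.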
7.4 Generator Functions
A generator function uses yield to produce a sequence of values lazily. When called, it returns a generator
object. Execution suspends at each yield and resumes when next() is called.
def count_up(start, stop, step=1):
    """Lazy range-like generator."""
    current = start
    while current < stop:
        yield current
        current += step
for n in count_up(0, 10, 2):
    print(n) # 0, 2, 4, 6, 8
# Infinite generator
def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b
fib = fibonacci()
[next(fib) for _ in range(10)] # [0,1,1,2,3,5,8,13,21,34]
# Generator pipeline (memory-efficient data processing)
def read_large_file(path):
    with open(path) as f:
        for line in f:
            yield line.strip()
def filter_lines(lines, keyword):
    for line in lines:
        if keyword in line:
            yield line
def count_words(lines):
    for line in lines:
        yield len(line.split())
# Pipeline (each step is lazy, so no large intermediate lists)
lines = read_large_file("[Link]")
errors = filter_lines(lines, "ERROR")
word_cnt = count_words(errors)
total = sum(word_cnt)
# yield from: delegate to sub-generator
def chain(*iterables):
    for it in iterables:
        yield from it # equivalent to: for item in it: yield item
list(chain([1,2], [3,4], [5,6])) # [1,2,3,4,5,6]
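To make the suspend-and-resume behavior described at the start of this section concrete, the sketch below records when each part of a generator body actually runs. The traced function and the events list are illustrative names, not part of the original notes.

```python
events = []

def traced():
    events.append("start")      # runs on the first next(), not at call time
    yield 1
    events.append("resumed")    # runs on the second next()
    yield 2
    events.append("finished")   # runs when the generator is driven to the end

gen = traced()
before = list(events)   # calling traced() executed none of the body
a = next(gen)
b = next(gen)
rest = list(gen)        # drains the generator; nothing further is yielded
```

Calling the function only creates the generator object; each next() runs the body up to the following yield.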
7.5 Functional Tools
from functools import reduce, partial, lru_cache, cache
from itertools import (chain, islice, takewhile, dropwhile,
groupby, combinations, permutations,
product, starmap, accumulate)
# map — apply function to each element
list(map(str.upper, ["hello", "world"])) # ['HELLO', 'WORLD']
list(map(pow, [2,3,4], [10,2,3])) # [1024, 9, 64]
# filter — keep elements where function returns True
list(filter(str.isalpha, ["abc", "1", "xyz", "2"])) # ['abc', 'xyz']
list(filter(None, [0, 1, "", "hi", None, True])) # [1, 'hi', True]
# reduce — fold/accumulate
from functools import reduce
reduce(lambda a,b: a*b, [1,2,3,4,5]) # 120 = 5!
reduce(max, [3,1,4,1,5,9]) # 9
# partial — fix some arguments
def power(base, exp): return base ** exp
square = partial(power, exp=2)
cube = partial(power, exp=3)
square(5) # 25
cube(3) # 27
# itertools — production-grade iterator tools
list(chain([1,2], [3,4])) # [1, 2, 3, 4]
list(islice(range(100), 5, 10)) # [5,6,7,8,9]
list(takewhile(lambda x: x<5, [1,2,3,5,1])) # [1,2,3]
list(accumulate([1,2,3,4,5])) # [1,3,6,10,15] (running sum)
list(accumulate([1,2,3,4], lambda a,b: a*b)) # [1,2,6,24]
# combinations and permutations
list(combinations("ABC", 2)) # [('A','B'),('A','C'),('B','C')]
list(permutations("AB", 2)) # [('A','B'),('B','A')]
list(product([0,1], repeat=3)) # All 3-bit binary: 8 tuples
# groupby — group consecutive elements
data = [("Alice","CS"),("Bob","CS"),("Carol","Math"),("Dan","Math")]
data.sort(key=lambda x: x[1]) # Must sort first!
for dept, members in groupby(data, key=lambda x: x[1]):
    print(dept, list(members))
MODULE 8
Exceptions & Error Handling
8. Exceptions & Error Handling
8.1 Exception Hierarchy
Python's exception hierarchy is a tree rooted at BaseException. All user-facing exceptions inherit from
Exception. The most important exception classes:
# BaseException
# ├── SystemExit: raised by sys.exit()
# ├── KeyboardInterrupt: Ctrl+C
# ├── GeneratorExit: raised by generator.close()
# └── Exception: base for all normal exceptions
#     ├── StopIteration: end of iteration
#     ├── ArithmeticError
#     │   ├── ZeroDivisionError
#     │   ├── OverflowError
#     │   └── FloatingPointError
#     ├── LookupError
#     │   ├── IndexError: list[99] on a list of 3
#     │   └── KeyError: dict["missing"]
#     ├── ValueError: right type, wrong value
#     ├── TypeError: wrong type entirely
#     ├── AttributeError: obj.no_such_attr
#     ├── NameError: undefined variable
#     ├── OSError (IOError is an alias; FileNotFoundError is a subclass)
#     └── RuntimeError
#         └── NotImplementedError
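The hierarchy matters in practice because an except clause also catches every subclass of the named exception. A small sketch, using a hypothetical lookup helper:

```python
def lookup(container, key):
    """Hypothetical helper: index into a list or dict, reporting lookup failures."""
    try:
        return container[key]
    except LookupError as exc:  # parent class of IndexError and KeyError
        return f"{type(exc).__name__} caught as LookupError"

list_result = lookup([1, 2, 3], 99)
dict_result = lookup({"a": 1}, "missing")

# KeyboardInterrupt sits outside Exception, so "except Exception" will not swallow Ctrl+C.
ctrl_c_is_exception = issubclass(KeyboardInterrupt, Exception)
```

Catching the narrowest class that covers the expected failures keeps unrelated errors visible.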
8.2 try/except/else/finally
# Full exception handling syntax
try:
    result = 10 / int(input("Enter a number: "))
except ZeroDivisionError:
    print("Cannot divide by zero!")
except ValueError as e:
    print(f"Invalid input: {e}")
except (TypeError, AttributeError) as e: # Catch multiple types
    print(f"Type issue: {e}")
except Exception as e: # Catch any remaining Exception
    print(f"Unexpected error: {type(e).__name__}: {e}")
    raise # Re-raise the exception
else:
    # Runs ONLY if no exception was raised in try block
    print(f"Result: {result}")
finally:
    # ALWAYS runs: use for cleanup (close files, connections, etc.)
    print("Done.")
# Re-raising with context
try:
    dangerous_operation()
except IOError as e:
    raise RuntimeError("Failed to complete operation") from e
# Sets __cause__ and shows chained traceback
8.3 Custom Exceptions
class AppError(Exception):
    """Base exception for our application."""
    pass
class ValidationError(AppError):
    def __init__(self, field: str, message: str):
        self.field = field
        self.message = message
        super().__init__(f"Validation error on '{field}': {message}")
class DatabaseError(AppError):
    def __init__(self, query: str, cause: Exception):
        self.query = query
        super().__init__(f"Database error in query: {query}")
        self.__cause__ = cause
# Raise custom exceptions
def validate_age(age: int):
    if not isinstance(age, int):
        raise TypeError(f"Age must be int, got {type(age).__name__}")
    if age < 0 or age > 150:
        raise ValidationError("age", f"{age} is out of valid range [0, 150]")
    return age
try:
    validate_age(200)
except ValidationError as e:
    print(f"Field: {e.field}, Message: {e.message}")
8.4 Context Managers (with statement)
Context managers automate resource management — they guarantee cleanup (like closing files) even if
exceptions occur. Implement __enter__ and __exit__ dunder methods.
# Built-in: file handling
with open("[Link]", "r") as f:
    content = f.read()
# f is automatically closed here, even if exception occurred
# Multiple context managers
with open("[Link]") as src, open("[Link]","w") as dst:
    dst.write(src.read())
# Class-based context manager
class DatabaseConnection:
    def __init__(self, host: str):
        self.host = host
        self.conn = None
    def __enter__(self):
        print(f"Connecting to {self.host}...")
        self.conn = simulate_connect(self.host) # simulate_connect is a placeholder helper
        return self.conn # This is bound to the 'as' target
    def __exit__(self, exc_type, exc_val, exc_tb):
        print("Closing connection")
        if self.conn:
            self.conn.close()
        # Return True to suppress the exception; False/None to propagate
        return False
with DatabaseConnection("localhost") as conn:
    conn.execute("SELECT * FROM students")
# contextlib: easier context manager creation
import os
from contextlib import contextmanager, suppress
@contextmanager
def timer(label: str):
    import time
    start = time.perf_counter()
    try:
        yield # Code inside 'with' runs here
    finally:
        elapsed = time.perf_counter() - start
        print(f"{label}: {elapsed:.4f}s")
# A and B are matrices (lists of rows) defined elsewhere
with timer("matrix multiply"):
    result = [[sum(a*b for a,b in zip(row,col))
               for col in zip(*B)] for row in A]
# suppress: silently ignore specific exceptions
with suppress(FileNotFoundError):
    os.remove("temp_file.txt") # No error if file doesn't exist
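The return value of __exit__ decides whether an exception raised inside the with block propagates. The sketch below assumes a hypothetical IgnoreErrors class written for illustration; contextlib.suppress offers the same behavior ready-made.

```python
class IgnoreErrors:
    """Hypothetical context manager that swallows one exception type."""

    def __init__(self, exc_type):
        self.exc_type = exc_type
        self.caught = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        # exc_type is None when the block finished without raising.
        if exc_type is not None and issubclass(exc_type, self.exc_type):
            self.caught = exc_val
            return True   # suppress: execution continues after the with block
        return False      # propagate anything else

with IgnoreErrors(ZeroDivisionError) as guard:
    1 / 0
reached = True  # this line runs because the error was suppressed

propagated = False
try:
    with IgnoreErrors(ZeroDivisionError):
        int("not a number")   # ValueError is not suppressed
except ValueError:
    propagated = True
```

Suppressing exceptions hides failures, so returning True is usually reserved for errors that are expected and safe to ignore.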
MODULE 9
Iterators, Generators & the Iterator Protocol
9. Iterators, Generators & the Iterator Protocol
9.1 The Iterator Protocol
Python's for loop works with any object that implements the iterator protocol: __iter__() returning an iterator
object, and __next__() returning the next value or raising StopIteration.
# How for loops work under the hood
nums = [1, 2, 3]
it = iter(nums) # Calls nums.__iter__()
print(next(it)) # 1, via it.__next__()
print(next(it)) # 2
print(next(it)) # 3
next(it) # Raises StopIteration
# A for loop is equivalent to:
it = iter(nums)
while True:
    try:
        item = next(it)
    except StopIteration:
        break
    print(item) # Loop body
# Custom iterator
class Countdown:
    def __init__(self, start: int):
        self.current = start
    def __iter__(self):
        return self # Iterator returns self
    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        val = self.current
        self.current -= 1
        return val
for n in Countdown(5):
    print(n) # 5, 4, 3, 2, 1
# Making a class iterable (iterable != iterator)
class NumberRange:
    def __init__(self, start, stop):
        self.start = start
        self.stop = stop
    def __iter__(self):
        current = self.start
        while current < self.stop:
            yield current # __iter__ is a generator function!
            current += 1
r = NumberRange(1, 5)
list(r) # [1, 2, 3, 4]
list(r) # [1, 2, 3, 4]: can iterate multiple times!
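The difference between an iterable and an iterator can be checked directly with built-in types. In this brief sketch, the list is an iterable that hands out a fresh iterator on every iter() call, while the iterator itself is single-use.

```python
nums = [1, 2, 3]
it = iter(nums)

same_object = iter(it) is it   # an iterator returns itself from __iter__
first_pass = list(it)          # consumes the iterator
second_pass = list(it)         # nothing left to yield

fresh_pass = list(iter(nums))  # the list supplies a new iterator each time
```

A class that returns self from __iter__, like Countdown, behaves like `it` here; one whose __iter__ is a generator function, like NumberRange, behaves like `nums`.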
9.2 Advanced Generator Techniques
# Generator send() — two-way communication
def running_average():
    total = 0
    count = 0
    avg = 0
    while True:
        value = yield avg # yield sends avg out, receives value in
        if value is None:
            break
        total += value
        count += 1
        avg = total / count
gen = running_average()
next(gen) # Prime the generator (advance to first yield)
gen.send(10) # avg = 10.0
gen.send(20) # avg = 15.0
gen.send(30) # avg = 20.0
# Generator throw(): inject exception
def resilient_gen():
    while True:
        try:
            value = yield
            print(f"Processing: {value}")
        except ValueError as e:
            print(f"Handling error: {e}")
# Coroutine-like generator pipeline
def producer(items):
    for item in items:
        yield item
def transformer(source, func):
    for item in source:
        yield func(item)
def consumer(source):
    results = []
    for item in source:
        results.append(item)
    return results
# Compose pipeline
data = producer([1, 2, 3, 4, 5])
doubled = transformer(data, lambda x: x * 2)
result = consumer(doubled) # [2, 4, 6, 8, 10]
MODULE 10
File I/O & the OS Module
10. File I/O, Serialization & the OS Module
10.1 File Operations
# Opening files — always use 'with' statement
# Modes: 'r' read, 'w' write, 'a' append, 'x' create-exclusive
# Add 'b' for binary: 'rb', 'wb'
with open("[Link]", "r", encoding="utf-8") as f:
    content = f.read() # Read entire file as string
    f.seek(0) # Go back to start
    lines = f.readlines() # List of all lines (with \n)
    f.seek(0)
    for line in f: # Efficient line-by-line (lazy)
        print(line.strip())
# Writing
with open("[Link]", "w", encoding="utf-8") as f:
    f.write("Hello, World!\n")
    f.writelines(["line1\n", "line2\n"])
# Reading/writing CSV
import csv
with open("[Link]", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["name","gpa","year"])
    writer.writeheader()
    writer.writerow({"name":"Alice","gpa":3.9,"year":2})
with open("[Link]", newline="") as f:
    reader = csv.DictReader(f)
    for row in reader:
        print(row["name"], row["gpa"])
10.2 JSON Serialization
import json
from datetime import datetime
# Serialize (Python → JSON string)
data = {"name": "Alice", "scores": [95, 87, 92], "active": True}
json_str = json.dumps(data, indent=2, ensure_ascii=False)
# Deserialize (JSON string → Python)
parsed = json.loads(json_str)
# File I/O
with open("[Link]", "w") as f:
    json.dump(data, f, indent=2)
with open("[Link]") as f:
    loaded = json.load(f)
# Custom serializer for non-serializable types
class DateTimeEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, datetime):
            return obj.isoformat()
        return super().default(obj)
json.dumps({"ts": datetime.now()}, cls=DateTimeEncoder)
10.3 pathlib — Modern Path Handling
from pathlib import Path
# Create path objects
p = Path("/home/alice/documents")
p = Path.home() / "documents" / "data.txt" # / operator joins paths
# Path operations (results shown for a home directory of /home/alice)
p.name # "data.txt"
p.stem # "data"
p.suffix # ".txt"
p.parent # Path("/home/alice/documents")
p.parts # ('/', 'home', 'alice', 'documents', 'data.txt')
# Filesystem queries
p.exists() # True/False
p.is_file() # True/False
p.is_dir() # True/False
p.stat().st_size # File size in bytes
# Reading/writing (no open() needed)
text = p.read_text(encoding="utf-8")
p.write_text("Hello!", encoding="utf-8")
data = p.read_bytes()
p.write_bytes(b"\x00\x01")
# Directory operations
Path("new_dir").mkdir(parents=True, exist_ok=True)
Path("[Link]").unlink(missing_ok=True)
# Glob patterns
for py_file in Path(".").rglob("*.py"):
    print(py_file)
list(Path(".").glob("**/*.txt")) # All .txt recursively
10.4 os and shutil Modules
import os, shutil
os.getcwd() # Current working directory
os.chdir("/tmp") # Change directory
os.environ["HOME"] # Environment variable
os.environ.get("API_KEY", "") # Safe get with default
os.path.join("dir", "file") # Platform-safe path join
os.path.exists("[Link]")
os.path.basename("/a/b/[Link]") # "[Link]"
os.path.dirname("/a/b/[Link]") # "/a/b"
# Walk directory tree
for root, dirs, files in os.walk("."):
    for f in files:
        print(os.path.join(root, f))
# shutil: high-level file operations
shutil.copy("[Link]", "[Link]") # Copy file
shutil.copy2("[Link]", "dst/") # Copy with metadata
shutil.move("[Link]", "[Link]") # Move/rename
shutil.copytree("src_dir", "dst_dir") # Copy directory tree
shutil.rmtree("temp_dir") # Remove directory tree
shutil.make_archive("backup", "zip", ".") # Create ZIP archive
MODULE 11
Concurrency, Parallelism & Async
11. Concurrency, Parallelism & Async Programming
11.1 The GIL — Global Interpreter Lock
CPython uses a GIL, a mutex that allows only one thread to execute Python bytecode at a time. This means
threading does NOT achieve true parallelism for CPU-bound work. However, the GIL is released during blocking
I/O operations, making threads useful for I/O-bound work. For CPU-bound parallelism, use multiprocessing.
📘 NOTE: Python 3.13 introduced an experimental free-threaded build (PEP 703) in which the GIL can be
disabled. In the future, the GIL may become optional by default or be removed entirely. For now:
threads=I/O-bound, processes=CPU-bound, asyncio=many concurrent I/O tasks.
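The I/O-bound case can be observed with a small timing sketch. It assumes time.sleep as a stand-in for a blocking I/O call (sleep releases the GIL in the same way); fake_io is a hypothetical name used only here.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io(delay):
    """Stand-in for a blocking I/O call; sleeping releases the GIL."""
    time.sleep(delay)
    return delay

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fake_io, [0.2] * 4))
elapsed = time.perf_counter() - start
# Run one after another, the four calls would need about 0.8 s;
# with four threads the waits overlap, so elapsed is close to 0.2 s.
```

Replacing fake_io with a pure-Python CPU loop would remove most of this benefit on a standard CPython build, since only one thread can execute bytecode at a time.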
11.2 threading Module
import threading
import time
# Basic thread
def worker(name, delay):
    print(f"Thread {name} starting")
    time.sleep(delay)
    print(f"Thread {name} done")
threads = [threading.Thread(target=worker, args=(i, 0.5)) for i in range(5)]
for t in threads: t.start()
for t in threads: t.join() # Wait for all to complete
# Thread-safe shared state with Lock
counter = 0
lock = threading.Lock()
def safe_increment():
    global counter
    with lock: # Acquire and release automatically
        counter += 1
# Condition variable: coordinate threads
condition = threading.Condition()
buffer = []
def producer():
    for i in range(5):
        with condition:
            buffer.append(i)
            condition.notify() # Signal consumer
        time.sleep(0.1)
def consumer():
    while True:
        with condition:
            while not buffer:
                condition.wait() # Wait for signal
            item = buffer.pop(0)
        if item == 4: break
# ThreadPoolExecutor: high-level thread pool
from concurrent.futures import ThreadPoolExecutor, as_completed
import urllib.request
def fetch_url(url):
    with urllib.request.urlopen(url, timeout=5) as r:
        return len(r.read())
urls = ["[Link]", "[Link]"] # Placeholder URLs
with ThreadPoolExecutor(max_workers=4) as executor:
    futures = {executor.submit(fetch_url, url): url for url in urls}
    for future in as_completed(futures):
        url = futures[future]
        size = future.result()
        print(f"{url}: {size} bytes")
11.3 multiprocessing Module
from multiprocessing import Pool, Process, Queue, Manager
import os
# CPU-bound task: perfect for multiprocessing
def compute_prime_count(n):
    """Count primes up to n."""
    primes = sum(1 for x in range(2, n+1)
                 if all(x % i != 0 for i in range(2, int(x**0.5)+1)))
    return primes
if __name__ == "__main__": # Required guard on Windows and macOS (spawn start method)
    with Pool(processes=os.cpu_count()) as pool:
        results = pool.map(compute_prime_count, [10000]*8)
    print(sum(results))
# ProcessPoolExecutor: simpler API
from concurrent.futures import ProcessPoolExecutor
def square(n): return n * n
if __name__ == "__main__":
    with ProcessPoolExecutor() as executor:
        results = list(executor.map(square, range(100)))
11.4 asyncio — Asynchronous I/O
asyncio enables concurrent execution of many I/O tasks in a single thread using cooperative multitasking.
Perfect for thousands of simultaneous network connections. Uses async/await syntax introduced in Python 3.5.
import asyncio
import aiohttp # pip install aiohttp
# Basic coroutine
async def greet(name: str, delay: float):
    await asyncio.sleep(delay) # Non-blocking sleep
    print(f"Hello, {name}!")
# Run a single coroutine
asyncio.run(greet("Alice", 1.0))
# Run multiple concurrently
async def main():
    # gather: run all concurrently, return all results
    results = await asyncio.gather(
        greet("Alice", 1.0),
        greet("Bob", 0.5),
        greet("Carol", 1.5),
    )
    # All 3 run concurrently, so total time is ~1.5s, not 3s
asyncio.run(main())
# HTTP requests with aiohttp
async def fetch(session, url):
    async with session.get(url) as response:
        return await response.text()
async def fetch_all(urls):
    async with aiohttp.ClientSession() as session:
        tasks = [asyncio.create_task(fetch(session, url)) for url in urls]
        return await asyncio.gather(*tasks)
# asyncio.Queue: producer/consumer pattern
async def producer(queue: asyncio.Queue):
    for i in range(10):
        await queue.put(i)
        await asyncio.sleep(0.05)
    await queue.put(None) # Sentinel
async def consumer(queue: asyncio.Queue):
    while True:
        item = await queue.get()
        if item is None: break
        print(f"Processing {item}")
        queue.task_done()
async def main():
    q = asyncio.Queue(maxsize=3) # Bounded queue
    await asyncio.gather(producer(q), consumer(q))
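A detail worth noting about asyncio.gather is that it returns results in the order the awaitables were passed, not the order in which they finish. A minimal sketch, using a hypothetical square_later coroutine:

```python
import asyncio

async def square_later(n, delay):
    """Hypothetical coroutine: wait without blocking, then return n squared."""
    await asyncio.sleep(delay)
    return n * n

async def run_both():
    # The second coroutine finishes first, yet results keep argument order.
    return await asyncio.gather(
        square_later(2, 0.03),
        square_later(3, 0.01),
    )

results = asyncio.run(run_both())
```

This makes gather convenient for pairing results with their inputs, for example with zip(urls, results).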
MODULE 12
Type Hints & Static Analysis
12. Type Hints, Annotations & Static Analysis
12.1 Basic Type Annotations
Python 3.5+ supports type hints via the typing module (PEP 484). Annotations are NOT enforced at runtime:
they are hints for static analysis tools like mypy and pyright, and for IDEs. PEP 3107 (function annotations,
Python 3.0) and PEP 526 (variable annotations, Python 3.6) define the syntax.
# Variable annotations
name: str = "Alice"
age: int = 20
gpa: float = 3.9
flag: bool = True
# Function annotations
def greet(name: str, times: int = 1) -> str:
    return (f"Hello, {name}! " * times).strip()
def no_return() -> None:
    print("no return value")
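That annotations are not enforced at runtime can be seen directly. In this short sketch (the function name is illustrative), a call with the wrong argument type still runs, and the hints are merely stored as metadata on the function.

```python
def double(n: int) -> int:
    return n * 2

# The interpreter does not check the argument against the annotation.
result = double("ab")

# Annotations are kept on the function object for tools to inspect.
hints = double.__annotations__
```

A static checker such as mypy would flag the double("ab") call before the program runs, which is where the value of annotations lies.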
12.2 The typing Module
from typing import (
    Optional, Union, List, Dict, Tuple, Set,
    Any, Callable, Iterator, Generator,
    TypeVar, Generic, Protocol, Final,
    ClassVar, Literal, TypedDict, overload
)
# Optional: value OR None (same as Union[X, None])
def find_user(id: int) -> Optional[str]:
    ... # returns name or None
# Union: one of several types
def process(data: Union[str, bytes, list]) -> None: ...
# Python 3.10+: use | instead of Union
def process(data: str | bytes | list) -> None: ...
# From Python 3.9+, use built-in types directly (no List, Dict etc.)
def get_scores(names: list[str]) -> dict[str, float]: ...
# Tuple: exact structure
def get_coords() -> tuple[float, float]: ...
def variadic() -> tuple[int, ...]: ... # Variable length
# Callable
from typing import Callable
def apply(func: Callable[[int, int], int], a: int, b: int) -> int:
    return func(a, b)
# TypeVar: generic type variable
T = TypeVar('T')
def first(items: list[T]) -> T:
    return items[0]
# TypeVar with constraints (must be exactly one of the listed types;
# an upper bound is written TypeVar('N', bound=SomeType) instead)
Numeric = TypeVar('Numeric', int, float, complex)
def double(x: Numeric) -> Numeric:
    return x * 2
# Generic classes
class Stack(Generic[T]):
    def __init__(self) -> None:
        self._items: list[T] = []
    def push(self, item: T) -> None:
        self._items.append(item)
    def pop(self) -> T:
        return self._items.pop()
s: Stack[int] = Stack()
s.push(42)
# TypedDict: typed dictionary
class StudentRecord(TypedDict):
    name: str
    gpa: float
    year: int
# Protocol: structural subtyping (duck typing with type safety)
class Drawable(Protocol):
    def draw(self) -> None: ...
def render(shape: Drawable) -> None:
    shape.draw()
class Circle:
    def draw(self) -> None:
        print("Drawing circle")
render(Circle()) # Valid: Circle has draw()
# Literal: restrict to specific values
from typing import Literal
def set_mode(mode: Literal["read", "write", "append"]) -> None: ...
# Final: cannot be reassigned
MAX_SIZE: Final[int] = 100
12.3 Runtime Type Checking with Pydantic
# pip install pydantic (this example uses the Pydantic v2 API)
from pydantic import BaseModel, Field, field_validator
from datetime import datetime
class Student(BaseModel):
    name: str
    email: str
    gpa: float = Field(ge=0.0, le=4.0, description="GPA between 0 and 4")
    year: int = Field(default=1, ge=1, le=8)
    courses: list[str] = []
    enrolled_at: datetime = Field(default_factory=datetime.now)
    @field_validator('name')
    @classmethod
    def name_must_not_be_empty(cls, v: str) -> str:
        if not v.strip():
            raise ValueError("Name cannot be empty")
        return v.strip().title()
    @field_validator('email')
    @classmethod
    def email_must_be_valid(cls, v: str) -> str:
        if '@' not in v:
            raise ValueError("Invalid email")
        return v.lower()
# Auto-validates and coerces types
alice = Student(name="alice smith", email="ALICE@[Link]", gpa=3.9)
print(alice.name) # "Alice Smith" (title-cased)
print(alice.email) # "alice@[Link]" (lowercased)
alice.model_dump() # Convert to dict (.dict() in Pydantic v1)
alice.model_dump_json() # Convert to JSON string (.json() in Pydantic v1)
MODULE 13
The Python Standard Library
13. The Python Standard Library — Essential Modules
13.1 datetime — Date and Time
from datetime import datetime, date, time, timedelta, timezone
import zoneinfo # Python 3.9+
# Current date/time
now = datetime.now() # Local time (naive)
utc = datetime.now(timezone.utc) # UTC (aware)
today = date.today()
# Creating datetime objects
dt = datetime(2024, 9, 1, 9, 0, 0)
d = date(2024, 9, 1)
t = time(14, 30, 0)
# Formatting and parsing
formatted = dt.strftime("%Y-%m-%d %H:%M:%S") # "2024-09-01 09:00:00"
parsed = datetime.strptime("2024-09-01", "%Y-%m-%d")
iso = dt.isoformat() # "2024-09-01T09:00:00"
from_iso = datetime.fromisoformat(iso)
# Arithmetic
delta = timedelta(days=30, hours=2, minutes=30)
future = dt + delta
diff = datetime(2025,1,1) - datetime.now()
print(f"{diff.days} days until 2025")
# Timezone-aware (Python 3.9+)
eastern = zoneinfo.ZoneInfo("America/New_York")
dt_eastern = dt.astimezone(eastern) # A naive dt is treated as local time
13.2 re — Regular Expressions
import re
text = "Contact alice@[Link] or bob@[Link] for info"
# Search: find first match
m = re.search(r'\b[\w.+-]+@[\w-]+\.[a-zA-Z]{2,}\b', text)
if m:
    print(m.group()) # "alice@[Link]"
    print(m.start()) # Start index
    print(m.span()) # (start, end) tuple
# findall: return all matches
emails = re.findall(r'[\w.+-]+@[\w-]+\.[a-zA-Z]{2,}', text)
# ['alice@[Link]', 'bob@[Link]']
# sub: replace
clean = re.sub(r'\s+', ' ', "hello    world") # Collapse runs of whitespace: "hello world"
# Compile for reuse (clearer in loops; skips the pattern-cache lookup on each call)
email_re = re.compile(r'[\w.+-]+@[\w-]+\.[a-zA-Z]{2,}')
emails = email_re.findall(text)
# Groups
date_str = "Today is 2024-09-15"
m = re.search(r'(\d{4})-(\d{2})-(\d{2})', date_str)
year, month, day = m.group(1), m.group(2), m.group(3)
# Named groups
m = re.search(r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})', date_str)
print(m.group('year')) # "2024"
print(m.groupdict()) # {'year':'2024','month':'09','day':'15'}
# Verbose mode: readable regex
email_pattern = re.compile(r"""
    \b              # Word boundary
    [\w.+-]+        # Local part
    @               # At sign
    [\w-]+          # Domain name
    \.              # Dot
    [a-zA-Z]{2,}    # TLD
    \b              # Word boundary
""", re.VERBOSE)
13.3 logging — Production Logging
import logging
# Basic config
logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('[Link]'),
        logging.StreamHandler() # Also print to console
    ]
)
# Named logger (best practice)
logger = logging.getLogger(__name__)
logger.debug("Debug information")
logger.info("Application started")
logger.warning("Low disk space: %d%%", 10)
logger.error("Database connection failed")
logger.critical("System shutting down")
# Structured logging with extra
logger.info("User login", extra={"user_id": 42, "ip": "[Link]"})
# Exception logging
try:
    1/0
except ZeroDivisionError:
    logger.exception("Division error") # Includes full traceback
13.4 argparse — CLI Applications
import argparse
parser = argparse.ArgumentParser(
    description="Student GPA calculator",
    formatter_class=argparse.ArgumentDefaultsHelpFormatter
)
parser.add_argument("name", type=str, help="Student name")
parser.add_argument("scores", type=float, nargs="+", help="List of scores")
parser.add_argument("-v", "--verbose", action="store_true")
parser.add_argument("-o", "--output", type=str, default="stdout")
parser.add_argument("--min-passing", type=float, default=60.0)
args = parser.parse_args()
gpa = sum(args.scores) / len(args.scores)
if args.verbose:
    print(f"Processing {len(args.scores)} scores for {args.name}")
print(f"{args.name}: {gpa:.2f}")
13.5 unittest — Testing Framework
import unittest
from unittest.mock import Mock, patch, MagicMock
def add(a, b): return a + b
def divide(a, b):
    if b == 0: raise ZeroDivisionError("Cannot divide by zero")
    return a / b
class TestMath(unittest.TestCase):
    def test_add_positive(self):
        self.assertEqual(add(2, 3), 5)
    def test_add_negative(self):
        self.assertEqual(add(-1, -2), -3)
    def test_divide_normal(self):
        self.assertAlmostEqual(divide(1, 3), 0.333, places=3)
    def test_divide_by_zero(self):
        with self.assertRaises(ZeroDivisionError) as ctx:
            divide(10, 0)
        self.assertIn("Cannot divide by zero", str(ctx.exception))
    def setUp(self): # Runs before each test
        self.data = [1, 2, 3]
    def tearDown(self): # Runs after each test
        pass
    @unittest.skip("Not implemented yet")
    def test_future_feature(self): ...
    # Mock external dependencies
    @patch('requests.get')
    def test_api_call(self, mock_get):
        mock_get.return_value.json.return_value = {"status": "ok"}
        # ... test code that calls requests.get()
        mock_get.assert_called_once()
# pytest (recommended over unittest)
# pip install pytest
# def test_add(): assert add(2, 3) == 5
# pytest -v tests/
MODULE 14
Advanced Python Techniques
14. Advanced Python Techniques
14.1 Metaclasses
A metaclass is the class of a class. In Python, classes themselves are objects, created by metaclasses. The
default metaclass is type. Metaclasses enable powerful patterns like ORMs, auto-registration, and DSLs.
# type is the metaclass of all classes
type(int) # <class 'type'>
type(str) # <class 'type'>
type(list) # <class 'type'>
# Dynamically create a class with type()
MyClass = type('MyClass', (object,), {
    'x': 42,
    'hello': lambda self: f"Hello from {type(self).__name__}"
})
# Custom metaclass
class SingletonMeta(type):
    """Ensure only one instance of a class can exist."""
    _instances = {}
    def __call__(cls, *args, **kwargs):
        if cls not in cls._instances:
            cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]
class Database(metaclass=SingletonMeta):
    def __init__(self):
        print("Creating database connection")
db1 = Database()
db2 = Database()
print(db1 is db2) # True: same object!
# Auto-registry metaclass
class PluginMeta(type):
    registry = {}
    def __new__(mcs, name, bases, namespace):
        cls = super().__new__(mcs, name, bases, namespace)
        if bases: # Skip base class itself
            mcs.registry[name] = cls
        return cls
class Plugin(metaclass=PluginMeta):
    pass
class CSVPlugin(Plugin):
    def process(self): ...
class JSONPlugin(Plugin):
    def process(self): ...
print(PluginMeta.registry) # {'CSVPlugin': ..., 'JSONPlugin': ...}
14.2 Descriptors
Descriptors implement __get__, __set__, and/or __delete__ methods. They power Python's property system,
bound methods, classmethod, staticmethod, and many ORMs, yet they are rarely written directly in everyday code.
class TypedAttribute:
    """Descriptor that enforces type checking on assignment."""
    def __init__(self, name: str, expected_type: type):
        self.name = name
        self.expected_type = expected_type
        self.attr_name = f"_{name}" # Private storage attribute
    def __set_name__(self, owner, name): # Called automatically when the owner class is created
        self.attr_name = f"_{name}"
    def __get__(self, obj, objtype=None):
        if obj is None: # Accessed on class, not instance
            return self
        return getattr(obj, self.attr_name, None)
    def __set__(self, obj, value):
        if not isinstance(value, self.expected_type):
            raise TypeError(f"{self.name} must be {self.expected_type.__name__}, "
                            f"got {type(value).__name__}")
        setattr(obj, self.attr_name, value)
    def __delete__(self, obj):
        delattr(obj, self.attr_name)
class Student:
    name = TypedAttribute("name", str)
    gpa = TypedAttribute("gpa", float)
    year = TypedAttribute("year", int)
    def __init__(self, name, gpa, year):
        self.name = name # Calls TypedAttribute.__set__
        self.gpa = gpa
        self.year = year
alice = Student("Alice", 3.9, 2)
alice.name = 42 # TypeError: name must be str, got int
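The descriptor protocol is also what turns a plain function into a bound method. The sketch below calls __get__ by hand to show roughly what happens on an ordinary method lookup; the Robot class is a made-up example.

```python
class Robot:  # Hypothetical class, for illustration only
    def greet(self):
        return "hi"

r = Robot()

# Functions are non-data descriptors: they define __get__ but not __set__.
raw_function = Robot.__dict__["greet"]
bound = raw_function.__get__(r, Robot)   # Roughly what r.greet does behind the scenes

print(bound())                 # hi
print(bound.__self__ is r)     # True

# property is a data descriptor: it defines both __get__ and __set__.
print(hasattr(raw_function, "__set__"))   # False
print(hasattr(property, "__set__"))       # True
```

The data versus non-data distinction matters for lookup order: a data descriptor on the class takes precedence over an entry in the instance's __dict__, while a non-data descriptor does not.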
14.3 __slots__
# By default, instances use __dict__ (a hash table) for attributes
# __slots__ replaces __dict__ with a fixed-size array, which saves memory
class Point:
    __slots__ = ('x', 'y', 'z') # Only these attributes allowed
    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z
p = Point(1, 2, 3)
p.x # 1
p.w = 4 # AttributeError: 'w' is not in __slots__
class RegPoint:
    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z
# Memory comparison (with 1M objects); figures are illustrative and vary by Python version and platform
# Regular: ~200MB, each instance has a __dict__
# Slots: ~75MB, no __dict__ overhead
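One way to measure the difference on your own machine is with tracemalloc. This is a minimal sketch: the helper bytes_for and the class names are assumptions made for this example, and the totals include the list and the integer attributes, which cost the same for both classes.

```python
import tracemalloc

class RegularPoint:
    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z

class SlotPoint:
    __slots__ = ("x", "y", "z")

    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z

def bytes_for(cls, count=10_000):
    """Return the bytes allocated while `count` instances of cls are alive."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    points = [cls(i, i + 1, i + 2) for i in range(count)]  # Kept alive until measured
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before

regular_bytes = bytes_for(RegularPoint)
slot_bytes = bytes_for(SlotPoint)
print(f"regular: {regular_bytes / 10_000:.0f} bytes per object")
print(f"slots:   {slot_bytes / 10_000:.0f} bytes per object")
```

The absolute numbers depend on the interpreter version, so no expected output is shown; the slotted class should come out smaller.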
14.4 Memory Management & the gc Module
import gc, sys
# Reference counting
x = [1, 2, 3]
sys.getrefcount(x) # Typically 2 (x + getrefcount's own reference)
y = x
sys.getrefcount(x) # Typically 3
# Garbage collector (handles circular references)
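gc.enable() # Turn automatic cyclic collection on (the default)
gc.disable() # Turn it off; reference counting still frees non-cyclic garbage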
gc.collect() # Force collection, returns number of collected objects
gc.get_stats() # Collection statistics
# Memory profiling
# tracemalloc is part of the standard library (since Python 3.4); no pip install is needed
import tracemalloc
tracemalloc.start()
# ... code to profile ...
snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics('lineno')
for stat in top_stats[:5]:
    print(stat)
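To see why the cyclic collector is needed alongside reference counting, the sketch below creates two objects that refer to each other and watches one of them through a weak reference. The Node class and make_cycle helper are hypothetical names for this example.

```python
import gc
import weakref

class Node:
    def __init__(self):
        self.partner = None

def make_cycle():
    """Create two objects that reference each other; return a weak reference to one."""
    a, b = Node(), Node()
    a.partner, b.partner = b, a
    return weakref.ref(a)

gc.disable()                          # Keep the automatic collector out of the way
probe = make_cycle()
alive_before = probe() is not None    # The cycle keeps both reference counts above zero
unreachable = gc.collect()            # The cyclic collector finds and frees the pair
alive_after = probe() is not None
gc.enable()

print(alive_before, alive_after)      # True False
print(unreachable >= 2)               # True
```

A weak reference is used as the probe because it does not itself keep the object alive.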
14.5 Design Patterns in Python
Observer Pattern
from typing import Callable
class EventEmitter:
    def __init__(self):
        self._handlers: dict[str, list[Callable]] = {}
    def on(self, event: str, handler: Callable):
        self._handlers.setdefault(event, []).append(handler)
        return self # Allow chaining
    def emit(self, event: str, *args, **kwargs):
        for handler in self._handlers.get(event, []):
            handler(*args, **kwargs)
emitter = EventEmitter()
emitter.on("data", lambda d: print(f"Got: {d}"))
emitter.on("data", lambda d: print(f"Logging: {d}"))
emitter.emit("data", {"user": "Alice"})
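Two properties of this emitter are easy to miss: handlers run in registration order, and emitting an event nobody subscribed to does nothing. The sketch below repeats the class so it runs on its own and records calls in a list (log is a name chosen for this example) instead of printing from the handlers.

```python
from typing import Callable

class EventEmitter:
    def __init__(self):
        self._handlers: dict[str, list[Callable]] = {}

    def on(self, event: str, handler: Callable):
        self._handlers.setdefault(event, []).append(handler)
        return self  # Allow chaining

    def emit(self, event: str, *args, **kwargs):
        for handler in self._handlers.get(event, []):
            handler(*args, **kwargs)

log = []
emitter = EventEmitter()

# on() returns the emitter, so registrations can be chained.
emitter.on("data", lambda d: log.append(f"first:{d}")).on(
    "data", lambda d: log.append(f"second:{d}")
)

emitter.emit("data", 1)
emitter.emit("unknown", 2)   # No handlers registered: nothing happens

print(log)                   # ['first:1', 'second:1']
```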
Strategy Pattern
from typing import Protocol
class SortStrategy(Protocol):
    def sort(self, data: list) -> list: ...
class BubbleSort:
    def sort(self, data): ... # Implementation
class QuickSort:
    def sort(self, data): return sorted(data) # Stand-in: delegates to the built-in sorted()
class DataProcessor:
    def __init__(self, strategy: SortStrategy):
        self.strategy = strategy
    def process(self, data):
        return self.strategy.sort(data)
dp = DataProcessor(QuickSort())
dp.process([3,1,4,1,5]) # [1, 1, 3, 4, 5]
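For a version that runs end to end, here is a minimal sketch with two interchangeable strategies. The bubble sort body and the BuiltinSort name are illustrative assumptions, not part of the original notes; any object with a matching sort() method satisfies the Protocol.

```python
from typing import Protocol

class SortStrategy(Protocol):
    def sort(self, data: list) -> list: ...

class BubbleSort:
    """Illustrative bubble sort; returns a new list and leaves the input untouched."""
    def sort(self, data: list) -> list:
        items = list(data)
        for end in range(len(items) - 1, 0, -1):
            for i in range(end):
                if items[i] > items[i + 1]:
                    items[i], items[i + 1] = items[i + 1], items[i]
        return items

class BuiltinSort:
    """Delegates to Python's built-in sorted()."""
    def sort(self, data: list) -> list:
        return sorted(data)

class DataProcessor:
    def __init__(self, strategy: SortStrategy):
        self.strategy = strategy

    def process(self, data: list) -> list:
        return self.strategy.sort(data)

numbers = [3, 1, 4, 1, 5]
bubble_result = DataProcessor(BubbleSort()).process(numbers)
builtin_result = DataProcessor(BuiltinSort()).process(numbers)
print(bubble_result)    # [1, 1, 3, 4, 5]
print(builtin_result)   # [1, 1, 3, 4, 5]
print(numbers)          # [3, 1, 4, 1, 5]
```

Neither strategy inherits from SortStrategy; Protocol relies on structural typing, so a static type checker accepts them because their method signatures match.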
MODULE 15
Best Practices & Pythonic Code
15. Best Practices, Idioms & Pythonic Code
15.1 PEP 8 — Style Guide
PEP 8 is the official style guide for code in Python's standard library and the de facto standard for the wider
community. Tools help apply it: black (auto-formatter) and isort (import sorter) rewrite code, and flake8 (linter)
reports violations.
Convention          Used For
snake_case          Variables, functions, methods, modules
PascalCase          Classes
UPPER_SNAKE_CASE    Constants
_single_leading     Private by convention
__double_leading    Name mangling (avoids clashes in subclasses; not truly private)
__dunder__          Special/magic methods
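The two underscore conventions in the table work differently: a single leading underscore is only a signal to readers, while a double leading underscore makes the compiler rewrite the name. A minimal sketch, using a made-up Account class:

```python
class Account:  # Hypothetical class, for illustration only
    def __init__(self):
        self._balance = 100      # Single underscore: private by convention only
        self.__pin = "1234"      # Double underscore: stored as _Account__pin

acct = Account()
print(acct._balance)             # 100
print(acct._Account__pin)        # 1234 (the mangled name is still reachable)
print(sorted(vars(acct)))        # ['_Account__pin', '_balance']

try:
    acct.__pin                   # Outside the class body the name is not mangled
except AttributeError:
    print("no attribute named __pin")
```

Name mangling exists to prevent accidental clashes between a class and its subclasses, not to enforce access control.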
# Good PEP 8 style
import os
import sys
from pathlib import Path
from typing import Optional

MAX_RETRIES: int = 3
DEFAULT_TIMEOUT: float = 30.0


class StudentDatabase:
    """A database of students. (Class docstring)"""

    def __init__(self, host: str, port: int = 5432) -> None:
        self.host = host
        self.port = port
        self._connection = None

    def connect(self, timeout: Optional[float] = None) -> bool:
        """Connect to the database.

        Args:
            timeout: Connection timeout in seconds.

        Returns:
            True if connection successful.

        Raises:
            ConnectionError: If connection fails.
        """
        ...

    @property
    def is_connected(self) -> bool:
        return self._connection is not None
15.2 Pythonic Idioms
# Use enumerate instead of range(len(...))
# BAD
for i in range(len(items)):
    print(i, items[i])
# GOOD
for i, item in enumerate(items):
    print(i, item)
# Use zip for parallel iteration
# BAD
for i in range(len(names)):
    print(names[i], scores[i])
# GOOD
for name, score in zip(names, scores):
    print(name, score)
# Use dict.get() with default
# BAD
if key in d:
    value = d[key]
else:
    value = default
# GOOD
value = d.get(key, default)
# Use 'in' for membership testing (not index)
# BAD (index() raises ValueError when the item is missing; it never returns -1)
if items.index("x") >= 0: ...
# GOOD
if "x" in items: ...
# Return early to reduce nesting
# BAD (arrow-shaped code)
def process(data):
    if data:
        if isinstance(data, list):
            if len(data) > 0:
                return data[0]
# GOOD (guard clauses)
def process(data):
    if not isinstance(data, list):
        return None
    if not data: # Empty list
        return None
    return data[0]
# Use context managers
# BAD
f = open("data.txt") # Placeholder filename
content = f.read()
f.close() # Skipped if read() raises, leaving the file open
# GOOD
with open("data.txt") as f:
    content = f.read()
# Swap variables
# BAD (C-style)
temp = a; a = b; b = temp
# GOOD
a, b = b, a
# Build strings with join, not concatenation
# BAD (O(n²) — creates new string each iteration)
result = ""
for word in words:
    result += word + " "
# GOOD
result = " ".join(words)
# Use any() and all()
# BAD
found = False
for item in items:
    if condition(item):
        found = True
        break
# GOOD
found = any(condition(item) for item in items)
all_valid = all(validate(item) for item in items)
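The idioms above can be run together on a small data set. The names and scores below are invented sample data for this sketch.

```python
names = ["Ada", "Grace", "Linus"]     # Sample data, invented for this example
scores = [95, 88, 72]

# enumerate() accepts a start value, handy for human-friendly numbering.
ranked = [f"{i}. {name}" for i, name in enumerate(names, start=1)]
print(ranked)                         # ['1. Ada', '2. Grace', '3. Linus']

# zip() pairs the two lists; dict() turns the pairs into a lookup table.
by_name = dict(zip(names, scores))
print(by_name.get("Guido", 0))        # 0 (missing key falls back to the default)

low_score_exists = any(s < 75 for s in scores)
everyone_passed = all(s >= 70 for s in scores)
print(low_score_exists, everyone_passed)   # True True

# list.index() raises ValueError for a missing item, so 'in' is the right test.
try:
    names.index("Guido")
    lookup = "found"
except ValueError:
    lookup = "not found"
print(lookup)                         # not found
```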
15.3 Performance Tips
# 1. Bind frequently used attributes to a plain name in tight loops (skips the repeated attribute lookup)
import math
sqrt = math.sqrt # Bind once, outside the loop
result = [sqrt(x) for x in range(10000)]
# 2. Use sets for O(1) membership testing
# BAD: O(n)
if item in large_list: ...
# GOOD: O(1) on average per lookup (building the set is O(n), so build it once and reuse it)
large_set = set(large_list)
if item in large_set: ...
# 3. Use slots for memory-intensive classes (see 14.3)
# 4. Use generators for large data
# BAD: builds entire list in memory
total = sum([x**2 for x in range(1_000_000)])
# GOOD: lazy evaluation
total = sum(x**2 for x in range(1_000_000))
# 5. Profile before optimizing
import cProfile
cProfile.run('my_function()')
# 6. Use bisect for sorted list operations
import bisect
sorted_list = [1, 3, 5, 7, 9]
bisect.insort(sorted_list, 4) # O(log n) search, O(n) insert; keeps the list sorted
# 7. lru_cache for expensive pure functions
import time
from functools import lru_cache
@lru_cache(maxsize=128)
def expensive_computation(n: int) -> int:
    time.sleep(0.1) # Simulate expensive work
    return n * n
# 8. Use array module for typed numeric arrays
import array
a = array.array('d', [1.0, 2.0, 3.0]) # More memory-efficient than list
# Or use numpy for numerical computing
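Two of these tips are easy to verify directly. The sketch below uses cache_info() to confirm that lru_cache skips repeated work, and shows bisect keeping a list sorted. The function name square is an assumption made for this example.

```python
import bisect
from functools import lru_cache

@lru_cache(maxsize=128)
def square(n: int) -> int:
    return n * n

# 3 is computed once, then served from the cache twice.
results = [square(n) for n in (3, 3, 4, 3)]
info = square.cache_info()
print(results)                    # [9, 9, 16, 9]
print(info.hits, info.misses)     # 2 2

sorted_list = [1, 3, 5, 7, 9]
bisect.insort(sorted_list, 4)     # Binary search for the slot, then insert
print(sorted_list)                # [1, 3, 4, 5, 7, 9]

# bisect_left returns the index where a value is, or would be inserted.
position = bisect.bisect_left(sorted_list, 7)
print(position)                   # 4
```

lru_cache is only safe for functions whose result depends solely on their arguments, and those arguments must be hashable.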
15.4 Security Best Practices
import secrets, hashlib, hmac
# 1. Use secrets for cryptographically secure random
token = secrets.token_hex(32) # 64-char hex string
password = secrets.token_urlsafe(16) # URL-safe base64
# 2. Hash passwords with a dedicated password-hashing function such as bcrypt (not a bare hashlib digest like SHA-256)
# pip install bcrypt
import bcrypt
password = b"super_secret_password"
hashed = bcrypt.hashpw(password, bcrypt.gensalt(rounds=12))
bcrypt.checkpw(password, hashed) # True
# 3. Constant-time comparison to prevent timing attacks
hmac.compare_digest(a, b) # NOT ==
# 4. Never use eval() or exec() with user input
# BAD:
user_input = "os.system('rm -rf /')"
eval(user_input) # CATASTROPHIC
# 5. Use parameterized queries, never f-strings for SQL
# BAD (SQL injection!):
query = f"SELECT * FROM users WHERE name = '{user_input}'"
# GOOD:
cursor.execute("SELECT * FROM users WHERE name = ?", (user_input,))
# 6. Validate and sanitize all external inputs
import re
def safe_username(name: str) -> str:
    # fullmatch() must consume the whole string, so a trailing newline is rejected too
    if not re.fullmatch(r'[a-zA-Z0-9_]{3,32}', name):
        raise ValueError("Invalid username")
    return name
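The difference between an interpolated and a parameterized query can be demonstrated with the standard library's sqlite3 module and an in-memory database. The table layout and the payload below are assumptions made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 0), ("root", 1)])

user_input = "nobody' OR '1'='1"    # Classic injection payload

# BAD: the payload becomes part of the SQL text and matches every row.
unsafe_sql = f"SELECT name FROM users WHERE name = '{user_input}'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# GOOD: the value is passed separately and compared as plain text.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (user_input,)
).fetchall()

print(len(unsafe_rows))   # 2
print(safe_rows)          # []
conn.close()
```

The ? placeholder is sqlite3's style; other database drivers use different markers such as %s, so check the driver's documentation.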
15.5 Project Structure
# Recommended project layout (src layout)
my_project/
├── src/
│   └── my_package/
│       ├── __init__.py
│       ├── [Link]
│       ├── [Link]
│       ├── [Link]
│       └── [Link]
├── tests/
│   ├── __init__.py
│   ├── test_core.py
│   └── test_models.py
├── docs/
├── .github/
│   └── workflows/
│       └── [Link]
├── pyproject.toml # PEP 517/518 build config
├── [Link] # or [Link]
├── [Link]
├── [Link]
├── .[Link]
├── .flake8
└── [Link]
# pyproject.toml (modern Python packaging)
# [build-system]
# requires = ["setuptools>=61", "wheel"]
# build-backend = "setuptools.build_meta"
#
# [project]
# name = "my-package"
# version = "1.0.0"
# requires-python = ">=3.10"
# dependencies = ["requests>=2.28", "pydantic>=2.0"]
End of Python Comprehensive Notes
Harvard University • Department of Computer Science • CS 50P
"In Python, readability counts — and so does deep understanding."