Pandas Cheat Sheet
All Essential Functions in One Place
Your go-to reference for data analysis
Sections:
1. Importing & Creating Data
2. Viewing & Exploring Data
3. Selecting & Filtering
4. Data Cleaning
5. Data Manipulation
6. Grouping & Aggregating
7. Merging & Joining
8. Date & Time
9. Exporting Data
1. Importing & Creating Data
Import pandas
import pandas as pd
import numpy as np
Reading Files
# CSV
df = pd.read_csv('[Link]')
df = pd.read_csv('[Link]', encoding='utf-8', sep=',')
# Excel
df = pd.read_excel('[Link]')
df = pd.read_excel('[Link]', sheet_name='Sheet1')
# JSON
df = pd.read_json('[Link]')
# From URL
df = pd.read_csv('[Link]')
# SQL
import sqlite3
conn = sqlite3.connect('[Link]')
df = pd.read_sql('SELECT * FROM table_name', conn) # table_name is a placeholder
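The SQL pattern above can be tried without any file on disk. This is a minimal sketch, assuming an in-memory SQLite database; the table name `people` and its rows are made up for illustration.

```python
import sqlite3
import pandas as pd

# In-memory database, so the example leaves no file behind
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE people (name TEXT, age INTEGER)')  # hypothetical table
conn.executemany(
    'INSERT INTO people VALUES (?, ?)',
    [('Rahul', 25), ('Priya', 30)],
)
conn.commit()

# Read the query result straight into a DataFrame
people = pd.read_sql('SELECT * FROM people ORDER BY age', conn)
conn.close()
```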
Creating DataFrames
# From dictionary
df = pd.DataFrame({
    'name': ['Rahul', 'Priya', 'Amit'],
    'age': [25, 30, 28],
    'city': ['Mumbai', 'Delhi', 'Pune']
})
# From list of lists
df = pd.DataFrame(
    [['Rahul', 25], ['Priya', 30]],
    columns=['name', 'age']
)
# Empty DataFrame
df = pd.DataFrame()
2. Viewing & Exploring Data
df.head() # First 5 rows
df.head(10) # First 10 rows
df.tail() # Last 5 rows
df.sample(5) # Random 5 rows
df.shape # (rows, columns)
df.info() # Data types, non-null counts
df.describe() # Statistics for numeric columns
df.dtypes # Data type of each column
df.columns # Column names
df.index # Row index
df.values # Data as numpy array
len(df) # Number of rows
df.size # Total elements (rows × cols)
df.memory_usage() # Memory used by each column
3. Selecting & Filtering
Selecting Columns
df['name'] # Single column (Series)
df[['name', 'age']] # Multiple columns (DataFrame)
df.name # Dot notation (avoid if the column name has spaces)
df.columns.tolist() # List all columns
df.drop(columns=['col1']) # Remove column
Selecting Rows
df.iloc[0] # First row by position
df.iloc[0:5] # Rows 0-4
df.iloc[-1] # Last row
df.loc[0] # Row with index label 0
df.loc[0:5] # Rows with index labels 0-5 (inclusive!)
df.iloc[0:5, 0:3] # Rows 0-4, Columns 0-2
df.loc[0:5, ['name', 'age']] # Rows labeled 0-5 (inclusive), specific columns
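The difference between position-based and label-based slicing is easiest to see when the index is not 0, 1, 2... A small sketch, with an illustrative index of 10, 20, 30, 40:

```python
import pandas as pd

df = pd.DataFrame(
    {'name': ['Rahul', 'Priya', 'Amit', 'Neha'], 'age': [25, 30, 28, 35]},
    index=[10, 20, 30, 40],  # illustrative labels, chosen to differ from positions
)

by_position = df.iloc[0:2]  # positions 0 and 1; the end position is excluded
by_label = df.loc[10:30]    # labels 10, 20 and 30; the end label is included
```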
Filtering (Super Important!)
# Single condition
df[df['age'] > 25]
df[df['city'] == 'Mumbai']
df[df['salary'] >= 50000]
# Multiple conditions (AND)
df[(df['age'] > 25) & (df['city'] == 'Mumbai')]
# Multiple conditions (OR)
df[(df['city'] == 'Mumbai') | (df['city'] == 'Delhi')]
# isin - multiple values
df[df['city'].isin(['Mumbai', 'Delhi', 'Pune'])]
# String contains
df[df['name'].str.contains('Kumar', case=False)]
# Starts with / Ends with
df[df['name'].str.startswith('R')]
df[df['name'].str.endswith('a')]
# Between
df[df['age'].between(25, 35)]
# Null / Not Null
df[df['email'].isnull()]
df[df['email'].notnull()]
# Query method (cleaner syntax)
df.query('age > 25 and city == "Mumbai"')
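As a quick check that the boolean-mask style and the query style select the same rows, here is a small sketch on made-up data. Each condition in a mask needs its own parentheses because & binds more tightly than the comparison operators.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Rahul', 'Priya', 'Amit'],
    'age': [25, 30, 28],
    'city': ['Mumbai', 'Delhi', 'Mumbai'],
})

# Boolean mask: wrap each condition in parentheses before combining with &
mask = (df['age'] > 26) & (df['city'] == 'Mumbai')
with_mask = df[mask]

# The same filter written as a query string
with_query = df.query('age > 26 and city == "Mumbai"')
```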
4. Data Cleaning
Missing Values
# Check missing
df.isnull().sum() # Count nulls per column
df.isnull().sum().sum() # Total nulls
df.isnull().any() # Any nulls per column?
# Drop missing
df.dropna() # Drop rows with any null
df.dropna(subset=['email']) # Drop if email is null
df.dropna(how='all') # Drop only if ALL values null
# Fill missing
df['col'].fillna(0) # Fill with 0
df['col'].fillna(df['col'].mean()) # Fill with mean
df['col'].fillna(df['col'].median()) # Fill with median
df['col'].fillna('Unknown') # Fill with text
df.ffill() # Forward fill (fillna(method='ffill') is deprecated in recent pandas)
df.bfill() # Backward fill
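The fill methods above return a new object and leave the original untouched, so the result has to be assigned back to keep it. A minimal sketch on a made-up column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'score': [10.0, np.nan, 30.0, np.nan]})

filled_mean = df['score'].fillna(df['score'].mean())  # mean of 10 and 30 is 20
forward = df['score'].ffill()                         # carry the last valid value forward

# The original column still has its nulls because nothing was assigned back
nulls_left = df['score'].isnull().sum()
```

To keep a result, assign it: df['score'] = df['score'].fillna(0).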
Duplicates
df.duplicated().sum() # Count duplicates
df[df.duplicated()] # View duplicates
df.drop_duplicates() # Remove duplicates
df.drop_duplicates(subset=['email']) # Based on column
df.drop_duplicates(keep='last') # Keep last occurrence
Data Types
df['col'].astype(int) # Convert to integer
df['col'].astype(float) # Convert to float
df['col'].astype(str) # Convert to string
df['col'].astype('category') # Convert to category
pd.to_numeric(df['col'], errors='coerce') # To numeric (errors → NaN)
pd.to_datetime(df['date']) # To datetime
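To see what errors='coerce' does with values that cannot be converted, here is a short sketch on made-up strings. Passing errors='coerce' to pd.to_datetime works the same way and yields NaT for unparseable entries.

```python
import pandas as pd

raw_numbers = pd.Series(['10', '20', 'n/a', '40'])
numbers = pd.to_numeric(raw_numbers, errors='coerce')  # 'n/a' becomes NaN

raw_dates = pd.Series(['2024-01-15', 'not a date'])
dates = pd.to_datetime(raw_dates, errors='coerce')     # 'not a date' becomes NaT
```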
Renaming & Formatting
# Rename columns
df.rename(columns={'old': 'new'})
df.rename(columns={'name': 'full_name', 'age': 'years'})
# Clean column names
df.columns = df.columns.str.lower()
df.columns = df.columns.str.replace(' ', '_')
df.columns = df.columns.str.strip()
# String operations
df['name'] = df['name'].str.strip() # Remove whitespace
df['name'] = df['name'].str.upper() # Uppercase
df['name'] = df['name'].str.lower() # Lowercase
df['name'] = df['name'].str.title() # Title Case
df['name'] = df['name'].str.replace('old', 'new')
5. Data Manipulation
Creating New Columns
# Simple calculation
df['total'] = df['quantity'] * df['price']
df['profit'] = df['revenue'] - df['cost']
# Conditional column
df['status'] = np.where(df['score'] >= 50, 'Pass', 'Fail')
# Multiple conditions (checked in order; the first match wins)
df['grade'] = np.select(
    [df['score'] >= 90, df['score'] >= 80, df['score'] >= 70],
    ['A', 'B', 'C'],
    default='F'
)
# Using apply
df['new_col'] = df['col'].apply(lambda x: x * 2)
df['new_col'] = df.apply(lambda row: row['a'] + row['b'], axis=1)
# Using map (for mapping values)
df['region_name'] = df['region_code'].map({'N': 'North', 'S': 'South'})
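Two details of the snippets above are worth seeing on data: np.select takes the first condition that is true for each row, and map leaves NaN wherever a value is missing from the dictionary. A sketch with made-up scores and region codes:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'score': [95, 85, 72, 40],
    'region_code': ['N', 'S', 'N', 'E'],  # 'E' is deliberately absent from the mapping
})

# Conditions are checked in order, so 95 gets 'A' even though it is also >= 80
conditions = [df['score'] >= 90, df['score'] >= 80, df['score'] >= 70]
df['grade'] = np.select(conditions, ['A', 'B', 'C'], default='F')

# Codes without a dictionary entry become NaN
df['region_name'] = df['region_code'].map({'N': 'North', 'S': 'South'})
```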
Sorting
df.sort_values('column') # Ascending
df.sort_values('column', ascending=False) # Descending
df.sort_values(['col1', 'col2']) # Multiple columns
df.sort_index() # Sort by index
df.nlargest(10, 'sales') # Top 10 by sales
df.nsmallest(10, 'price') # Bottom 10 by price
Modifying Data
# Replace values
df['col'].replace('old', 'new')
df['col'].replace({'old1': 'new1', 'old2': 'new2'})
# Drop columns/rows
df.drop(columns=['col1', 'col2'])
df.drop(index=[0, 1, 2])
# Reset index
df.reset_index(drop=True)
# Set column as index
df.set_index('id')
# Binning (create categories from numbers)
df['age_group'] = pd.cut(df['age'], bins=[0, 18, 35, 50, 100],
                         labels=['Child', 'Young', 'Middle', 'Senior'])
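By default pd.cut treats each bin as open on the left and closed on the right, so a value equal to a bin edge falls into the lower bin. A small sketch with ages chosen to sit on and just past the edges:

```python
import pandas as pd

ages = pd.Series([18, 19, 35, 36, 70])

# Bins are (0, 18], (18, 35], (35, 50], (50, 100]
groups = pd.cut(ages, bins=[0, 18, 35, 50, 100],
                labels=['Child', 'Young', 'Middle', 'Senior'])
```

Here 18 lands in 'Child' and 35 in 'Young'. Passing right=False flips the closed side.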
6. Grouping & Aggregating
# Basic groupby
df.groupby('category')['sales'].sum()
df.groupby('category')['sales'].mean()
df.groupby('category')['sales'].count()
# Multiple aggregations
df.groupby('category')['sales'].agg(['sum', 'mean', 'count', 'min', 'max'])
# Group by multiple columns
df.groupby(['region', 'category'])['sales'].sum()
# Different agg for different columns
df.groupby('category').agg({
    'sales': 'sum',
    'quantity': 'mean',
    'customer_id': 'nunique'
})
# Named aggregations
df.groupby('category').agg(
    total_sales=('sales', 'sum'),
    avg_sales=('sales', 'mean'),
    unique_customers=('customer_id', 'nunique')
)
# Reset index after groupby
df.groupby('category')['sales'].sum().reset_index()
# Value counts (frequency)
df['category'].value_counts()
df['category'].value_counts(normalize=True) # Percentages
# Unique values
df['category'].unique()
df['category'].nunique()
Use reset_index() after groupby when you want the group keys back as regular columns in a DataFrame.
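A short sketch of what that looks like on made-up sales data. Passing as_index=False to groupby is an alternative that gives the same shape in one step.

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['A', 'B', 'A', 'B'],
    'sales': [100, 200, 300, 400],
})

as_series = df.groupby('category')['sales'].sum()  # Series indexed by category
as_frame = as_series.reset_index()                 # DataFrame with both columns

# Alternative: keep the group key as a column from the start
same_frame = df.groupby('category', as_index=False)['sales'].sum()
```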
7. Merging & Joining
# Inner join (only matching)
pd.merge(df1, df2, on='customer_id')
pd.merge(df1, df2, on='customer_id', how='inner')
# Left join (all from left)
pd.merge(df1, df2, on='customer_id', how='left')
# Right join (all from right)
pd.merge(df1, df2, on='customer_id', how='right')
# Outer join (all from both)
pd.merge(df1, df2, on='customer_id', how='outer')
# Join on different column names
pd.merge(df1, df2, left_on='cust_id', right_on='customer_id')
# Join on multiple columns
pd.merge(df1, df2, on=['customer_id', 'product_id'])
# Concatenate (stack vertically)
pd.concat([df1, df2])
pd.concat([df1, df2], ignore_index=True)
# Concatenate horizontally
pd.concat([df1, df2], axis=1)
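To compare the join types side by side, here is a sketch on two made-up tables that only partly overlap. The indicator=True option adds a _merge column showing where each row came from.

```python
import pandas as pd

# Illustrative data: customer 1 has no order, order for customer 4 has no customer
customers = pd.DataFrame({'customer_id': [1, 2, 3], 'name': ['Rahul', 'Priya', 'Amit']})
orders = pd.DataFrame({'customer_id': [2, 3, 4], 'amount': [250, 400, 150]})

inner = pd.merge(customers, orders, on='customer_id', how='inner')  # ids 2 and 3
left = pd.merge(customers, orders, on='customer_id', how='left')    # ids 1, 2 and 3
outer = pd.merge(customers, orders, on='customer_id', how='outer', indicator=True)
```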
8. Date & Time
# Convert to datetime
df['date'] = pd.to_datetime(df['date'])
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
# Extract components
df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month
df['day'] = df['date'].dt.day
df['weekday'] = df['date'].dt.dayofweek # 0=Monday
df['day_name'] = df['date'].dt.day_name() # Monday, Tuesday...
df['month_name'] = df['date'].dt.month_name()
df['quarter'] = df['date'].dt.quarter
df['week'] = df['date'].dt.isocalendar().week
# Date calculations
df['days_since'] = (pd.Timestamp.now() - df['date']).dt.days
# Filter by date
df[df['date'] >= '2024-01-01']
df[df['date'].between('2024-01-01', '2024-12-31')]
# Resample (time series)
df.set_index('date').resample('M').sum() # Monthly (use 'ME' in pandas 2.2+)
df.set_index('date').resample('W').mean() # Weekly
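A small sketch tying the date steps together on made-up daily sales. With the 'W' rule, each bucket is labeled by the Sunday that ends its week.

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2024-01-01', '2024-01-02', '2024-01-08'],  # Mon, Tue, next Mon
    'sales': [10, 20, 30],
})
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')

df['weekday'] = df['date'].dt.dayofweek   # Monday is 0
df['day_name'] = df['date'].dt.day_name()

# Weekly totals: the first two rows share a week, the third starts the next one
weekly = df.set_index('date')['sales'].resample('W').sum()
```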
9. Exporting Data
# CSV
df.to_csv('[Link]', index=False)
# Excel
df.to_excel('[Link]', index=False, sheet_name='Data')
# Multiple sheets
with pd.ExcelWriter('[Link]') as writer:
    df1.to_excel(writer, sheet_name='Sheet1', index=False)
    df2.to_excel(writer, sheet_name='Sheet2', index=False)
# JSON
df.to_json('[Link]', orient='records')
# Clipboard (for quick paste)
df.to_clipboard()
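A quick way to check an export is to read it straight back. This sketch writes CSV to an in-memory buffer instead of a file, so it runs without touching disk; the data is made up.

```python
import io
import pandas as pd

df = pd.DataFrame({'name': ['Rahul', 'Priya'], 'age': [25, 30]})

# Write CSV text into a buffer; index=False keeps the row index out of the output
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

# Read the same text back and compare with the original
restored = pd.read_csv(io.StringIO(csv_text))
```

Without index=False, the round trip would pick up an extra unnamed column holding the old index.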
Quick Reference
Task                Code
Load CSV            pd.read_csv('[Link]')
First 5 rows        df.head()
Shape               df.shape
Info                df.info()
Stats               df.describe()
Select column       df['col']
Filter              df[df['col'] > value]
Multiple filters    df[(cond1) & (cond2)]
Null count          df.isnull().sum()
Drop nulls          df.dropna()
Fill nulls          df['col'].fillna(value)
Remove duplicates   df.drop_duplicates()
Sort                df.sort_values('col')
Group & sum         df.groupby('col')['val'].sum()
Merge               pd.merge(df1, df2, on='col')
New column          df['new'] = df['a'] + df['b']
To datetime         pd.to_datetime(df['date'])
Save CSV            df.to_csv('[Link]', index=False)
Keep this handy. Practice daily. You'll master pandas in no time!