
Essential Pandas Functions Cheat Sheet

The Pandas Cheat Sheet provides a comprehensive reference for essential functions used in data analysis with Pandas. It covers various topics including importing data, data cleaning, manipulation, grouping, merging, and exporting data, along with quick reference codes for common tasks. This guide serves as a valuable resource for users looking to enhance their data analysis skills with Pandas.

Uploaded by

alfaazkiawaaz
Copyright
© All Rights Reserved

Pandas Cheat Sheet

All Essential Functions in One Place

Your go-to reference for data analysis

Sections:

1. Importing & Creating Data

2. Viewing & Exploring Data

3. Selecting & Filtering

4. Data Cleaning

5. Data Manipulation

6. Grouping & Aggregating

7. Merging & Joining

8. Date & Time

9. Exporting Data
1. Importing & Creating Data

Import pandas
import pandas as pd
import numpy as np

Reading Files
# CSV
df = pd.read_csv('[Link]')
df = pd.read_csv('[Link]', encoding='utf-8', sep=',')

# Excel
df = pd.read_excel('[Link]')
df = pd.read_excel('[Link]', sheet_name='Sheet1')

# JSON
df = pd.read_json('[Link]')

# From URL
df = pd.read_csv('[Link]')

# SQL
import sqlite3
conn = sqlite3.connect('[Link]')
# 'table' is a reserved word in SQLite, so use the real table name here
df = pd.read_sql('SELECT * FROM table_name', conn)

Creating DataFrames
# From dictionary
df = pd.DataFrame({
    'name': ['Rahul', 'Priya', 'Amit'],
    'age': [25, 30, 28],
    'city': ['Mumbai', 'Delhi', 'Pune']
})

# From list of lists
df = pd.DataFrame(
    [['Rahul', 25], ['Priya', 30]],
    columns=['name', 'age']
)

# Empty DataFrame
df = pd.DataFrame()
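
To try the reading options without a file on disk, the sketch below feeds read_csv an in-memory string. The CSV text, the semicolon delimiter, and the variable names are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content held in memory, so no file is needed.
csv_text = "name;age;city\nRahul;25;Mumbai\nPriya;30;Delhi\n"

# read_csv accepts any file-like object; sep=';' matches the delimiter above.
people = pd.read_csv(io.StringIO(csv_text), sep=';')

print(people.shape)          # (2, 3)
print(list(people.columns))  # ['name', 'age', 'city']
```

The same sep and encoding arguments apply when a file path is passed instead of a StringIO object.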

2. Viewing & Exploring Data


df.head() # First 5 rows
df.head(10) # First 10 rows
df.tail() # Last 5 rows
df.sample(5) # Random 5 rows

df.shape # (rows, columns)
df.info() # Data types, non-null counts
df.describe() # Statistics for numeric columns
df.dtypes # Data type of each column

df.columns # Column names
df.index # Row index
df.values # Data as numpy array

len(df) # Number of rows
df.size # Total elements (rows × cols)
df.memory_usage() # Memory used by each column
3. Selecting & Filtering

Selecting Columns
df['name'] # Single column (Series)
df[['name', 'age']] # Multiple columns (DataFrame)
df.name # Dot notation (avoid if the name has spaces or clashes with a DataFrame attribute)

df.columns.tolist() # List all columns
df.drop(columns=['col1']) # Remove column

Selecting Rows
df.iloc[0] # First row by position
df.iloc[0:5] # Rows 0-4
df.iloc[-1] # Last row

df.loc[0] # Row with index 0
df.loc[0:5] # Rows with index 0-5 (inclusive!)

df.iloc[0:5, 0:3] # Rows 0-4, Columns 0-2
df.loc[0:5, ['name', 'age']] # Rows 0-5, specific columns
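
The inclusive/exclusive difference is easiest to see when the index labels are not 0, 1, 2. A minimal sketch, with made-up labels 10 to 40:

```python
import pandas as pd

df = pd.DataFrame(
    {'name': ['Rahul', 'Priya', 'Amit', 'Neha'], 'age': [25, 30, 28, 35]},
    index=[10, 20, 30, 40],  # non-default labels, chosen to make the difference visible
)

by_position = df.iloc[0:2]  # positions 0 and 1; the stop position is excluded
by_label = df.loc[10:30]    # labels 10, 20 and 30; the stop label is included

print(len(by_position))  # 2
print(len(by_label))     # 3
```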

Filtering (Super Important!)


# Single condition
df[df['age'] > 25]
df[df['city'] == 'Mumbai']
df[df['salary'] >= 50000]

# Multiple conditions (AND)
df[(df['age'] > 25) & (df['city'] == 'Mumbai')]

# Multiple conditions (OR)
df[(df['city'] == 'Mumbai') | (df['city'] == 'Delhi')]

# isin - multiple values
df[df['city'].isin(['Mumbai', 'Delhi', 'Pune'])]

# String contains
df[df['name'].str.contains('Kumar', case=False)]

# Starts with / Ends with
df[df['name'].str.startswith('R')]
df[df['name'].str.endswith('a')]

# Between
df[df['age'].between(25, 35)]

# Null / Not Null
df[df['email'].isnull()]
df[df['email'].notnull()]

# Query method (cleaner syntax)
df.query('age > 25 and city == "Mumbai"')
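
A small sketch of the filters above on made-up data. It shows why each condition needs its own parentheses, that query() gives the same rows, and one way to handle missing names in str.contains (the na=False argument is an addition not used in the snippets above).

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Rahul Kumar', 'Priya', None, 'Amit kumar'],
    'age': [25, 30, 28, 41],
    'city': ['Mumbai', 'Delhi', 'Mumbai', 'Mumbai'],
})

# Parentheses are required: & binds more tightly than > and ==.
mask_result = df[(df['age'] > 25) & (df['city'] == 'Mumbai')]

# The same filter written with query().
query_result = df.query('age > 25 and city == "Mumbai"')

# na=False treats missing names as non-matches, so the mask stays purely boolean.
kumars = df[df['name'].str.contains('kumar', case=False, na=False)]

print(mask_result.index.tolist())  # [2, 3]
print(kumars.index.tolist())       # [0, 3]
```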
4. Data Cleaning

Missing Values
# Check missing
df.isnull().sum() # Count nulls per column
df.isnull().sum().sum() # Total nulls
df.isnull().any() # Any nulls per column?

# Drop missing
df.dropna() # Drop rows with any null
df.dropna(subset=['email']) # Drop if email is null
df.dropna(how='all') # Drop only if ALL values null

# Fill missing
df['col'].fillna(0) # Fill with 0
df['col'].fillna(df['col'].mean()) # Fill with mean
df['col'].fillna(df['col'].median()) # Fill with median
df['col'].fillna('Unknown') # Fill with text
df.ffill() # Forward fill (fillna(method='ffill') is deprecated since pandas 2.1)
df.bfill() # Backward fill (fillna(method='bfill') is deprecated since pandas 2.1)
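
Like most of the calls on this sheet, fillna returns a new object and leaves the original untouched, so the result has to be assigned back. A minimal sketch with made-up values:

```python
import numpy as np
import pandas as pd

# Made-up data: one missing score, two missing emails.
df = pd.DataFrame({
    'score': [10.0, np.nan, 30.0],
    'email': ['a@x.in', None, None],
})

df['score'].fillna(0)                # result is discarded; df is unchanged
print(df['score'].isnull().sum())    # 1

# Assign the result back to keep it.
df['score'] = df['score'].fillna(df['score'].mean())  # mean of 10 and 30 is 20
df['email'] = df['email'].fillna('Unknown')
```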

Duplicates
df.duplicated().sum() # Count duplicates
df[df.duplicated()] # View duplicates
df.drop_duplicates() # Remove duplicates
df.drop_duplicates(subset=['email']) # Based on column
df.drop_duplicates(keep='last') # Keep last occurrence
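
The keep argument decides which of the repeated rows survives. A short sketch on made-up visit data:

```python
import pandas as pd

# Made-up data: the first two rows share an email.
df = pd.DataFrame({
    'email': ['a@x.in', 'a@x.in', 'b@x.in'],
    'visit': [1, 2, 3],
})

first_kept = df.drop_duplicates(subset=['email'])              # default keep='first'
last_kept = df.drop_duplicates(subset=['email'], keep='last')

print(first_kept['visit'].tolist())  # [1, 3]
print(last_kept['visit'].tolist())   # [2, 3]
```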

Data Types
df['col'].astype(int) # Convert to integer
df['col'].astype(float) # Convert to float
df['col'].astype(str) # Convert to string
df['col'].astype('category') # Convert to category

pd.to_numeric(df['col'], errors='coerce') # To numeric (errors → NaN)
pd.to_datetime(df['date']) # To datetime
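
A sketch of what errors='coerce' does with values that cannot be parsed. The input strings are made up, and passing errors='coerce' to to_datetime is an addition to the snippet above.

```python
import pandas as pd

raw_numbers = pd.Series(['10', '20.5', 'n/a'])
raw_dates = pd.Series(['2024-01-15', 'not a date'])

# Unparseable entries become NaN / NaT instead of raising an error.
numbers = pd.to_numeric(raw_numbers, errors='coerce')
dates = pd.to_datetime(raw_dates, errors='coerce')

print(numbers.isnull().tolist())  # [False, False, True]
print(dates.isnull().tolist())    # [False, True]
```

A column that still contains NaN cannot be converted with astype(int), so fill or drop the missing values first.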

Renaming & Formatting


# Rename columns
df.rename(columns={'old': 'new'})
df.rename(columns={'name': 'full_name', 'age': 'years'})

# Clean column names (strip first, so outer spaces do not turn into underscores)
df.columns = df.columns.str.strip()
df.columns = df.columns.str.replace(' ', '_')
df.columns = df.columns.str.lower()

# String operations
df['name'] = df['name'].str.strip() # Remove whitespace
df['name'] = df['name'].str.upper() # Uppercase
df['name'] = df['name'].str.lower() # Lowercase
df['name'] = df['name'].str.title() # Title Case
df['name'] = df['name'].str.replace('old', 'new')
5. Data Manipulation

Creating New Columns


# Simple calculation
df['total'] = df['quantity'] * df['price']
df['profit'] = df['revenue'] - df['cost']

# Conditional column
df['status'] = np.where(df['score'] >= 50, 'Pass', 'Fail')

# Multiple conditions
df['grade'] = np.select(
    [df['score'] >= 90, df['score'] >= 80, df['score'] >= 70],
    ['A', 'B', 'C'],
    default='F'
)

# Using apply
df['new_col'] = df['col'].apply(lambda x: x * 2)
df['new_col'] = df.apply(lambda row: row['a'] + row['b'], axis=1)

# Using map (for mapping values)
df['region_name'] = df['region_code'].map({'N': 'North', 'S': 'South'})
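
With np.select the first matching condition wins, so the conditions must run from strictest to loosest. The sketch below uses made-up scores and adds a deliberately wrong ordering for comparison.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'score': [95, 85, 72, 40]})

conditions = [df['score'] >= 90, df['score'] >= 80, df['score'] >= 70]
grades = ['A', 'B', 'C']

# Strictest condition first: each row gets the first label whose condition is True.
df['grade'] = np.select(conditions, grades, default='F')

# Reversed order: 'score >= 70' is checked first and catches every passing row.
df['grade_wrong_order'] = np.select(conditions[::-1], grades[::-1], default='F')

print(df['grade'].tolist())              # ['A', 'B', 'C', 'F']
print(df['grade_wrong_order'].tolist())  # ['C', 'C', 'C', 'F']
```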

Sorting
df.sort_values('column') # Ascending
df.sort_values('column', ascending=False) # Descending
df.sort_values(['col1', 'col2']) # Multiple columns
df.sort_index() # Sort by index

df.nlargest(10, 'sales') # Top 10 by sales
df.nsmallest(10, 'price') # Bottom 10 by price

Modifying Data
# Replace values
df['col'].replace('old', 'new')
df['col'].replace({'old1': 'new1', 'old2': 'new2'})

# Drop columns/rows
df.drop(columns=['col1', 'col2'])
df.drop(index=[0, 1, 2])

# Reset index
df.reset_index(drop=True)

# Set column as index
df.set_index('id')

# Binning (create categories from numbers)
df['age_group'] = pd.cut(df['age'], bins=[0, 18, 35, 50, 100],
                         labels=['Child', 'Young', 'Middle', 'Senior'])
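
By default pd.cut builds intervals that are open on the left and closed on the right, so a value equal to a bin edge falls into the lower group and the very first edge is excluded. A sketch with made-up ages sitting on the edges:

```python
import pandas as pd

bins = [0, 18, 35, 50, 100]
labels = ['Child', 'Young', 'Middle', 'Senior']

ages = pd.Series([18, 19, 35, 36, 100])
groups = pd.cut(ages, bins=bins, labels=labels)
print(groups.tolist())  # ['Child', 'Young', 'Young', 'Middle', 'Senior']

# The lowest edge (0) is excluded unless include_lowest=True is passed.
zero_default = pd.cut(pd.Series([0]), bins=bins, labels=labels)
zero_included = pd.cut(pd.Series([0]), bins=bins, labels=labels, include_lowest=True)
print(zero_default.isnull().iloc[0])  # True
print(zero_included.iloc[0])          # Child
```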
6. Grouping & Aggregating
# Basic groupby
df.groupby('category')['sales'].sum()
df.groupby('category')['sales'].mean()
df.groupby('category')['sales'].count()

# Multiple aggregations
df.groupby('category')['sales'].agg(['sum', 'mean', 'count', 'min', 'max'])

# Group by multiple columns
df.groupby(['region', 'category'])['sales'].sum()

# Different agg for different columns
df.groupby('category').agg({
    'sales': 'sum',
    'quantity': 'mean',
    'customer_id': 'nunique'
})

# Named aggregations
df.groupby('category').agg(
    total_sales=('sales', 'sum'),
    avg_sales=('sales', 'mean'),
    unique_customers=('customer_id', 'nunique')
)

# Reset index after groupby
df.groupby('category')['sales'].sum().reset_index()

# Value counts (frequency)
df['category'].value_counts()
df['category'].value_counts(normalize=True) # Proportions (multiply by 100 for percentages)

# Unique values
df['category'].unique()
df['category'].nunique()

Note: groupby(...)['sales'].sum() returns a Series indexed by the group keys. Add reset_index() when you want the keys back as regular columns of a DataFrame.
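
A sketch of what comes back from a groupby with and without reset_index(), on made-up sales data. The as_index=False variant is an alternative not listed above that gives the same flat result.

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['A', 'A', 'B'],
    'sales': [100, 200, 50],
    'customer_id': [1, 1, 2],
})

as_series = df.groupby('category')['sales'].sum()               # Series indexed by category
as_frame = df.groupby('category')['sales'].sum().reset_index()  # category is a column again
same_frame = df.groupby('category', as_index=False)['sales'].sum()

summary = df.groupby('category').agg(
    total_sales=('sales', 'sum'),
    unique_customers=('customer_id', 'nunique'),
).reset_index()

print(type(as_series).__name__)   # Series
print(as_frame.columns.tolist())  # ['category', 'sales']
```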


7. Merging & Joining
# Inner join (only matching)
pd.merge(df1, df2, on='customer_id')
pd.merge(df1, df2, on='customer_id', how='inner')

# Left join (all from left)
pd.merge(df1, df2, on='customer_id', how='left')

# Right join (all from right)
pd.merge(df1, df2, on='customer_id', how='right')

# Outer join (all from both)
pd.merge(df1, df2, on='customer_id', how='outer')

# Join on different column names
pd.merge(df1, df2, left_on='cust_id', right_on='customer_id')

# Join on multiple columns
pd.merge(df1, df2, on=['customer_id', 'product_id'])

# Concatenate (stack vertically)
pd.concat([df1, df2])
pd.concat([df1, df2], ignore_index=True)

# Concatenate horizontally
pd.concat([df1, df2], axis=1)
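
To see how the four join types differ, the sketch below merges two small made-up tables that share only customers 2 and 3. The indicator=True argument is an addition to the snippets above; it adds a _merge column showing where each row came from.

```python
import pandas as pd

df1 = pd.DataFrame({'customer_id': [1, 2, 3], 'name': ['Rahul', 'Priya', 'Amit']})
df2 = pd.DataFrame({'customer_id': [2, 3, 4], 'amount': [500, 250, 900]})

# Row count for each join type.
row_counts = {
    how: len(pd.merge(df1, df2, on='customer_id', how=how))
    for how in ['inner', 'left', 'right', 'outer']
}
print(row_counts)  # {'inner': 2, 'left': 3, 'right': 3, 'outer': 4}

# _merge is 'left_only', 'right_only' or 'both' for every row of the result.
outer = pd.merge(df1, df2, on='customer_id', how='outer', indicator=True)
origin = outer.set_index('customer_id')['_merge'].astype(str).to_dict()
```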

8. Date & Time


# Convert to datetime
df['date'] = pd.to_datetime(df['date'])
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')

# Extract components
df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month
df['day'] = df['date'].dt.day
df['weekday'] = df['date'].dt.dayofweek # 0=Monday
df['day_name'] = df['date'].dt.day_name() # Monday, Tuesday...
df['month_name'] = df['date'].dt.month_name()
df['quarter'] = df['date'].dt.quarter
df['week'] = df['date'].dt.isocalendar().week

# Date calculations
df['days_since'] = (pd.Timestamp.now() - df['date']).dt.days

# Filter by date
df[df['date'] >= '2024-01-01']
df[df['date'].between('2024-01-01', '2024-12-31')]

# Resample (time series)
df.set_index('date').resample('M').sum() # Monthly (use 'ME' on pandas 2.2+, where 'M' is deprecated)
df.set_index('date').resample('W').mean() # Weekly
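
A short sketch of the date helpers on three made-up dates. It uses a fixed reference date instead of the current time so the day counts are reproducible.

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2024-01-01', '2024-01-02', '2024-01-08'],
    'sales': [10, 20, 30],
})
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')

df['weekday'] = df['date'].dt.dayofweek    # 0=Monday
df['day_name'] = df['date'].dt.day_name()

# Fixed reference date (an assumption for this example) instead of the current time.
reference = pd.Timestamp('2024-01-31')
df['days_since'] = (reference - df['date']).dt.days

# Weekly totals; each bin is labelled with the Sunday that ends the week.
weekly = df.set_index('date')['sales'].resample('W').sum()

print(df['weekday'].tolist())     # [0, 1, 0]
print(df['days_since'].tolist())  # [30, 29, 23]
print(weekly.tolist())            # [30, 30]
```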

9. Exporting Data
# CSV
df.to_csv('[Link]', index=False)

# Excel
df.to_excel('[Link]', index=False, sheet_name='Data')

# Multiple sheets
with pd.ExcelWriter('[Link]') as writer:
    df1.to_excel(writer, sheet_name='Sheet1', index=False)
    df2.to_excel(writer, sheet_name='Sheet2', index=False)

# JSON
df.to_json('[Link]', orient='records')

# Clipboard (for quick paste)


df.to_clipboard()
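
When no path is given, to_csv and to_json return the text instead of writing a file, which is a quick way to check what an export will contain. A sketch with made-up data:

```python
import io
import json

import pandas as pd

df = pd.DataFrame({'name': ['Rahul', 'Priya'], 'age': [25, 30]})

csv_text = df.to_csv(index=False)  # no path given, so the CSV comes back as a string
csv_with_index = df.to_csv()       # default index=True adds an unnamed first column

# Reading the text back gives the same columns and values.
restored = pd.read_csv(io.StringIO(csv_text))

# orient='records' produces a list with one object per row.
records = json.loads(df.to_json(orient='records'))

print(csv_text.splitlines()[0])        # name,age
print(csv_with_index.splitlines()[0])  # ,name,age
```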

Quick Reference

Task                 Code
Load CSV             pd.read_csv('[Link]')
First 5 rows         df.head()
Shape                df.shape
Info                 df.info()
Stats                df.describe()
Select column        df['col']
Filter               df[df['col'] > value]
Multiple filters     df[(cond1) & (cond2)]
Null count           df.isnull().sum()
Drop nulls           df.dropna()
Fill nulls           df['col'].fillna(value)
Remove duplicates    df.drop_duplicates()
Sort                 df.sort_values('col')
Group & sum          df.groupby('col')['val'].sum()
Merge                pd.merge(df1, df2, on='col')
New column           df['new'] = df['a'] + df['b']
To datetime          pd.to_datetime(df['date'])
Save CSV             df.to_csv('[Link]', index=False)

Keep this handy. Practice daily. You'll master pandas in no time!
