NumPy and Pandas Tutorial Guide

Uploaded by

omvati343

NumPy and Pandas for Data Analysis AI ML Training

NumPy Tutorial
Introduction

NumPy (Numerical Python) is a library for the Python programming language, adding support
for large, multi-dimensional arrays and matrices, along with a large collection of high-level
mathematical functions to operate on these arrays.

Installation

To install NumPy, use the following command:

pip install numpy

Basic Operations

Importing NumPy

import numpy as np

Creating Arrays

# Create a 1D array
array_1d = np.array([1, 2, 3, 4, 5])
print(array_1d)

# Create a 2D array
array_2d = np.array([[1, 2, 3], [4, 5, 6]])
print(array_2d)

# Create an array with zeros
zeros_array = np.zeros((3, 4))
print(zeros_array)

# Create an array with ones
ones_array = np.ones((2, 3))
print(ones_array)

# Create an identity matrix
identity_matrix = np.eye(3)
print(identity_matrix)

# Create an array with a range of values
range_array = np.arange(10, 20, 2)
print(range_array)

# Create an array with evenly spaced values
linspace_array = np.linspace(0, 1, 5)
print(linspace_array)

Array Operations

# Arithmetic operations
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

LinkedIn: [Link]/in/nidhi-grover-raheja-904211138 1 |Pa ge

print(a + b) # Addition
print(a - b) # Subtraction
print(a * b) # Element-wise multiplication
print(a / b) # Element-wise division

# Matrix multiplication
matrix_a = np.array([[1, 2], [3, 4]])
matrix_b = np.array([[5, 6], [7, 8]])
print(np.dot(matrix_a, matrix_b))

# Broadcasting
array_broadcast = np.array([1, 2, 3])
print(array_broadcast + 1) # Adds 1 to each element

# Statistical operations
print(np.mean(a)) # Mean
print(np.median(a)) # Median
print(np.std(a)) # Standard deviation
print(np.sum(a)) # Sum
print(np.min(a)) # Minimum
print(np.max(a)) # Maximum
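
The broadcasting line above adds a scalar to a 1D array. The same rule extends to arrays of different shapes, which the following minimal sketch illustrates. The matrix and vector values are made up for illustration and are not part of the original tutorial. The last lines also show that np.std uses the population formula unless ddof=1 is passed.

```python
import numpy as np

# A 2x3 matrix and a length-3 row vector (illustrative values).
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])
row = np.array([10, 20, 30])

# Shapes (2, 3) and (3,) are compatible: the row is reused for every matrix row.
row_sum = matrix + row
print(row_sum)

# A column of shape (2, 1) is reused for every matrix column instead.
column = np.array([[100], [200]])
column_sum = matrix + column
print(column_sum)

# np.std defaults to the population formula (ddof=0);
# ddof=1 gives the sample estimate.
a = np.array([1, 2, 3])
population_std = np.std(a)
sample_std = np.std(a, ddof=1)
print(population_std, sample_std)
```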

Indexing and Slicing

array = np.array([1, 2, 3, 4, 5, 6])

# Indexing
print(array[0]) # First element
print(array[-1]) # Last element

# Slicing
print(array[1:4]) # Elements from index 1 to 3
print(array[:3]) # First three elements
print(array[3:]) # Elements from index 3 to end
print(array[::2]) # Every second element

Reshaping Arrays

array = np.arange(1, 10)

reshaped_array = array.reshape((3, 3))
print(reshaped_array)

# Flattening arrays
flattened_array = reshaped_array.flatten()
print(flattened_array)
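
One detail the listing above does not show is whether reshaping and flattening copy the data. The sketch below is an illustrative addition, not part of the original tutorial. It assumes the same 1-to-9 array, where reshape returns a view of the original data while flatten always returns a copy.

```python
import numpy as np

array = np.arange(1, 10)

# -1 asks NumPy to infer that dimension from the array size (9 / 3 = 3).
reshaped = array.reshape(3, -1)
print(reshaped.shape)

# For this contiguous array, reshape returns a view:
# writing through it also changes `array`.
reshaped[0, 0] = 100
print(array[0])

# flatten always returns a copy, so editing it leaves the source untouched.
flat = reshaped.flatten()
flat[0] = -1
print(reshaped[0, 0])
```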

Pandas Tutorial
Introduction

Pandas is a library providing high-performance, easy-to-use data structures and data analysis
tools for the Python programming language.

Installation

To install Pandas, use the following command:

pip install pandas

Basic Operations

Importing Pandas

import pandas as pd

Creating DataFrames

# Create a DataFrame from a dictionary

data = {
'Name': ['John', 'Anna', 'Peter', 'Linda'],
'Age': [28, 24, 35, 32],
'City': ['New York', 'Paris', 'Berlin', 'London']
}
df = pd.DataFrame(data)
print(df)

# Create a DataFrame from a CSV file

df_from_csv = pd.read_csv('path_to_csv_file.csv')
print(df_from_csv)

Viewing Data

# Display the first few rows

print(df.head())

# Display the last few rows

print(df.tail())

# Display the data types of columns

print(df.dtypes)

# Display the shape of the DataFrame

print(df.shape)

# Display summary statistics

print(df.describe())

Selecting Data

# Select a single column

print(df['Name'])

# Select multiple columns

print(df[['Name', 'City']])

# Select rows by integer position

print(df.iloc[0]) # First row
print(df.iloc[0:2]) # First two rows (end position is excluded)

# Select rows by label

print(df.loc[0]) # First row
print(df.loc[0:2]) # First three rows (end label is included)

# Conditional selection
print(df[df['Age'] > 30])
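
With the default index, labels and positions coincide, so the difference between label-based and position-based selection is easy to miss. The sketch below uses a hypothetical DataFrame with labels 10, 20, 30 (an assumption made purely for illustration) to separate the two, and combines two conditions in one selection.

```python
import pandas as pd

# Hypothetical frame whose labels (10, 20, 30) differ from positions (0, 1, 2).
people = pd.DataFrame(
    {'Name': ['John', 'Anna', 'Peter'], 'Age': [28, 24, 35]},
    index=[10, 20, 30],
)

# iloc slices by position; the end position is excluded.
by_position = people.iloc[0:2]

# loc slices by label; the end label is included.
by_label = people.loc[10:20]

# Conditions are combined with & (and) or | (or); each needs parentheses.
combined = people[(people['Age'] > 25) & (people['Name'] != 'Peter')]

print(by_position)
print(by_label)
print(combined)
```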

Adding and Dropping Columns

# Add a new column

df['Country'] = ['USA', 'France', 'Germany', 'UK']
print(df)

# Drop a column
df = df.drop('Country', axis=1)
print(df)

Handling Missing Data

# Create a DataFrame with missing values

data_with_nan = {
'Name': ['John', 'Anna', 'Peter', 'Linda'],
'Age': [28, None, 35, 32],
'City': ['New York', 'Paris', None, 'London']
}
df_nan = pd.DataFrame(data_with_nan)
print(df_nan)

# Drop rows with missing values

df_dropped_nan = df_nan.dropna()
print(df_dropped_nan)

# Fill missing values

df_filled_nan = df_nan.fillna({'Age': df_nan['Age'].mean(), 'City':
'Unknown'})
print(df_filled_nan)

Grouping and Aggregating Data

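# Group by a column and calculate mean
# (numeric_only=True skips the text column 'Name', which pandas 2.x cannot average)

print(df.groupby('City').mean(numeric_only=True))

# Group by multiple columns and calculate sum

print(df.groupby(['City', 'Name']).sum())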

Merging DataFrames

# Create two DataFrames

df1 = pd.DataFrame({'Name': ['John', 'Anna'], 'Age': [28, 24]})
df2 = pd.DataFrame({'Name': ['Peter', 'Linda'], 'City': ['Berlin', 'London']})

# Concatenate DataFrames (columns missing from one frame are filled with NaN)
df_concat = pd.concat([df1, df2], ignore_index=True)
print(df_concat)

# Merge DataFrames (df1 and df2 share no names, so this inner merge returns no rows)
df_merge = pd.merge(df1, df2, on='Name', how='inner')
print(df_merge)

Exporting Data

# Export DataFrame to CSV

df.to_csv('[Link]', index=False)

# Export DataFrame to Excel

df.to_excel('[Link]', index=False)

Advanced Pandas Tutorial

Handling Time Series Data

Pandas provides robust support for time series data. Here's how to work with it.

Creating Time Series Data

# Create a date range

date_range = pd.date_range(start='2023-01-01', periods=10, freq='D')
print(date_range)

# Create a DataFrame with time series data

time_series_data = {
'Date': date_range,
'Value': np.random.randn(10)
}
df_time_series = pd.DataFrame(time_series_data)
df_time_series.set_index('Date', inplace=True)
print(df_time_series)

Resampling Time Series Data

# Resample to weekly frequency and calculate the mean

df_resampled = df_time_series.resample('W').mean()
print(df_resampled)

# Resample to monthly frequency and calculate the sum
# ('M' labels each bin by its month end; pandas 2.2+ prefers the alias 'ME')

df_resampled_monthly = df_time_series.resample('M').sum()
print(df_resampled_monthly)
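
Because the example above uses random values, the way rows fall into weekly bins is hard to see. The sketch below is an illustrative addition that swaps in the values 0 to 9 so the result can be checked by hand. It relies on the default behaviour of 'W', where each bin ends on a Sunday and is labelled with that Sunday.

```python
import pandas as pd

# Ten daily values 0..9 starting on 2023-01-01, which is a Sunday.
dates = pd.date_range(start='2023-01-01', periods=10, freq='D')
series_df = pd.DataFrame({'Value': list(range(10))}, index=dates)

# Weekly bins end on Sunday, so the first bin holds only Jan 1,
# the second holds Jan 2 to Jan 8, and the third holds Jan 9 and Jan 10.
weekly = series_df.resample('W').mean()
print(weekly)
```

The third bin is labelled 2023-01-15 even though the data stops on January 10, since the label is the end of the bin rather than the last observation.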

Working with Categorical Data

# Create a DataFrame with categorical data
data = {
'Name': ['John', 'Anna', 'Peter', 'Linda'],
'City': ['New York', 'Paris', 'Berlin', 'London'],
'Gender': ['Male', 'Female', 'Male', 'Female']
}
df_categorical = pd.DataFrame(data)

# Convert a column to categorical type

df_categorical['Gender'] = df_categorical['Gender'].astype('category')
print(df_categorical)

# Get the categories and codes

print(df_categorical['Gender'].cat.categories)
print(df_categorical['Gender'].cat.codes)

Pivot Tables
# Create a DataFrame
data = {
'Name': ['John', 'Anna', 'John', 'Anna', 'John', 'Anna'],
'Month': ['Jan', 'Jan', 'Feb', 'Feb', 'Mar', 'Mar'],
'Sales': [150, 200, 130, 210, 170, 220]
}
df_sales = pd.DataFrame(data)

# Create a pivot table

pivot_table = df_sales.pivot_table(values='Sales', index='Name',
columns='Month', aggfunc='sum')
print(pivot_table)

Handling Large Datasets

# Read a large CSV file in chunks
chunk_size = 1000
chunks = pd.read_csv('large_dataset.csv', chunksize=chunk_size)

# Process each chunk

for chunk in chunks:
    # Perform operations on the chunk
    print(chunk.shape)
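
The loop above only prints each chunk's shape. A common pattern is to accumulate a result across chunks so that the full file never has to sit in memory. The sketch below assumes a tiny in-memory CSV with hypothetical columns id and amount as a stand-in for a large file.

```python
from io import StringIO

import pandas as pd

# Stand-in for a large file: five rows, read two at a time.
csv_text = "id,amount\n1,10\n2,20\n3,30\n4,40\n5,50\n"

total = 0
rows_seen = 0
for chunk in pd.read_csv(StringIO(csv_text), chunksize=2):
    # Each chunk is an ordinary DataFrame holding at most `chunksize` rows.
    total += chunk['amount'].sum()
    rows_seen += len(chunk)

print(total, rows_seen)
```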

Applying Functions

Using apply()

# Create a DataFrame
data = {
'A': [1, 2, 3],
'B': [4, 5, 6]
}
df = pd.DataFrame(data)

# Define a function
def add_one(x):
    return x + 1

# Apply the function to each element
# (pandas 2.1+ names this method DataFrame.map; applymap is deprecated there)

print(df.applymap(add_one))

# Apply the function to each column

print(df.apply(lambda x: x + 1))

# Apply the function to each row

print(df.apply(lambda x: x + 1, axis=1))

Joining DataFrames
# Create two DataFrames
df1 = pd.DataFrame({
    'key': ['A', 'B', 'C', 'D'],
    'value': [1, 2, 3, 4]
})
df2 = pd.DataFrame({
    'key': ['B', 'D', 'E', 'F'],
    'value': [5, 6, 7, 8]
})

# Inner join
inner_joined = pd.merge(df1, df2, on='key', how='inner')
print(inner_joined)

# Left join
left_joined = pd.merge(df1, df2, on='key', how='left')
print(left_joined)

# Right join
right_joined = pd.merge(df1, df2, on='key', how='right')
print(right_joined)

# Outer join
outer_joined = pd.merge(df1, df2, on='key', how='outer')
print(outer_joined)
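
To see which rows each join type keeps, it can help to count them and to ask pandas where every row came from. The sketch below reuses the two frames from this section under the illustrative names left and right. The suffixes and indicator arguments are standard pd.merge parameters, and the suffix strings chosen here are arbitrary.

```python
import pandas as pd

left = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': [1, 2, 3, 4]})
right = pd.DataFrame({'key': ['B', 'D', 'E', 'F'], 'value': [5, 6, 7, 8]})

# indicator=True adds a '_merge' column: 'left_only', 'right_only' or 'both'.
# suffixes renames the clashing 'value' columns (default: '_x' and '_y').
outer = pd.merge(left, right, on='key', how='outer',
                 suffixes=('_left', '_right'), indicator=True)
print(outer)

# Row counts per join type for the same pair of frames.
counts = {how: len(pd.merge(left, right, on='key', how=how))
          for how in ('inner', 'left', 'right', 'outer')}
print(counts)
```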

Window Functions
# Create a DataFrame with time series data
data = {
'Date': pd.date_range(start='2023-01-01', periods=10, freq='D'),
'Value': np.random.randn(10)
}
df = pd.DataFrame(data)
df.set_index('Date', inplace=True)

# Calculate rolling mean

rolling_mean = df['Value'].rolling(window=3).mean()
print(rolling_mean)

# Calculate expanding sum

expanding_sum = df['Value'].expanding().sum()
print(expanding_sum)

# Calculate exponentially weighted mean

ewm_mean = df['Value'].ewm(span=3).mean()
print(ewm_mean)
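
The three window types are easier to compare on values that can be checked by hand. The sketch below is an illustrative addition that uses the series 1 to 5 instead of random data. It passes adjust=False to ewm so the result follows the simple recursion y[t] = (1 - alpha) * y[t-1] + alpha * x[t] with alpha = 2 / (span + 1); the listing above uses the default adjust=True, which weights the early values differently.

```python
import pandas as pd

values = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# A window of 3 needs three observations, so the first two results are NaN.
rolling_mean = values.rolling(window=3).mean()

# min_periods=1 allows partial windows at the start instead of NaN.
rolling_partial = values.rolling(window=3, min_periods=1).mean()

# The expanding window grows from the first row: a running total here.
expanding_sum = values.expanding().sum()

# span=3 gives alpha = 2 / (3 + 1) = 0.5.
ewm_recursive = values.ewm(span=3, adjust=False).mean()

print(rolling_mean.tolist())
print(rolling_partial.tolist())
print(expanding_sum.tolist())
print(ewm_recursive.tolist())
```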

Handling JSON Data

# Create a JSON string
json_str = '''
[
{"Name": "John", "Age": 28, "City": "New York"},
{"Name": "Anna", "Age": 24, "City": "Paris"},
{"Name": "Peter", "Age": 35, "City": "Berlin"}
]
'''

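# Read JSON string into DataFrame
# (wrap the string in StringIO; passing literal JSON directly is deprecated since pandas 2.1)

from io import StringIO

df_json = pd.read_json(StringIO(json_str))
print(df_json)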

# Export DataFrame to JSON

df_json.to_json('[Link]', orient='records', lines=True)

Advanced Indexing with MultiIndex

# Create a MultiIndex DataFrame
arrays = [
['A', 'A', 'B', 'B'],
['one', 'two', 'one', 'two']
]
index = pd.MultiIndex.from_arrays(arrays, names=('first', 'second'))
df_multi = pd.DataFrame({'value': [1, 2, 3, 4]}, index=index)
print(df_multi)

# Accessing data in MultiIndex DataFrame

print(df_multi.loc['A'])
print(df_multi.loc[('A', 'one')])

Combining DataFrames with concat and append

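# Create DataFrames
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2'],
    'B': ['B0', 'B1', 'B2']
})
df2 = pd.DataFrame({
    'A': ['A3', 'A4', 'A5'],
    'B': ['B3', 'B4', 'B5']
})

# Concatenate DataFrames
concatenated = pd.concat([df1, df2], ignore_index=True)
print(concatenated)

# Append DataFrames
# (DataFrame.append was removed in pandas 2.0; pd.concat gives the same result.
# On pandas < 2.0 the equivalent call is df1.append(df2, ignore_index=True).)
appended = pd.concat([df1, df2], ignore_index=True)
print(appended)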

Performance Tips
# Use vectorized operations instead of loops
data = pd.DataFrame({
'A': range(1000000),
'B': range(1000000)
})

# Inefficient way: Using loops

data['C'] = [x + y for x, y in zip(data['A'], data['B'])]

# Efficient way: Using vectorized operations

data['C'] = data['A'] + data['B']

Common questions

What are the main differences between static array creation in NumPy and dynamic data entry manipulation in Pandas?

NumPy arrays are created with functions such as np.array(), np.zeros(), or np.ones(). Their shape and data type are fixed at creation, either given explicitly or inferred from the input, which makes them efficient for numerical computation but less flexible. Pandas DataFrames, by contrast, can be modified after creation by adding or dropping columns, handling missing data, and loading data from external sources such as CSV and JSON files. This flexibility makes Pandas well suited to time-series manipulation, summary statistics, and merging datasets.

How does grouping and aggregating in Pandas enhance data analysis compared to using only NumPy?

Grouping and aggregating in Pandas splits a dataset into groups based on the distinct values of one or more columns and applies an aggregation function such as mean or sum to each group, which reveals trends and distributions per category. NumPy can compute sum and mean over an array or along an axis, but it has no built-in notion of grouping by a label, so the groups have to be built by hand with masks. A call such as df.groupby('City')['Age'].mean() performs that categorical analysis in one step.
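
As a rough illustration of that contrast, the sketch below computes several statistics per city with one groupby call, then reproduces a single one of them with a NumPy mask. The data and the output column names (mean_age, oldest, n_people) are made up for this example.

```python
import pandas as pd

people = pd.DataFrame({
    'City': ['Paris', 'Paris', 'Berlin', 'Berlin', 'Berlin'],
    'Age': [24, 30, 35, 41, 29],
})

# Named aggregation: each keyword becomes an output column,
# defined as (source column, aggregation function).
summary = people.groupby('City').agg(
    mean_age=('Age', 'mean'),
    oldest=('Age', 'max'),
    n_people=('Age', 'count'),
)
print(summary)

# The NumPy equivalent needs one boolean mask per group.
cities = people['City'].to_numpy()
ages = people['Age'].to_numpy()
paris_mean = ages[cities == 'Paris'].mean()
print(paris_mean)
```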

NumPy provides n-dimensional arrays and a suite of mathematical operations on them, making it a foundation for data processing tasks. Pandas builds on NumPy by offering labeled data structures, Series and DataFrame, that add indexing, grouping, and merging. NumPy is therefore often used directly for fast numerical work on large arrays, while Pandas offers more expressive ways to structure, filter, and aggregate data. The two are complementary tools in a data analyst's toolkit.

The key difference between merging and joining DataFrames in Pandas lies in how the two inputs are aligned. Merging, with pd.merge() or DataFrame.merge(), combines DataFrames on common key columns (or indices) using an inner, outer, left, or right join, and the join type determines which rows from each input appear in the result. Joining, with DataFrame.join(), is closely related but aligns on the index by default, which is simpler when both datasets already share an index. Choosing the right method and join type preserves the relationships in the data and avoids silently losing or duplicating rows.
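
A minimal sketch of the two alignments, using two small made-up frames indexed by a hypothetical Name label: join aligns on the index directly, while merge gives the same rows once the index is turned into a key column.

```python
import pandas as pd

ages = pd.DataFrame(
    {'Age': [28, 24]},
    index=pd.Index(['John', 'Anna'], name='Name'),
)
cities = pd.DataFrame(
    {'City': ['Paris', 'Berlin']},
    index=pd.Index(['Anna', 'Peter'], name='Name'),
)

# join aligns on the index; how='left' keeps every row of `ages`.
joined = ages.join(cities, how='left')
print(joined)

# merge works on key columns, so the index is first moved into a column.
merged = pd.merge(ages.reset_index(), cities.reset_index(),
                  on='Name', how='left')
print(merged)
```

In both results John has no matching city, so his City value is missing, and Peter is dropped because a left join keeps only the keys of the left frame.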

In NumPy, the simplest form of filtering is basic indexing and slicing, which picks elements by position. For example, array[1:3] relies on known positions. NumPy also supports boolean masks such as array[array > 3], which select by value. Pandas builds on the same idea with conditional selection over labeled columns, such as df[df['Age'] > 30]. This works much like a SQL query, letting analysts filter large datasets with logical conditions across several named columns, which is more convenient than working with positions in a bare array.
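
The sketch below shows value-based filtering in both libraries. NumPy arrays accept boolean masks as well as positions, and Pandas applies the same idea across named columns. The data mirrors the earlier Name/Age/City example, and the particular conditions are chosen only for illustration.

```python
import numpy as np
import pandas as pd

# NumPy: a boolean mask selects by value rather than by position.
ages = np.array([28, 24, 35, 32])
over_30 = ages[ages > 30]
print(over_30)

# Pandas: the same idea across several named columns.
df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32],
    'City': ['New York', 'Paris', 'Berlin', 'London'],
})
selected = df[(df['Age'] > 25) & (df['City'] != 'Berlin')]
print(selected)

# query() expresses the same filter as a string.
same_rows = df.query("Age > 25 and City != 'Berlin'")
print(same_rows)
```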

MultiIndex DataFrames make data selection and manipulation more complex: code tends to get longer and less intuitive, because accessing data requires a clear picture of the level structure. Two remedies are to call reset_index() to flatten the DataFrame when that is more convenient, and to use loc with tuples of labels to target specific slices. Naming the index levels and using those names consistently also keeps multi-level data manageable.
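
The sketch below applies those remedies to the MultiIndex frame from the tutorial. It also uses DataFrame.xs, a standard pandas method not mentioned in the text, as one further way to select by a named level.

```python
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [['A', 'A', 'B', 'B'], ['one', 'two', 'one', 'two']],
    names=('first', 'second'),
)
df_multi = pd.DataFrame({'value': [1, 2, 3, 4]}, index=index)

# reset_index turns both index levels into ordinary columns.
flat = df_multi.reset_index()
print(flat)

# xs takes a cross-section on a named level, dropping that level.
ones = df_multi.xs('one', level='second')
print(ones)

# A full tuple of labels plus a column name targets a single cell.
single = df_multi.loc[('B', 'two'), 'value']
print(single)
```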

Pivot tables turn a DataFrame into a summary table organized along two or more dimensions, with an aggregation function such as sum or mean applied to each cell. This is particularly useful in business analytics, where rearranging categorical data this way makes trends and comparisons easy to read. For example, calling pivot_table with index='Name' and columns='Month' computes and arranges the sales totals per person and month in one step.

Window functions in Pandas support time-series analysis by computing a statistic over a window of preceding observations, which helps identify trends and reduce noise. For instance, rolling().mean() computes a moving average that smooths out short-term fluctuations, while ewm().mean() computes an exponentially weighted moving average that gives more weight to recent observations. These operations are common in financial analysis and weather monitoring, where they help expose underlying patterns in time-dependent data.

For machine learning datasets, filling missing data is often preferred to dropping it, because dropping rows or columns can discard valuable information and leave too little data, or a biased sample, to train on. This matters most in sparse datasets. Filling missing entries with the mean, the median, or a constant value preserves the original dataset size, although it can also distort a feature's distribution when many values are missing. The right choice depends on how much data is missing and why.
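
The trade-off can be seen on the small frame from the missing-data section. The sketch below compares dropping with filling, using the median rather than the mean purely as an illustrative alternative. The Age_was_missing column is a hypothetical addition that records which values were imputed.

```python
import pandas as pd

df_nan = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, None, 35, 32],
    'City': ['New York', 'Paris', None, 'London'],
})

# Dropping keeps only the complete rows: two of the four survive.
dropped = df_nan.dropna()
print(len(dropped))

# Filling keeps all four rows; the median of 28, 35 and 32 is 32.
filled = df_nan.fillna({'Age': df_nan['Age'].median(), 'City': 'Unknown'})

# Optional flag so a model can still tell which ages were imputed.
filled['Age_was_missing'] = df_nan['Age'].isna()
print(filled)
```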

Vectorized operations in Pandas and NumPy are central to performance, because they run in compiled C code rather than in the Python interpreter. On large datasets, a Python loop pays interpreter overhead for every element, whereas a vectorized operation processes the whole array in a low-level loop. For example, adding two columns with data['C'] = data['A'] + data['B'] is significantly faster than looping over the elements. Vectorizing not only speeds up computation but also leads to cleaner, more readable code.
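
A rough way to check this on a given machine is to time both versions with the standard-library timeit module, as in the sketch below. The frame size of 100,000 rows and the five repetitions are arbitrary choices, and the measured times will vary between systems.

```python
import timeit

import pandas as pd

data = pd.DataFrame({'A': range(100_000), 'B': range(100_000)})

def with_loop():
    # Element-by-element addition in the Python interpreter.
    return [x + y for x, y in zip(data['A'], data['B'])]

def vectorized():
    # One call that adds the two columns in compiled code.
    return data['A'] + data['B']

loop_seconds = timeit.timeit(with_loop, number=5)
vector_seconds = timeit.timeit(vectorized, number=5)
print(f"loop: {loop_seconds:.4f}s, vectorized: {vector_seconds:.4f}s")
```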
