Module–2: Python for Data Analytics
1. Introduction to Python Libraries for Data Analytics
Data analytics is the process of collecting, processing, analyzing, and interpreting data to extract
useful information for decision making.
Python is widely used for data analytics because it provides powerful libraries that simplify data
processing and analysis.
Two of the most commonly used libraries are:
• NumPy – Used for numerical and mathematical operations
• Pandas – Used for data manipulation and analysis
2. NumPy (Numerical Python)
NumPy is a Python library used for scientific and numerical computing. It provides support for
multi-dimensional arrays and many mathematical functions.
Features of NumPy:
• Efficient numerical computations
• Multi-dimensional array support
• Mathematical and statistical functions
• Faster execution compared to standard Python lists
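Much of the speed and convenience advantage over lists comes from vectorization: an arithmetic operation on a NumPy array is applied to every element in compiled code, with no explicit Python loop. A minimal sketch contrasting the two (the variable names are illustrative):

```python
import numpy as np

prices = [10, 20, 30, 40]          # a plain Python list
arr = np.array(prices)             # the same values as a NumPy array

# With a list, "* 2" repeats the list instead of doubling each value
print(prices * 2)                  # [10, 20, 30, 40, 10, 20, 30, 40]

# With an array, arithmetic is applied element by element (vectorized)
print(arr * 2)                     # [20 40 60 80]

# The list version needs an explicit loop or comprehension
doubled = [p * 2 for p in prices]
print(doubled)                     # [20, 40, 60, 80]
```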
Example: Creating a NumPy Array
import numpy as np
arr = np.array([10, 20, 30, 40])
print(arr)
Output:
[10 20 30 40]
Example: 2D NumPy Array
import numpy as np
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])
print(matrix)
Output:
[[1 2 3]
[4 5 6]]
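Every array records its own dimensions, and many functions accept an axis argument to work along columns (axis=0) or rows (axis=1). A short sketch using the same 2×3 matrix:

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

print(matrix.shape)          # (2, 3): 2 rows, 3 columns
print(matrix.ndim)           # 2 dimensions
print(matrix.sum(axis=0))    # column sums: [5 7 9]
print(matrix.sum(axis=1))    # row sums: [ 6 15]
print(matrix[1, 2])          # row 1, column 2 (0-based indexing): 6
```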
Example: Mathematical Operation
import numpy as np
arr = [Link]([4, 9, 16])
print(np.sqrt(arr))
Output:
[2. 3. 4.]
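Besides element-wise functions such as np.sqrt(), NumPy provides statistical functions that reduce an array to a single value. A brief sketch on the same array:

```python
import numpy as np

arr = np.array([4, 9, 16])

print(arr.sum())             # 29
print(arr.min(), arr.max())  # 4 16
print(arr.mean())            # arithmetic mean (29 / 3)
print(np.std(arr))           # population standard deviation
```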
3. Pandas Library
Pandas is a Python library used for data manipulation and analysis. It provides flexible data
structures for working with structured (tabular) data. Main data structures:
• Series – One-dimensional labeled array
• DataFrame – Two-dimensional labeled table, similar to an Excel sheet
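A Series can also be built on its own from a list, with an optional index supplying the labels. A minimal sketch; the names used as labels are illustrative:

```python
import pandas as pd

# A Series is a one-dimensional array with a label (index) for each value
ages = pd.Series([25, 30, 35], index=["Alice", "Bob", "Charlie"])

print(ages["Bob"])    # access by label: 30
print(ages.iloc[0])   # access by position: 25
print(ages.max())     # 35
```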
Example: Creating a DataFrame
import pandas as pd
data = {
"Name":["Alice","Bob","Charlie"],
"Age":[25,30,35],
"City":["New York","Los Angeles","Chicago"]
}
df = pd.DataFrame(data)
print(df)
Output:
      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
4. Data Analysis Operations
Common operations used in data analysis include filtering, sorting and aggregation.
Filtering Example
older = df[df["Age"] > 30]
print(older)
Output:
      Name  Age     City
2  Charlie   35  Chicago
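Filters can combine several conditions. In pandas, & (and) and | (or) are used instead of Python's and/or, and each condition needs its own parentheses. A sketch that rebuilds the same DataFrame so it runs on its own:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"],
})

# Age at least 30 AND city other than Chicago -> only Bob's row
mask = (df["Age"] >= 30) & (df["City"] != "Chicago")
print(df[mask])

# isin() keeps rows whose value appears in the given list
print(df[df["City"].isin(["New York", "Chicago"])])
```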
Sorting Example
print(df.sort_values("Age", ascending=False))
Output:
      Name  Age         City
2  Charlie   35      Chicago
1      Bob   30  Los Angeles
0    Alice   25     New York
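sort_values() can also sort by several columns, with a separate direction for each, and it returns a new DataFrame rather than modifying df. In this sketch the extra row for "Dana" is hypothetical, added only to create a tie on Age:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Dana"],  # "Dana" is a hypothetical extra row
    "Age": [25, 30, 35, 30],
})

# Sort by Age (descending), then break ties by Name (ascending)
ranked = df.sort_values(["Age", "Name"], ascending=[False, True])
print(ranked["Name"].tolist())   # ['Charlie', 'Bob', 'Dana', 'Alice']

# The original DataFrame keeps its order
print(df["Name"].tolist())       # ['Alice', 'Bob', 'Charlie', 'Dana']
```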
Aggregation Example
print(df["Age"].mean())
Output:
30.0
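Several statistics can be computed in one call with agg(), and describe() summarizes every numeric column (count, mean, standard deviation, minimum, quartiles, maximum). A sketch on the same three-row DataFrame:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"],
})

# Minimum, maximum and mean of Age in a single call
stats = df["Age"].agg(["min", "max", "mean"])
print(stats)

# Summary statistics for all numeric columns
print(df.describe())
```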
5. Data Importing
Pandas allows importing datasets from various sources such as CSV files, Excel files and
databases.
import pandas as pd
df = pd.read_csv("data.csv")   # replace "data.csv" with the path to your CSV file
print(df)
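read_csv() accepts either a file path or a file-like object, which makes it possible to try it without a file on disk. In this sketch, io.StringIO stands in for a CSV file whose contents are illustrative. Excel files and databases have their own readers, pd.read_excel() and pd.read_sql().

```python
import io
import pandas as pd

# Illustrative CSV content standing in for a file on disk
csv_text = """Name,Age,City
Alice,25,New York
Bob,30,Los Angeles
Charlie,35,Chicago
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.head())     # first rows of the dataset
print(df.dtypes)     # column types inferred while parsing
```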
6. Data Cleaning
Data cleaning involves detecting and correcting inaccurate or incomplete data. Common tasks
include handling missing values, removing duplicates and identifying outliers.
Handling Missing Values
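print(df.isnull())          # True wherever a value is missing
df_dropped = df.dropna()    # option 1: drop rows that contain missing values
df_filled = df.fillna(0)    # option 2: replace missing values with 0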
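The same steps on a small illustrative dataset with gaps. Instead of filling everything with 0, this sketch fills each column with its own value (the column mean for Age, a placeholder string for City), which is one common choice rather than a universal rule:

```python
import numpy as np
import pandas as pd

# Illustrative data: np.nan and None both mark missing values
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, np.nan, 35],
    "City": ["New York", "Los Angeles", None],
})

print(df.isnull().sum())   # missing values per column: Name 0, Age 1, City 1

dropped = df.dropna()      # keeps only Alice's row, the one with no gaps
filled = df.fillna({"Age": df["Age"].mean(), "City": "Unknown"})
print(dropped)
print(filled)
```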
Removing Duplicates
df = df.drop_duplicates()   # returns a new DataFrame, so assign the result
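duplicated() shows which rows drop_duplicates() would remove; by default the first occurrence of each row is kept. A sketch with an illustrative repeated row:

```python
import pandas as pd

# Illustrative data: the third row repeats the first
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Alice"],
    "Age": [25, 30, 25],
})

print(df.duplicated().tolist())   # [False, False, True]

df = df.drop_duplicates()         # keeps the first occurrence of each row
print(len(df))                    # 2
```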
Handling Outliers
df = df[df["Age"] < 100]   # keep only rows with a plausible age (fixed cutoff)
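A fixed cutoff such as Age < 100 relies on prior knowledge of what is plausible. A common data-driven alternative, swapped in here and not part of the example above, is the interquartile-range (IQR) rule: values more than 1.5 × IQR below the first quartile or above the third quartile are flagged. A sketch with illustrative ages:

```python
import pandas as pd

# Illustrative ages with one implausible entry
df = pd.DataFrame({"Age": [25, 30, 35, 28, 32, 250]})

q1 = df["Age"].quantile(0.25)   # first quartile
q3 = df["Age"].quantile(0.75)   # third quartile
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = df[(df["Age"] < lower) | (df["Age"] > upper)]
cleaned = df[df["Age"].between(lower, upper)]   # bounds are inclusive
print(outliers)
print(cleaned)
```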