⁕ Introduction to NumPy (U-4) ⁕
• 1. NumPy Basics
1. Introduction
NumPy (Numerical Python) is a Python library used for working with numbers and arrays.
It is widely used in scientific computing and data analysis.
2. Features of NumPy
Fast and efficient calculations
Uses less memory
Supports multi-dimensional arrays
Provides many mathematical functions
3. Installation and Import
Installation:
pip install numpy
Import:
import numpy as np
4. NumPy Array
A NumPy array is a collection of elements of the same type.
Example:
import numpy as np
arr = np.array([1, 2, 3, 4])
5. Types of Arrays
(a) 1D Array
arr = np.array([1, 2, 3])
(b) 2D Array
arr = np.array([[1, 2], [3, 4]])
(c) 3D Array
arr = np.array([[[1,2,3], [4,5,6]]])
6. Array Properties
ndim → number of dimensions
shape → size along each dimension, e.g. (rows, columns) for a 2D array
size → total number of elements
Example:
arr = np.array([[1,2,3],[4,5,6]])
print(arr.ndim)    # 2
print(arr.shape)   # (2, 3)
print(arr.size)    # 6
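Applying the same three properties to the 1D, 2D and 3D arrays from section 5 shows how they relate. Note that the 3D example holds a single block of 2 rows × 3 columns:

```python
import numpy as np

a1 = np.array([1, 2, 3])                  # 1D
a2 = np.array([[1, 2], [3, 4]])           # 2D
a3 = np.array([[[1, 2, 3], [4, 5, 6]]])   # 3D

# Print ndim, shape and size for each array
for a in (a1, a2, a3):
    print(a.ndim, a.shape, a.size)
# 1 (3,) 3
# 2 (2, 2) 4
# 3 (1, 2, 3) 6
```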
7. Basic Operations
Operations are performed element-wise.
a = np.array([1,2,3])
b = np.array([4,5,6])
print(a + b)
print(a - b)
print(a * b)
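The word "element-wise" matters because the same + sign does something different on plain Python lists. This small comparison uses the same numbers as above:

```python
import numpy as np

list_a = [1, 2, 3]
list_b = [4, 5, 6]
print(list_a + list_b)   # [1, 2, 3, 4, 5, 6]  (lists are joined, not added)

a = np.array(list_a)
b = np.array(list_b)
print(a + b)             # [5 7 9]  (arrays are added element by element)
print(a * b)             # [ 4 10 18]
print(a ** 2)            # [1 4 9]  (power is also element-wise)
```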
8. Mathematical Functions
arr = np.array([1,2,3,4])
print([Link](arr))
print([Link](arr))
print([Link](arr))
print([Link](arr))
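Some commonly used NumPy math functions, applied to the same array. These particular functions are chosen here as examples:

```python
import numpy as np

arr = np.array([1, 2, 3, 4])

print(np.sum(arr))    # 10  (total of all elements)
print(np.mean(arr))   # 2.5 (average)
print(np.max(arr))    # 4   (largest element)
print(np.min(arr))    # 1   (smallest element)
print(np.sqrt(arr))   # square root of every element
```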
9. Advantages of NumPy
Faster than Python lists
Easy to use
Handles large data efficiently
10. Applications of NumPy
Data Science
Machine Learning
Data Analysis
Scientific calculations
• 1.1 Creating Nd Arrays
1. Nd Array (N-dimensional array)
An Nd array is the main data structure in NumPy used to store data in multiple dimensions.
👉 “N” means number of dimensions (1D, 2D, 3D, etc.)
👉 All elements are of the same data type
2. Examples of Nd Arrays
1D Array → like a list
Example: [1, 2, 3]
2D Array → like a table (rows and columns)
Example:
[[1, 2],
[3, 4]]
3D Array → collection of 2D arrays
Example:
[[[1,2,3],
[4,5,6]]]
3. Creating Nd Arrays in Python
First import NumPy:
import numpy as np
(a) Using array()
arr = np.array([1, 2, 3]) # 1D
arr2 = np.array([[1,2],[3,4]]) # 2D
(b) Using zeros() and ones()
np.zeros((2,3)) # array of 0s
np.ones((2,2)) # array of 1s
(c) Using arange()
np.arange(1,10) # 1 to 9 (end value is excluded)
(d) Using random values
np.random.rand(2,2) # 2x2 array of random numbers between 0 and 1
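The creation functions above can be run together and their results checked. This sketch assumes np.random.rand for the random case, and also shows the optional step argument of arange():

```python
import numpy as np

z = np.zeros((2, 3))         # 2 rows x 3 columns, all 0.0
o = np.ones((2, 2))          # 2 x 2, all 1.0
r = np.arange(1, 10)         # 1, 2, ..., 9 (10 is excluded)
s = np.arange(0, 10, 2)      # step of 2: 0, 2, 4, 6, 8
rnd = np.random.rand(2, 2)   # 2 x 2 random floats in [0, 1)

print(z.shape, o.shape)      # (2, 3) (2, 2)
print(r)                     # [1 2 3 4 5 6 7 8 9]
print(s)                     # [0 2 4 6 8]
print(rnd)                   # different values on every run
```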
4. Important Properties
arr = np.array([[1,2,3],[4,5,6]])
arr.ndim # number of dimensions
arr.shape # size along each dimension (rows, columns)
arr.size # total elements
2.2 Data types for Nd arrays
📊 Data Types for Nd Arrays (NumPy)
1. Introduction
In NumPy, an Nd array stores elements of the same type.
The data type (dtype) tells what kind of data is stored in the array (like integer, float, etc.).
2. What is dtype?
👉 dtype means data type of array elements
👉 It defines how data is stored in memory
Example:
import numpy as np
arr = np.array([1, 2, 3], dtype='int')
3. Common Data Types in Nd Arrays
(a) Integer (int)
Used for whole numbers.
arr = np.array([1, 2, 3], dtype='int')
Example values: 1, 10, -5
(b) Float
Used for decimal numbers.
arr = np.array([1.5, 2.7, 3.0], dtype='float')
Example values: 1.2, 3.5, 10.0
(c) Boolean
Stores True or False values.
arr = np.array([True, False, True], dtype='bool')
(d) String
Stores text values.
arr = np.array(['a', 'b', 'c'], dtype='str')
4. Checking Data Type
We can check the data type using dtype.
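arr = np.array([1, 2, 3])
print(arr.dtype)
5. Changing Data Type
We can set the type when the array is created, or convert an existing array to another type with astype(), which returns a new array.
arr = np.array([1, 2, 3], dtype='float')
int_arr = arr.astype('int')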
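A short sketch of astype() in both directions. It also shows what happens when integers and decimals are mixed in one list: NumPy picks one common type for all elements:

```python
import numpy as np

arr = np.array([1, 2, 3])
f = arr.astype('float')     # returns a NEW array; arr itself is unchanged
print(f, f.dtype)           # [1. 2. 3.] float64

back = np.array([1.7, 2.2, -3.9]).astype('int')
print(back)                 # [ 1  2 -3]  (decimal part is cut off, toward zero)

mixed = np.array([1, 2.5])
print(mixed.dtype)          # float64 (the int is upcast so all elements share one type)
```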
2.2 Arithmetic with NumPy Arrays
➕ Arithmetic with NumPy Arrays
1. Introduction
In NumPy, arithmetic operations are used to perform mathematical calculations on arrays.
👉 These operations work element by element.
👉 This means each element in one array is combined with the element at the same position in the other array.
2. Basic Arithmetic Operations
(a) Addition (+)
Adds elements of two arrays.
import numpy as np
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(a + b)
Output:
[5 7 9]
(b) Subtraction (-)
Subtracts elements of two arrays.
print(a - b)
Output:
[-3 -3 -3]
(c) Multiplication (*)
Multiplies elements of two arrays.
print(a * b)
Output:
[ 4 10 18]
(d) Division (/)
Divides elements of two arrays.
print(a / b)
Output:
[0.25 0.4 0.5 ]
3. Scalar Operations
A single number can also operate on all elements of an array.
Example:
arr = np.array([1, 2, 3])
print(arr + 2)
print(arr * 2)
Output:
[3 4 5]
[2 4 6]
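The other arithmetic operators work with a scalar in the same way, and the scalar may be on either side of the operator:

```python
import numpy as np

arr = np.array([1, 2, 3])
print(arr - 1)    # [0 1 2]
print(arr / 2)    # [0.5 1.  1.5]  (division always gives floats)
print(arr ** 2)   # [1 4 9]
print(10 - arr)   # [9 8 7]  (scalar on the left)
```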
2.3 Basic Indexing and Slicing NumPy Arrays
📊 Basic Indexing and Slicing in NumPy
1. Introduction
Indexing and slicing are used to access elements from a NumPy array.
👉 Indexing = getting a single element
👉 Slicing = getting multiple elements
2. Indexing in NumPy
Indexing means accessing a specific element using its position.
👉 Index starts from 0
Example (1D array):
import numpy as np
arr = np.array([10, 20, 30, 40])
print(arr[0])
print(arr[2])
Output:
10
30
3. Indexing in 2D Array
In 2D arrays, we use row and column index.
Example:
arr = np.array([[1, 2, 3],
[4, 5, 6]])
print(arr[0, 1])
print(arr[1, 2])
Output:
2
6
👉 Format: arr[row, column]
4. Slicing in NumPy
Slicing means getting a part of the array.
👉 Syntax: array[start:end]
👉 End index is not included
Example (1D array):
arr = np.array([10, 20, 30, 40, 50])
print(arr[1:4])
Output:
[20 30 40]
5. Slicing with Step
We can also skip elements using step value.
print(arr[0:5:2])
Output:
[10 30 50]
6. Slicing in 2D Array
We can slice rows and columns.
arr = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
print(arr[0:2, 1:3])
Output:
[[2 3]
[5 6]]
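A few more index and slice patterns on the same 3×3 array: a whole row, a whole column (":" means "all rows"), negative indices, and a step over rows. The last part shows that a slice is a view of the original data, so changing the slice also changes the array:

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])

print(arr[1])       # whole second row: [4 5 6]
print(arr[:, 1])    # whole second column: [2 5 8]
print(arr[-1, -1])  # negative index counts from the end: 9
print(arr[::2])     # every other row: rows 0 and 2

# A slice is a view: writing into it also changes the original array
part = arr[0, :2]
part[0] = 100
print(arr[0])       # [100   2   3]
```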
2.3 Boolean Indexing in NumPy
📊 Boolean Indexing in NumPy
1. Introduction
Boolean indexing is used to filter data from a NumPy array using conditions.
👉 It returns only those elements that satisfy a condition
👉 It uses True or False values
2. How Boolean Indexing Works
In Boolean indexing, we first apply a condition on an array.
NumPy then returns only the elements where the condition is True.
3. Example of Boolean Indexing
import numpy as np
arr = np.array([10, 20, 30, 40, 50])
print(arr > 25)
Output:
[False False True True True]
👉 True means condition is satisfied
4. Filtering Elements
We can directly use the condition to get required values.
print(arr[arr > 25])
Output:
[30 40 50]
5. Multiple Conditions
We can also use multiple conditions using & (and) and | (or).
Example:
print(arr[(arr > 20) & (arr < 50)])
Output:
[30 40]
6. Important Points
Condition returns True/False values
Only True values are selected
Used for filtering data easily
Very useful in data analysis
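Besides & (and), | (or) and ~ (not) can be used. Each condition needs its own parentheses because & and | bind more tightly than > and <. A condition can also select elements to be changed:

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

print(arr[(arr < 20) | (arr > 40)])   # OR:  [10 50]
print(arr[~(arr > 25)])               # NOT: [10 20]

# Set every element that satisfies the condition to 0
arr[arr > 25] = 0
print(arr)                            # [10 20  0  0  0]
```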
2.3 Transposing Arrays and Swapping Axes in NumPy
🔄 Transposing Arrays and Swapping Axes in NumPy
1. Introduction
In NumPy, transpose and swap axes are used to rearrange the axes (dimensions) of an array, which changes its shape and the order in which data is read.
👉 They help in rearranging rows and columns
👉 Useful in matrix operations and data processing
2. Transposing Arrays
What is Transpose?
Transposing means converting rows into columns and columns into rows.
👉 Rows become columns
👉 Columns become rows
Example:
import numpy as np
arr = np.array([[1, 2, 3],
[4, 5, 6]])
print(arr.T)
Output:
[[1 4]
[2 5]
[3 6]]
3. Using transpose() function
We can also use the function:
print(np.transpose(arr))
4. Swapping Axes
What is Swap Axes?
Swapping axes means changing the position of dimensions in an array.
👉 Used in multi-dimensional arrays
👉 Swaps exactly two chosen axes and leaves the others in place
Example:
arr = np.array([[1, 2, 3],
[4, 5, 6]])
print(np.swapaxes(arr, 0, 1))
Output:
[[1 4]
[2 5]
[3 6]]
5. Difference Between Transpose and Swapaxes
Transpose → reverses the order of all axes (for a 2D array: rows ↔ columns)
Swapaxes → swaps any two chosen axes of a multi-dimensional array
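The difference shows most clearly on a 3D array, where only the shapes need to be compared. The array is built with np.arange() and reshape() purely for illustration:

```python
import numpy as np

arr = np.arange(24).reshape(2, 3, 4)       # shape (2, 3, 4)

print(arr.T.shape)                         # (4, 3, 2): all axes reversed
print(np.transpose(arr, (1, 0, 2)).shape)  # (3, 2, 4): axes in a chosen order
print(np.swapaxes(arr, 0, 2).shape)        # (4, 3, 2): only axes 0 and 2 swapped
print(np.swapaxes(arr, 1, 2).shape)        # (2, 4, 3): only axes 1 and 2 swapped
```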
2.3 Universal Functions (Ufuncs) in NumPy
⚡ Universal Functions (Ufuncs) in NumPy
1. Introduction
Universal functions (Ufuncs) in NumPy are functions that work element by element on arrays.
👉 They perform operations on each element individually
👉 They are very fast compared to normal Python loops
2. Meaning of Ufunc
A Ufunc (Universal Function) is a function that applies the same operation to every element of an array.
Examples: addition, subtraction, square, square root, etc.
3. Example of Ufunc
import numpy as np
arr = np.array([1, 2, 3, 4])
print(np.square(arr))
Output:
[ 1  4  9 16]
👉 Each element is squared individually
4. Common Universal Functions
(a) Addition
np.add(10, 20)
(b) Subtraction
np.subtract(20, 10)
(c) Multiplication
np.multiply(2, arr)
(d) Division
np.divide(10, 2)
(e) Square Root
np.sqrt(arr)
5. Key Feature
👉 Ufuncs need no explicit Python loop (the looping happens inside NumPy's compiled code)
👉 They process all elements at once
👉 So they are fast and efficient
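A small comparison of a Python loop and a ufunc giving the same result, plus two ufuncs that take two inputs (a second array, or a scalar):

```python
import numpy as np

arr = np.array([1, 4, 9, 16])

# Plain Python loop: one element at a time
loop_result = [x ** 0.5 for x in arr]

# Ufunc: one call processes the whole array
ufunc_result = np.sqrt(arr)

print(ufunc_result)                            # [1. 2. 3. 4.]
print(np.allclose(loop_result, ufunc_result))  # True

print(np.add(arr, np.array([1, 1, 1, 1])))     # [ 2  5 10 17]
print(np.maximum(arr, 5))                      # [ 5  5  9 16]  (element-wise larger value)
```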
2.3 Introduction to Scikit-learn Library for Data Science
📊 Introduction to Scikit-learn Library for Data Science
1. Introduction
Scikit-learn is a Python library used for Machine Learning and Data Science.
👉 It provides tools to build models from data
👉 It helps in prediction, classification, and analysis
2. What is Scikit-learn?
Scikit-learn is an open-source library that gives simple tools for:
Data analysis
Machine learning models
Data preprocessing
3. Why is Scikit-learn used?
👉 It makes machine learning easy
👉 It works well with NumPy and Pandas
👉 It is fast and simple to use
👉 It provides ready-made algorithms
4. Main Uses of Scikit-learn
(a) Classification
Used to divide data into categories
Example: Spam or Not Spam emails
(b) Regression
Used to predict continuous values
Example: Predicting house prices
(c) Clustering
Used to group similar data
Example: Customer grouping
(d) Data Preprocessing
Used to clean and prepare data before training
5. Simple Example
from sklearn.linear_model import LinearRegression
model = LinearRegression()
👉 This creates a simple machine learning model
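Creating the model is only the first step; it still has to be trained with fit() and used with predict(). A minimal sketch, assuming tiny made-up data that follows y = 2x + 1 exactly:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up training data (y = 2x + 1); X must be 2D: (samples, features)
X = np.array([[1], [2], [3], [4]])
y = np.array([3, 5, 7, 9])

model = LinearRegression()
model.fit(X, y)                          # learn slope and intercept from the data

print(model.coef_, model.intercept_)     # approximately [2.] and 1.0
print(model.predict(np.array([[5]])))    # approximately [11.]
```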
6. Key Feature
👉 Scikit-learn provides ready-to-use algorithms so we do not need to build everything from scratch.
⁕ Data Manipulation with Pandas (U-5) ⁕
1. What is Pandas in Python
🐼 What is Pandas in Python
1. Introduction
Pandas is a Python library used for data handling and data analysis.
👉 It helps to work with large data easily
👉 It is mainly used in Data Science
2. What is Pandas?
Pandas is a tool that helps to store, organize, and analyze data in tabular form (like Excel sheets).
👉 It works with rows and columns
👉 It makes data easy to read and modify
3. Main Data Structures in Pandas
(a) Series
A Series is a one-dimensional data structure.
import pandas as pd
s = pd.Series([10, 20, 30])
print(s)
(b) DataFrame
A DataFrame is a two-dimensional table (rows and columns).
data = {
"Name": ["A", "B", "C"],
"Marks": [80, 90, 85]
}
df = pd.DataFrame(data)
print(df)
4. Why is Pandas used?
👉 Easy data handling
👉 Works like Excel tables
👉 Fast data analysis
👉 Helps in cleaning data
5. Simple Features
Reads data from Excel and CSV files
Filters and selects data easily
Handles missing data
Performs calculations
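The features listed above can be combined in a few lines. This sketch holds a small CSV in memory with io.StringIO so it runs without a file (with a real file you would pass its path to read_csv instead); the data is made up:

```python
import io
import pandas as pd

# Made-up CSV text; student B has no mark
csv_text = """Name,Marks
A,80
B,
C,85
"""

df = pd.read_csv(io.StringIO(csv_text))   # read CSV data
print(df.isna().sum())                    # missing values per column: Name 0, Marks 1

df["Marks"] = df["Marks"].fillna(0)       # handle missing data: replace NaN with 0
print(df[df["Marks"] > 50])               # filter rows: A and C
print(df["Marks"].mean())                 # calculation: (80 + 0 + 85) / 3 = 55.0
```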
2.1 Series in Pandas
📊 Series in Pandas
1. Introduction
A Series is one of the main data structures in Pandas.
👉 It is a one-dimensional labeled array
👉 It can store any type of data like numbers, text, etc.
2. What is Series?
A Series is like a single column of a table.
👉 It has values and an index
👉 Index helps to access data easily
3. Creating a Series
import pandas as pd
s = pd.Series([10, 20, 30, 40])
print(s)
4. Output Example
0 10
1 20
2 30
3 40
dtype: int64
👉 Left side = index
👉 Right side = values
5. Series with Custom Index
We can give our own index names.
s = pd.Series([10, 20, 30], index=["a", "b", "c"])
print(s)
6. Accessing Elements in Series
print(s["b"])
👉 Output: 20
7. Key Features of Series
One-dimensional data
Has index and values
Can store different data types
Easy to access and modify data
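A few everyday Series operations on the custom-index example: access by label or by position, arithmetic on every value, filtering, and adding a new label:

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"])

print(s["b"])            # by label: 20
print(s.iloc[0])         # by position: 10
print(s * 2)             # every value doubled, index kept
print(s[s > 15])         # filter: labels b and c

s["d"] = 40              # modify: add a new label and value
print(s.index.tolist())  # ['a', 'b', 'c', 'd']
```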
2.2 DataFrame in Pandas
📊 DataFrame in Pandas
1. Introduction
A DataFrame is one of the most important data structures in Pandas.
👉 It is used to store data in table format (rows and columns)
👉 It is similar to an Excel sheet or database table
2. What is DataFrame?
A DataFrame is a 2-dimensional labeled data structure.
👉 It has rows and columns
👉 Each column can store different types of data
3. Creating a DataFrame
import pandas as pd
data = {
"Name": ["A", "B", "C"],
"Marks": [80, 90, 85]
}
df = pd.DataFrame(data)
print(df)
4. Output Example
Name Marks
0 A 80
1 B 90
2 C 85
👉 Rows = records
👉 Columns = attributes
5. Accessing Data in DataFrame
(a) Column access
print(df["Name"])
(b) Row access
print(df.iloc[0])
6. Key Features of DataFrame
Stores data in tabular form
Has rows and columns
Can handle large data
Easy to filter and analyze data
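A quick look at the same DataFrame's size and columns, adding a column computed from another one, and a simple calculation on a column. The "Passed" column and its threshold of 85 are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Marks": [80, 90, 85]
})

print(df.shape)              # (3, 2): 3 rows, 2 columns
print(df.columns.tolist())   # ['Name', 'Marks']

df["Passed"] = df["Marks"] >= 85   # new column of True/False values
print(df)
print(df["Marks"].mean())    # 85.0
```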
3.1 Dropping Entries in Pandas
🗑️Dropping Entries in Pandas
1. Introduction
Dropping entries means removing unwanted data from a DataFrame.
👉 It helps to clean data
👉 We can remove rows or columns
2. What is Dropping?
In Pandas, we use the drop() function to delete data.
👉 It removes:
Rows
Columns
3. Dropping a Row
A row can be removed using its index number.
import pandas as pd
df = pd.DataFrame({
"Name": ["A", "B", "C"],
"Marks": [80, 90, 85]
})
df = df.drop(1)
print(df)
👉 Row with index 1 is removed
4. Dropping a Column
To remove a column, we use axis=1.
df = df.drop("Marks", axis=1)
print(df)
👉 Column “Marks” is removed
5. Important Points
drop() is used to remove data
axis=0 → row removal
axis=1 → column removal
Original data is not changed unless reassigned
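drop() also accepts a list to remove several rows at once, and the columns= keyword as an alternative to axis=1. Because drop() returns a new DataFrame, the original stays the same when the result is stored under a different name:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Marks": [80, 90, 85]
})

no_rows = df.drop([0, 2])              # remove the rows with index 0 and 2
no_marks = df.drop(columns=["Marks"])  # same result as df.drop("Marks", axis=1)

print(no_rows)
print(no_marks)
print(df.shape)   # (3, 2): df itself is unchanged
```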
3.2 Indexing in Pandas
📌 Indexing in Pandas
1. Introduction
Indexing in Pandas means accessing data from rows and columns using labels or positions.
👉 It helps to get specific data from a DataFrame or Series easily.
2. What is Indexing?
Each row in Pandas has an index number or label.
👉 We use this index to access data
👉 It works like an address of data
3. Types of Indexing
(a) Column Indexing
Used to access a full column.
import pandas as pd
df = pd.DataFrame({
"Name": ["A", "B", "C"],
"Marks": [80, 90, 85]
})
print(df["Name"])
👉 Gives all values of Name column
(b) Row Indexing using loc
loc is used to access rows by label.
print(df.loc[0])
👉 Gives the row whose index label is 0 (the first row here)
(c) Row Indexing using iloc
iloc is used to access rows by position.
print(df.iloc[0])
👉 Gives first row data based on position
4. Simple Meaning
Indexing means finding and selecting data
It helps to access rows and columns easily
It is used for data analysis
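With the default index 0, 1, 2 the labels and positions are the same, so loc and iloc look identical. Giving the rows custom labels (chosen here only for illustration) shows the difference, including how slices treat the end value:

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["A", "B", "C"], "Marks": [80, 90, 85]},
    index=[10, 20, 30],
)

print(df.loc[20])     # row whose LABEL is 20 -> B, 90
print(df.iloc[1])     # row at POSITION 1 (second row) -> also B, 90
print(df.loc[10:20])  # label slice: end label IS included (labels 10 and 20)
print(df.iloc[0:2])   # position slice: end position is NOT included (positions 0 and 1)
# df.loc[0] would raise a KeyError here, because no row has the label 0
```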
3.3 Selection Function in Pandas
📌 Selection Function in Pandas
1. Introduction
Selection in Pandas means choosing specific data from a DataFrame or Series.
👉 It is used to access required rows or columns
👉 It helps to work with only needed data
2. What is Selection?
Selection is a process of extracting data based on column name or row position.
3. Types of Selection
(a) Column Selection
Used to select a single column from a DataFrame.
import pandas as pd
df = pd.DataFrame({
"Name": ["A", "B", "C"],
"Marks": [80, 90, 85]
})
print(df["Name"])
👉 This selects only the Name column
(b) Row Selection using (loc)
loc is used to select rows using labels (index).
print(df.loc[0])
👉 This selects the row with index label 0 (the first row here)
(c) Row Selection using (iloc)
iloc is used to select rows using position.
print(df.iloc[0])
4. Simple Meaning
Selection means choosing required data
It can select rows or columns
It helps in data analysis
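Selection can also pick several columns, or a single cell, or rows and columns together. A sketch on the same DataFrame:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Marks": [80, 90, 85]
})

print(df[["Name", "Marks"]])    # several columns: pass a list of names
print(df.loc[0, "Name"])        # one cell by row label and column name: A
print(df.iloc[1, 1])            # one cell by row and column position: 90
print(df.loc[0:1, ["Name"]])    # rows with labels 0 to 1 (inclusive), Name column only
```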
3.4 Filtering Function in Pandas
📊 Filtering Function in Pandas
1. Introduction
Filtering in Pandas means selecting only those rows which satisfy a condition.
👉 It helps to remove unwanted data
👉 It shows only useful data
2. What is Filtering?
Filtering is a process of checking a condition and selecting matching data.
👉 If condition is True → data is selected
👉 If False → data is not selected
3. Example of Filtering
import pandas as pd
df = pd.DataFrame({
"Name": ["A", "B", "C"],
"Marks": [80, 90, 85]
})
print(df[df["Marks"] > 80])
4. Output
Name Marks
1 B 90
2 C 85
👉 Only rows with Marks greater than 80 are shown
5. Multiple Conditions
We can use more than one condition.
print(df[(df["Marks"] > 80) & (df["Marks"] < 90)])
👉 Selects marks strictly between 80 and 90 (here only C, with 85)
6. Simple Meaning
Filtering means selecting data based on condition
Uses True/False logic
Helps to find required data
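A few more filtering patterns on the same data: matching against a list with isin(), OR (|) and NOT (~) conditions, and query(), which writes the same kind of condition as a string:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Marks": [80, 90, 85]
})

print(df[df["Name"].isin(["A", "C"])])              # Name is A or C
print(df[(df["Marks"] < 82) | (df["Marks"] > 88)])  # OR condition: A and B
print(df[~(df["Marks"] > 80)])                      # NOT: only A
print(df.query("Marks >= 85"))                      # B and C
```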
3.5 Applications in Pandas
📊 Application (apply) in Pandas
1. Introduction
Application in Pandas means applying a function to data.
👉 It is done using the apply() function
👉 Used to modify or process data easily
2. What is apply()?
apply() is used to apply a function to each value, row, or column in a DataFrame or Series.
3. Example on Series
import pandas as pd
s = pd.Series([1, 2, 3])
s = s.apply(lambda x: x * 2)
print(s)
👉 Each value is multiplied by 2
4. Example on DataFrame
df = pd.DataFrame({
"Marks": [80, 90, 85]
})
df["Marks"] = df["Marks"].apply(lambda x: x + 5)
print(df)
👉 Adds 5 to each value
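apply() also accepts a named function instead of a lambda, and with axis=1 it passes each whole row to the function so several columns can be combined. The grade() helper and its threshold of 85 are hypothetical, for illustration only:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Marks": [80, 90, 85]
})

# Hypothetical helper: a named function works the same way as a lambda
def grade(marks):
    return "Pass" if marks >= 85 else "Fail"

df["Result"] = df["Marks"].apply(grade)

# axis=1: the function receives one row at a time
df["Label"] = df.apply(lambda row: f"{row['Name']}:{row['Marks']}", axis=1)
print(df)
```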
3.6 Mapping in Pandas
📊 Mapping (map) in Pandas
1. Introduction
Mapping in Pandas means changing or transforming values in a Series.
👉 It is done using the map() function
👉 It is used on a Series (one column at a time)
2. What is map()?
map() is used to apply a function or replace values using a dictionary.
👉 It changes each value one by one
3. Example using Function
import pandas as pd
s = pd.Series([1, 2, 3])
s = s.map(lambda x: x * 2)
print(s)
👉 Each value is multiplied by 2
4. Example using Dictionary
s = pd.Series(["A", "B", "C"])
s = s.map({"A": "Apple", "B": "Ball", "C": "Cat"})
print(s)
👉 Values are replaced using dictionary
5. Important Points
Used on a Series (one column)
Applies function to each value
Can replace values using dictionary
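One detail of map() with a dictionary: a value that is not a key in the dictionary becomes NaN (missing). If unmatched values should be kept as they are, replace() is an alternative:

```python
import pandas as pd

s = pd.Series(["A", "B", "D"])
codes = {"A": "Apple", "B": "Ball", "C": "Cat"}

mapped = s.map(codes)
print(mapped)          # Apple, Ball, NaN  ("D" is not in the dictionary)

replaced = s.replace(codes)
print(replaced)        # Apple, Ball, D    (unmatched value is kept)
```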
3.6 Sorting in Pandas
📊 Sorting in Pandas
1. Introduction
Sorting in Pandas means arranging data in a specific order.
👉 Data can be arranged in:
Ascending order (small to large)
Descending order (large to small)
2. What is Sorting?
Sorting helps to organize data properly so it becomes easy to read and analyze.
3. Sorting by Values
We use sort_values() to sort data based on column values.
Example:
import pandas as pd
df = pd.DataFrame({
"Name": ["A", "B", "C"],
"Marks": [80, 90, 85]
})
df = df.sort_values("Marks")
print(df)
👉 Sorts data by Marks in ascending order
4. Descending Order
df = df.sort_values("Marks", ascending=False)
print(df)
👉 Sorts data from highest to lowest
5. Sorting by Index
We use sort_index() to sort data by index.
df = df.sort_index()
6. Simple Meaning
Sorting = arranging data in order
Helps in better understanding of data
Can sort by values or index
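sort_values() can also take a list of columns, with a separate direction for each, so ties in the first column are broken by the second. The extra student D is made up to create a tie:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Marks": [80, 90, 85, 90]
})

# Highest Marks first; students with equal Marks ordered by Name (A to Z)
df = df.sort_values(["Marks", "Name"], ascending=[False, True])
print(df)              # B 90, D 90, C 85, A 80

# sort_index() restores the original row order
print(df.sort_index())
```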
3.6 Ranking in Pandas
📊 Ranking in Pandas
1. Introduction
Ranking in Pandas means assigning a position or rank to each value in a dataset.
👉 It tells which value is highest, second highest, and so on
👉 It is used for comparison of data
2. What is Ranking?
Ranking assigns a number (rank) to each value based on its size.
👉 Smallest value gets rank 1 (by default)
👉 Larger values get higher rank numbers
3. Example of Ranking
import pandas as pd
s = pd.Series([50, 80, 70, 90])
print(s.rank())
4. Output Example
0 1.0
1 3.0
2 2.0
3 4.0
dtype: float64
👉 Each value gets a rank based on its size (50 is smallest → 1.0, 90 is largest → 4.0)
5. Types of Ranking
(a) Ascending Rank (default)
Smallest value gets rank 1.
(b) Descending Rank
print(s.rank(ascending=False))
👉 Highest value gets rank 1
6. Simple Meaning
Ranking means giving position to values
Used for comparison
Helps to find top or lowest values
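When two values are equal, rank() must decide how to share the rank. The default gives tied values the average of their positions; the method parameter changes this:

```python
import pandas as pd

s = pd.Series([50, 80, 80, 90])

print(s.rank())                               # average: 1.0, 2.5, 2.5, 4.0
print(s.rank(method="min"))                   # lowest rank for ties: 1.0, 2.0, 2.0, 4.0
print(s.rank(method="dense"))                 # like "min" but no gaps: 1.0, 2.0, 2.0, 3.0
print(s.rank(ascending=False, method="min"))  # highest = 1: 4.0, 2.0, 2.0, 1.0
```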