
NumPy Basics: Fast Array Operations

NumPy is a Python library designed for efficient mathematical and numerical operations, particularly with arrays and matrices. It outperforms Python lists due to its C-based implementation and offers powerful functions for scientific calculations, making it foundational for machine learning and deep learning libraries. The cheat sheet covers installation, array creation, manipulation, and essential functions for inspecting and sorting arrays.

Uploaded by

try.asmitt
Copyright
© © All Rights Reserved

11/19/25, 12:49 PM NumPy_Cheat_Sheet

Q. What is NumPy?

NumPy (Numerical Python) is a Python library used for fast mathematical and numerical
operations.

It is mainly used to work with arrays, do matrix operations, and perform scientific
calculations efficiently.

Q. Why do we use NumPy?

1. Faster than Python lists – NumPy's core routines are implemented in compiled C, so array operations avoid Python's per-element overhead.

2. Works with large datasets – Great for big numerical data.

3. Has powerful mathematical functions – Like sum, mean, standard deviation, dot
product.

4. Foundation for ML and DL – Pandas is built on NumPy, and libraries such as TensorFlow and
PyTorch interoperate closely with NumPy arrays.

Key Features:
1. ndarray: Multi-dimensional array (1D, 2D, 3D…).

2. Vectorization: Do operations on entire arrays at once.

3. Broadcasting: Allows operations on arrays of different shapes.

4. Linear algebra: Matrix multiplication, transpose, inverse.

5. Statistical functions: Mean, variance, standard deviation, etc.
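
The vectorization and broadcasting features listed above can be sketched as follows. This is a minimal illustration; the array names and values are made up for the example and do not come from the cheat sheet.

```python
import numpy as np

# Broadcasting: a (3, 1) column and a (4,) row combine into a (3, 4) grid.
col = np.array([[0], [10], [20]])   # shape (3, 1)
row = np.array([1, 2, 3, 4])        # shape (4,)
grid = col + row                    # shape (3, 4)

# Vectorization: one expression is applied to every element, no Python loop.
prices = np.array([80.0, 100.0, 50.0])
discounted = prices * 0.9

print(grid.shape)
print(discounted)
```

Broadcasting only works when each pair of trailing dimensions is equal or one of them is 1; otherwise NumPy raises a ValueError.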

Q. Why is NumPy faster?

NumPy is faster because it uses optimized C code, stores data in contiguous memory,
and performs vectorized operations without Python loops.
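
A rough way to see the difference is to time the same computation written as a Python loop and as a vectorized expression. This is only a sketch: the size n is an arbitrary choice, and the measured times depend on the machine, so no timings are quoted here.

```python
import time
import numpy as np

n = 200_000                               # arbitrary size for the comparison
values = list(range(n))
array = np.arange(n, dtype=np.int64)

# Sum of squares with an explicit Python loop.
start = time.perf_counter()
loop_total = 0
for v in values:
    loop_total += v * v
loop_seconds = time.perf_counter() - start

# The same sum as one vectorized NumPy expression.
start = time.perf_counter()
vector_total = int(np.sum(array * array))
vector_seconds = time.perf_counter() - start

print(loop_total == vector_total)
print(f"loop: {loop_seconds:.4f}s, vectorized: {vector_seconds:.4f}s")
```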

Installation of NumPy

C:\Users\Your Name>pip install numpy

In [3]: # Import NumPy
import numpy as np

1. Creating Arrays Commands

In [4]: # create a NumPy array from a list
li = [1, 2, 3, 4]
print(np.array(li))

[1 2 3 4]

In [5]: # create a NumPy array from a tuple
tup = (5, 6, 7, 8)
print(np.array(tup))

[5 6 7 8]

In [7]: # create a 2-D NumPy array from a list of lists
list_1 = [1, 2, 3, 4]
list_2 = [5, 6, 7, 8]
list_3 = [9, 10, 11, 12]
print(np.array([list_1, list_2, list_3]))

[[ 1 2 3 4]
[ 5 6 7 8]
[ 9 10 11 12]]

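In [8]: # create an uninitialized NumPy array using np.empty()
# (np.empty() does not set the values, so the numbers printed below are
# whatever happened to be in memory and will differ between runs)
print(np.empty([4, 3], dtype=int))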

[[ 1 2 3]
[ 4 5 6]
[ 7 8 9]
[10 11 12]]

2. Initial Placeholders

Example 1: For 1-Dimensional NumPy Arrays

In [9]: # create a NumPy array using np.arange()
print(np.arange(1, 10))

[1 2 3 4 5 6 7 8 9]

In [10]: # create a NumPy array using np.linspace()
print(np.linspace(1, 10, 3))

[ 1. 5.5 10. ]

In [11]: # create a NumPy array using np.zeros()
print(np.zeros(5, dtype=int))

[0 0 0 0 0]

In [12]: # create a NumPy array using np.ones()
print(np.ones(5, dtype=int))

[1 1 1 1 1]

In [13]: # create a NumPy array using np.random.rand()
print(np.random.rand(5))

[0.53633682 0.47931034 0.89136844 0.41353479 0.85860823]

In [14]: # create a NumPy array using np.random.randint()
print(np.random.randint(5, size=10))

[4 2 1 4 0 4 2 1 4 1]

Example 2: For N-dimensional NumPy Arrays

In [15]: # create a NumPy array using np.zeros()
print(np.zeros([4, 3], dtype = np.int32))

[[0 0 0]
[0 0 0]
[0 0 0]
[0 0 0]]

In [16]: # create a NumPy array using np.ones()
print(np.ones([4, 3], dtype = np.int32))

[[1 1 1]
[1 1 1]
[1 1 1]
[1 1 1]]

In [17]: # create a NumPy array using np.full()
print(np.full([2, 2], 67, dtype = int))

[[67 67]
[67 67]]

In [18]: # create an identity matrix using np.eye()
print(np.eye(4))

[[1. 0. 0. 0.]
[0. 1. 0. 0.]
[0. 0. 1. 0.]
[0. 0. 0. 1.]]

3. Inspecting Properties

NumPy arrays have basic properties that report information about the array, such as
its size, length, shape, and data type.
NumPy arrays can also be converted to a list and cast to a different data type.

Example 1: One Dimensional Numpy Array

In [19]: arr = np.array([1, 2, 3, 4])
# check size of the array
print("Size:", arr.size)

Size: 4

In [20]: # check len of the array
print("len:", len(arr))

len: 4

In [21]: # check shape of the array
print("Shape:", arr.shape)

Shape: (4,)

In [22]: # check dtype of the array elements
print("Datatype:", arr.dtype)

Datatype: int32

In [23]: # change the dtype to 'float64'
arr = arr.astype('float64')
print(arr)
print("Datatype:", arr.dtype)

[1. 2. 3. 4.]
Datatype: float64

In [24]: # convert array to list
lis = arr.tolist()
print("\nList:", lis)
print(type(lis))

List: [1.0, 2.0, 3.0, 4.0]


<class 'list'>

Example 2: N-Dimensional Numpy Array


In [27]: # Two dimensional numpy array
list_1 = [1, 2, 3, 4]
list_2 = [5, 6, 7, 8]
list_3 = [9, 10, 11, 12]
arr = np.array([list_1, list_2, list_3])

arr

Out[27]: array([[ 1, 2, 3, 4],
[ 5, 6, 7, 8],
[ 9, 10, 11, 12]])

In [28]: # check size of the array
print("Size:", arr.size)

Size: 12

In [29]: # check length of the array (number of rows)
print("Length:", len(arr))

Length: 3

In [30]: # check shape of the array
print("Shape:", arr.shape)

Shape: (3, 4)

In [31]: # check dtype of the array elements
print("Datatype:", arr.dtype)

Datatype: int32

In [32]: # change the dtype to 'float64'
arr = arr.astype('float64')
print(arr)
print(arr.dtype)

[[ 1. 2. 3. 4.]
[ 5. 6. 7. 8.]
[ 9. 10. 11. 12.]]
float64

In [33]: # convert array to list
lis = arr.tolist()
print("\nList:", lis)
print(type(lis))

List: [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], [9.0, 10.0, 11.0, 12.0]]
<class 'list'>

3. Sorting Array

In [34]: # sorting a one dimensional
# numpy array in place using the ndarray sort() method
a = np.array([12, 15, 10, 1])
print("Array before sorting",a)
a.sort()
print("Array after sorting",a)

Array before sorting [12 15 10 1]
Array after sorting [ 1 10 12 15]

In [35]: # sorting a two dimensional
# numpy array using np.sort()
# sort along the first axis
a = np.array([[12, 15], [10, 1]])
arr1 = np.sort(a, axis = 0)
print ("Along first axis : \n", arr1)

Along first axis :
[[10 1]
[12 15]]

In [36]: # sort along the last axis
a = np.array([[10, 15], [12, 1]])
arr2 = np.sort(a, axis = -1)
print ("\nAlong Last axis : \n", arr2)

Along Last axis :
[[10 15]
[ 1 12]]

In [37]: # axis = None flattens the array before sorting
a = np.array([[12, 15], [10, 1]])
arr1 = np.sort(a, axis = None)
print ("\nAlong none axis : \n", arr1)

Along none axis :
[ 1 10 12 15]
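
Two related patterns are often needed next to the sorts above: getting the order of the indices, and sorting in descending order. The sketch below reuses the values of the 1-D example; the variable names are illustrative only.

```python
import numpy as np

a = np.array([12, 15, 10, 1])

# np.argsort returns the indices that would sort the array.
order = np.argsort(a)

# Reversing an ascending sort gives a descending sort.
descending = np.sort(a)[::-1]

print(order)        # [3 2 0 1]
print(a[order])     # [ 1 10 12 15]
print(descending)   # [15 12 10  1]
```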

4. NumPy Array Manipulation


Appending Elements to an Array

New values are added at the end of an array with the np.append() function. When no axis is given, np.append() flattens the array before appending.

In [38]: # Adding the values at the end
# of a numpy array
print("Original Array:", arr)

Original Array: [[ 1. 2. 3. 4.]
[ 5. 6. 7. 8.]
[ 9. 10. 11. 12.]]

In [39]: # appending to the array (no axis given, so arr is flattened first)
arr = np.append(arr, [7])
print("Array after appending:", arr)

Array after appending: [ 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 7.]

In [40]: # Adding the values at the end
# of a numpy array
arr = np.arange(1, 13).reshape(2, 6)
print("Original Array")
print(arr, "\n")

Original Array
[[ 1 2 3 4 5 6]
[ 7 8 9 10 11 12]]

In [41]: # create another array which is
# to be appended as a new row (axis=0)
row = np.arange(5, 11).reshape(1, 6)
arr_row = np.append(arr, row, axis=0)
print("Array after appending the values row wise")
print(arr_row, "\n")

Array after appending the values row wise
[[ 1 2 3 4 5 6]
[ 7 8 9 10 11 12]
[ 5 6 7 8 9 10]]

In [42]: # create an array which is
# to be appended as a new column (axis=1)
col = np.array([1, 2]).reshape(2, 1)
arr_col = np.append(arr, col, axis=1)
print("Array after appending the values column wise")
print(arr_col)

Array after appending the values column wise
[[ 1 2 3 4 5 6 1]
[ 7 8 9 10 11 12 2]]

Inserting Elements into the Array

Values can be inserted into a NumPy array at a particular index using the np.insert() function, which returns a new array.

In [43]: arr = np.array([1, 2, 3, 4])
# Python program illustrating np.insert()
print("1D arr:", arr)

1D arr: [1 2 3 4]

In [44]: # Inserting value 9 at index 1
a = np.insert(arr, 1, 9)
print("\nArray after insertion:", a)

Array after insertion: [1 9 2 3 4]

Removing Elements from Numpy Array

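In [45]: # Python program illustrating
# np.delete()
print("Original arr:", arr)

Original arr: [1 2 3 4]

In [46]: # deletion from 1D array: remove the element at index 2
# (named index rather than object, which would shadow a Python built-in)
index = 2
a = np.delete(arr, index)
a

Out[46]: array([1, 2, 4])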

Reshaping Array

In [50]: # creating a numpy array
array = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16])

# printing array
print("Array: " + str(array))

Array: [ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16]

In [52]: # reshaping numpy array
# converting it to 2-D from 1-D array
reshaped1 = array.reshape((4, array.size//4))

print(reshaped1)

[[ 1 2 3 4]
[ 5 6 7 8]
[ 9 10 11 12]
[13 14 15 16]]

In [53]: # creating another reshaped array
reshaped2 = np.reshape(array, (2, 8))
reshaped2

Out[53]: array([[ 1, 2, 3, 4, 5, 6, 7, 8],


[ 9, 10, 11, 12, 13, 14, 15, 16]])

In [54]: # printing array (arr is still the 1-D array from In [43])
print("1-D Array:")
print(arr)

1-D Array:
[1 2 3 4]

In [59]: # reshaping numpy array
# reshape(-1) flattens an array of any shape to 1-D;
# arr is already 1-D here, so its values are unchanged
reshaped = arr.reshape(-1)

In [60]: # printing reshaped array
print("Reshaped 1-D Array:")
print(reshaped)

Reshaped 1-D Array:
[1 2 3 4]

Resizing an Array

In [61]: # Making a 1-D array
arr = np.array([1, 2, 3, 4, 5, 6])

# Required values 12, existing values 6:
# the resize() method works in place and fills the missing entries with zeros
arr.resize(3, 4)
print(arr)

[[1 2 3 4]
[5 6 0 0]
[0 0 0 0]]
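
The resize() method used above should not be confused with the np.resize() function, which behaves differently when the new shape needs more values. A minimal sketch of the difference follows; refcheck=False is passed only so the in-place resize also runs in environments that hold extra references to the array, such as notebooks.

```python
import numpy as np

base = np.array([1, 2, 3, 4, 5, 6])

# np.resize returns a NEW array and repeats the data to fill the shape.
repeated = np.resize(base, (3, 4))

# ndarray.resize changes the array IN PLACE and pads with zeros.
padded = base.copy()
padded.resize((3, 4), refcheck=False)

print(repeated)
print(padded)
```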

Flatten a Two Dimensional array


The flatten() method of a NumPy array returns a copy of the array collapsed into one
dimension.

In [62]: # Two dimensional numpy array
list_1 = [1, 2, 3, 4]
list_2 = [5, 6, 7, 8]
arr = np.array([list_1, list_2])

print(arr.flatten())

[1 2 3 4 5 6 7 8]
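
A related method, ravel(), also produces a 1-D result but returns a view of the original data when it can. The sketch below shows the practical consequence; it assumes a contiguous array like the one above, for which ravel() does return a view.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4],
                [5, 6, 7, 8]])

flat_copy = arr.flatten()   # always an independent copy
flat_view = arr.ravel()     # a view here, because arr is contiguous

flat_copy[0] = 100          # does not touch arr
flat_view[1] = 200          # writes through to arr

print(arr[0, 0], arr[0, 1])                 # 1 200
print(np.shares_memory(arr, flat_copy))     # False
print(np.shares_memory(arr, flat_view))     # True
```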

Transpose

In [63]: # making a 3x2 array
ncs = np.array([[1, 2],
                [4, 5],
                [7, 8]])

# before transpose
print(ncs, end ='\n\n')

# after transpose
print(ncs.transpose(1, 0))

[[1 2]
[4 5]
[7 8]]

[[1 4 7]
[2 5 8]]

5. Combining and Splitting Commands in NumPy

5.1 Combining np array

np.concatenate()

In [64]: a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6],
              [7, 8]])

combined = np.concatenate((a, b), axis=0) # add rows
print(combined)

[[1 2]
[3 4]
[5 6]
[7 8]]

np.vstack(): vertical stack

In [66]: v = np.vstack((a, b))
v

Out[66]: array([[1, 2],


[3, 4],
[5, 6],
[7, 8]])

np.hstack(): horizontal stack

In [67]: h = np.hstack((a, b))
h

Out[67]: array([[1, 2, 5, 6],


[3, 4, 7, 8]])

5.2 Splitting arrays

np.split()

In [70]: arr = np.array([1,2,3,4,5,6])
# split into 3 equal parts
x = np.split(arr, 3)
print(x)

[array([1, 2]), array([3, 4]), array([5, 6])]

np.vsplit(): vertical split

In [71]: arr2 = np.array([[1,2],
                 [3,4],
                 [5,6]])

vs = np.vsplit(arr2, 3)

In [72]: arr2

Out[72]: array([[1, 2],
[3, 4],
[5, 6]])

np.hsplit(): horizontal split

In [82]: hs = np.hsplit(arr2, 1)
hs

Out[82]: [array([[1, 2],


[3, 4],
[5, 6]])]
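
The split functions above require the array to divide into equal parts. When it does not, np.array_split() can be used instead. This is a minimal sketch with a seven-element array chosen for the example.

```python
import numpy as np

arr = np.arange(1, 8)   # seven elements: 1..7

# np.array_split tolerates a length that does not divide evenly.
parts = np.array_split(arr, 3)
print([p.tolist() for p in parts])   # [[1, 2, 3], [4, 5], [6, 7]]

# np.split insists on equal-sized pieces and raises ValueError otherwise.
try:
    np.split(arr, 3)
    split_failed = False
except ValueError:
    split_failed = True
print(split_failed)                  # True
```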

6. Indexing, Slicing and Subsetting

Subsetting

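In [83]: arr = np.array([5, 12, 20, 3, 18])
# keep only the elements greater than 10
subset = arr[arr > 10]

print(subset)

[12 20 18]

In [84]: arr = np.array([5, 12, 20, 3, 18])
# combine conditions with & and wrap each one in parentheses
subset = arr[(arr > 10) & (arr < 20)]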

print(subset)

[12 18]
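
Boolean masks like the ones above can also be passed to np.where(), either to find the positions of the matching elements or to replace them. The sketch reuses the same values; the cap of 10 is an arbitrary choice for illustration.

```python
import numpy as np

arr = np.array([5, 12, 20, 3, 18])
mask = arr > 10

# With one argument, np.where returns the indices where the mask is True.
positions = np.where(mask)[0]

# With three arguments, it picks from the second where True, else the third.
capped = np.where(mask, 10, arr)

print(positions)   # [1 2 4]
print(capped)      # [ 5 10 10  3 10]
```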

Slicing Array

array[start : end : step]

In [87]: arr = np.array([10, 20, 30, 40, 50])

print(arr[1:4])     # index 1 up to (not including) 4
print(arr[:3])      # first three elements
print(arr[2:])      # from index 2 to the end
print(arr[0:5:2])   # every second element
print(arr[::-1])    # reversed

[20 30 40]
[10 20 30]
[30 40 50]
[10 30 50]
[50 40 30 20 10]

In [88]: arr2 = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

print(arr2[0:2, 1:3])

print(arr2[1, :])

print(arr2[:, 2])

[[2 3]
[5 6]]
[4 5 6]
[3 6 9]

Indexing

In [89]: arr = np.array([10, 20, 30, 40])
print(arr[0])
print(arr[2])
print(arr[-1])

10
30
40

In [90]: arr2 = np.array([
    [5, 10, 15],
    [20, 25, 30]
])

print(arr2[0, 1])
print(arr2[1, 2])
print(arr2[0])
print(arr2[:, 1])

10
30
[ 5 10 15]
[10 25]

7. Copying and Viewing Array

7.1. View (Shallow Copy)

A view means the new array points to the same data as the original array.

In [91]: arr = np.array([10, 20, 30, 40])
view_arr = arr.view()

view_arr[1] = 99

print(arr) # Original array changes
print(view_arr) # View array

[10 99 30 40]
[10 99 30 40]

7.2. Copy (Deep Copy)

A copy means NumPy creates a new array with new memory storage.

In [92]: arr = np.array([10, 20, 30, 40])
copy_arr = arr.copy()

copy_arr[1] = 99

print(arr) # Original does NOT change
print(copy_arr) # Only copy changes

[10 20 30 40]
[10 99 30 40]

How to Check If They Share Memory?

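np.shares_memory() returns True when two arrays share memory (view) and False when they do not (copy).

In [94]: # arr was re-created in In [92], so view_arr (a view of the earlier
# array from In [91]) no longer shares memory with it; the result is False
np.shares_memory(arr, view_arr)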

Out[94]: False
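
A self-contained check that creates the view, the copy, and a slice from the same array makes the rule easier to see. This is a sketch; slice_arr is an extra name added for the example to show that basic slicing also returns a view.

```python
import numpy as np

arr = np.array([10, 20, 30, 40])
view_arr = arr.view()    # same data, new array object
copy_arr = arr.copy()    # new data
slice_arr = arr[1:3]     # basic slicing also returns a view

print(np.shares_memory(arr, view_arr))    # True
print(np.shares_memory(arr, copy_arr))    # False
print(np.shares_memory(arr, slice_arr))   # True
print(view_arr.base is arr)               # True: a view remembers its base
```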

8. NumPy Array Mathematics

Arithmetic Operations

In [96]: # Defining both the arrays
a = np.array([5, 72, 13, 100])
b = np.array([2, 5, 10, 30])

# Performing addition using numpy function
print("Addition:", np.add(a, b))
print("*" * 60)

# Performing subtraction using numpy function
print("Subtraction:", np.subtract(a, b))
print("*" * 60)

# Performing multiplication using numpy function
print("Multiplication:", np.multiply(a, b))
print("*" * 60)

# Performing division using numpy functions
print("Division:", np.divide(a, b))
print("*" * 60)

# Performing mod on two arrays
print("Mod:", np.mod(a, b))
print("*" * 60)

# Performing remainder on two arrays
print("Remainder:", np.remainder(a, b))
print("*" * 60)

# Performing element-wise power
# (with the int32 dtype used in this run, results too large for 32 bits
# overflow silently: the last two values printed below are wrapped)
print("Power:", np.power(a, b))
print("*" * 60)

# Performing exponentiation (e raised to each element of b)
print("Exponentiation:", np.exp(b))

Addition: [ 7 77 23 130]
************************************************************
Subtraction: [ 3 67 3 70]
************************************************************
Multiplication: [ 10 360 130 3000]
************************************************************
Division: [ 2.5 14.4 1.3 3.33333333]
************************************************************
Mod: [ 1 2 3 10]
************************************************************
Remainder: [ 1 2 3 10]
************************************************************
Power: [ 25 1934917632 419538377 0]
************************************************************
Exponentiation: [7.38905610e+00 1.48413159e+02 2.20264658e+04 1.06864746e+13]
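
The last two values on the Power line above are not the true powers: 13 to the power 10 and 100 to the power 30 do not fit in a 32-bit integer, so the results wrapped around. A sketch of how a wider or floating-point dtype avoids this, using the same inputs:

```python
import numpy as np

a32 = np.array([5, 72, 13, 100], dtype=np.int32)
b32 = np.array([2, 5, 10, 30], dtype=np.int32)

# int32 result: large powers overflow silently.
wrapped = np.power(a32, b32)

# int64 is wide enough for the first three powers (100**30 still is not).
exact_first_three = np.power(a32[:3].astype(np.int64), b32[:3].astype(np.int64))

# float64 trades exactness for range and can represent 100**30 approximately.
as_float = np.power(a32.astype(np.float64), b32)

print(wrapped)
print(exact_first_three)
print(as_float)
```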

Comparison

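In [97]: an_array = np.array([[1, 2], [3, 4]])
another_array = np.array([[1, 2], [3, 4]])

# element-wise comparison gives a boolean array
comparison = an_array == another_array
# all() is True only if every element matched
equal_arrays = comparison.all()

print(equal_arrays)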

True
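
NumPy also offers ready-made helpers for this comparison. The sketch below contrasts an exact comparison with a tolerance-based one, which is usually the safer choice for floating-point data; the sample values are chosen to expose a rounding difference.

```python
import numpy as np

x = np.array([0.1 + 0.2, 1.0])
y = np.array([0.3, 1.0])

exact = np.array_equal(x, y)   # exact element-by-element equality
close = np.allclose(x, y)      # equality within a small tolerance

print(exact)   # False: 0.1 + 0.2 is not exactly 0.3 in floating point
print(close)   # True
```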

Vector Math

In [98]: arr = np.array([.5, 1.5, 2.5, 3.5, 4.5, 10.1])

# applying sqrt() function
print("Square-root:", np.sqrt(arr))
print("*" * 60)

# applying log() function (natural logarithm)
print("Log Value: ", np.log(arr))
print("*" * 60)

# applying absolute() function
print("Absolute Value:", np.absolute(arr))
print("*" * 60)

# applying sin() function
print("Sine values:", np.sin(arr))
print("*" * 60)

# applying ceil() function
print("Ceil values:", np.ceil(arr))
print("*" * 60)

# applying floor() function
print("Floor Values:", np.floor(arr))
print("*" * 60)

# applying round() function
# (the older alias np.round_ was removed in NumPy 2.0; halves round to the nearest even value)
print ("Rounded values:", np.round(arr))

Square-root: [0.70710678 1.22474487 1.58113883 1.87082869 2.12132034 3.17804972]
************************************************************
Log Value: [-0.69314718 0.40546511 0.91629073 1.25276297 1.5040774 2.31253542]
************************************************************
Absolute Value: [ 0.5 1.5 2.5 3.5 4.5 10.1]
************************************************************
Sine values: [ 0.47942554 0.99749499 0.59847214 -0.35078323 -0.97753012 -0.62507065]
************************************************************
Ceil values: [ 1. 2. 3. 4. 5. 11.]
************************************************************
Floor Values: [ 0. 1. 2. 3. 4. 10.]
************************************************************
Rounded values: [ 0. 2. 2. 4. 4. 10.]

Statistics

In [100… # 1D array
arr = [20, 2, 7, 1, 34]

# mean
print("mean of arr:", np.mean(arr))
print("*" * 60)

# median
print("median of arr:", np.median(arr))
print("*" * 60)

# sum
print("Sum of arr(uint8):", np.sum(arr, dtype = np.uint8))
print("Sum of arr(float32):", np.sum(arr, dtype = np.float32))
print("*" * 60)

# min and max
print("maximum element:", np.max(arr))
print("minimum element:", np.min(arr))
print("*" * 60)

# var
print("var of arr:", np.var(arr))
print("*" * 60)

# standard deviation
print("std of arr:", np.std(arr))

mean of arr: 12.8


************************************************************
median of arr: 7.0
************************************************************
Sum of arr(uint8): 64
Sum of arr(float32): 64.0
************************************************************
maximum element: 34
minimum element: 1
************************************************************
var of arr: 158.16
************************************************************
std of arr: 12.576167937809991
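
The variance and standard deviation printed above are the population versions, because NumPy divides by N by default. The ddof parameter switches to the sample versions, which divide by N - 1. A sketch with the same data:

```python
import numpy as np

arr = np.array([20, 2, 7, 1, 34])

population_var = np.var(arr)          # divides by N     (ddof=0, the default)
sample_var = np.var(arr, ddof=1)      # divides by N - 1

print(population_var)   # 158.16, as in the output above
print(sample_var)       # 197.7
print(np.std(arr, ddof=1))
```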

corrcoef

In [101… # create numpy 1d-array
array1 = np.array([0, 1, 2])
array2 = np.array([3, 4, 5])

# pearson product-moment correlation
# coefficients of the arrays
rslt = np.corrcoef(array1, array2)

print(rslt)

[[1. 1.]
[1. 1.]]

11/24/25, 12:45 PM Pandas_cheat_sheet

What is Pandas?

>> Pandas is an open-source Python package for data analysis and data management.
>> It was developed by Wes McKinney and is used in various fields,
including data science, finance, and the social sciences.
>> Its key features include the DataFrame and Series objects, efficient
indexing, data alignment, and fast handling of missing data.

Data Structures in Pandas


Pandas provides two main data structures: Series and DataFrame.

1. Series: A one-dimensional labelled array capable of holding any data type.


2. DataFrame: A two-dimensional tabular data structure with labelled axes (rows and
columns).

Load the pandas libraries

In [2]: # Load the pandas library
import pandas as pd
# Print the Pandas version
print(pd.__version__)

2.1.4

Creating Pandas Series.

In [3]: # Create series with Pandas
series = pd.Series(data = ['Geeks','for','geeks'],
                   index = ['A','B','C'])
series

Out[3]: A Geeks
B for
C geeks
dtype: object

Create Pandas Dataframe

In [4]: data = {'Fruits': ['Mango', 'Apple', 'Banana', 'Orange'],
        'Quantity': [40, 20, 25, 10],
        'Price': [80, 100, 50, 70]
        }

# Create Pandas Dataframe with dictionary
df = pd.DataFrame(data)
print(df)

Fruits Quantity Price
0 Mango 40 80
1 Apple 20 100
2 Banana 25 50
3 Orange 10 70

In [5]: # check Data types
df.dtypes

Out[5]: Fruits object
Quantity int64
Price int64
dtype: object

In [6]: # check the shape of dataset
df.shape

Out[6]: (4, 3)

In [7]: # check info
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Fruits 4 non-null object
1 Quantity 4 non-null int64
2 Price 4 non-null int64
dtypes: int64(2), object(1)
memory usage: 228.0+ bytes

In [8]: # change the data type of each column
df['Quantity'] = df['Quantity'].astype('int32')
df['Fruits'] = df['Fruits'].astype('str')
df['Price'] = df['Price'].astype('float')

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Fruits 4 non-null object
1 Quantity 4 non-null int32
2 Price 4 non-null float64
dtypes: float64(1), int32(1), object(1)
memory usage: 212.0+ bytes

In [9]: df.values

Out[9]: array([['Mango', 40, 80.0],


['Apple', 20, 100.0],
['Banana', 25, 50.0],
['Orange', 10, 70.0]], dtype=object)

In [10]: # Sorting in Ascending order
print(df.sort_values('Price', ascending=True))

Fruits Quantity Price
2 Banana 25 50.0
3 Orange 10 70.0
0 Mango 40 80.0
1 Apple 20 100.0

In [11]: # Sorting in Descending order


print(df.sort_values('Price', ascending=False))

Fruits Quantity Price


1 Apple 20 100.0
0 Mango 40 80.0
3 Orange 10 70.0
2 Banana 25 50.0

In [12]: #### Sorting by Index


print(df.sort_index(ascending=False))

Fruits Quantity Price


3 Orange 10 70.0
2 Banana 25 50.0
1 Apple 20 100.0
0 Mango 40 80.0

In [13]: # Reset the indexes to default
# inplace=True will make changes to the original dataframe
# drop=True will drop the initial indexes
df.reset_index(drop=True, inplace=True)
print(df)

Fruits Quantity Price


0 Mango 40 80.0
1 Apple 20 100.0
2 Banana 25 50.0
3 Orange 10 70.0

In [14]: # rename the columns to upper case
df.rename(columns={'Fruits': 'FRUITS',
                   'Quantity': 'QUANTITY',
                   'Price': 'PRICE'},
          inplace=True)
print(df)

FRUITS QUANTITY PRICE


0 Mango 40 80.0
1 Apple 20 100.0
2 Banana 25 50.0
3 Orange 10 70.0

Reshaping

In [15]: # Gather columns into rows.
print(pd.melt(df))

variable value
0 FRUITS Mango
1 FRUITS Apple
2 FRUITS Banana
3 FRUITS Orange
4 QUANTITY 40
5 QUANTITY 20
6 QUANTITY 25
7 QUANTITY 10
8 PRICE 80.0
9 PRICE 100.0
10 PRICE 50.0
11 PRICE 70.0

In [16]: # Drop the QUANTITY column
df1 = df.drop(columns=['QUANTITY'])
print(df1)

FRUITS PRICE
0 Mango 80.0
1 Apple 100.0
2 Banana 50.0
3 Orange 70.0

In [17]: # Drop 2nd and 4th rows
df2 = df.drop([1, 3], axis=0)
print(df2)

FRUITS QUANTITY PRICE


0 Mango 40 80.0
2 Banana 25 50.0

Dataframe Slicing and Observation

In [18]: # Print first 5 rows (the dataset has only 4)
print(df.head())

FRUITS QUANTITY PRICE
0 Mango 40 80.0
1 Apple 20 100.0
2 Banana 25 50.0
3 Orange 10 70.0

In [19]: # Print last 5 rows (the dataset has only 4)
print(df.tail())

FRUITS QUANTITY PRICE
0 Mango 40 80.0
1 Apple 20 100.0
2 Banana 25 50.0
3 Orange 10 70.0

In [20]: # Randomly select n rows
print(df.sample(3))

FRUITS QUANTITY PRICE


3 Orange 10 70.0
1 Apple 20 100.0
0 Mango 40 80.0

In [21]: # Select top 2 Highest QUANTITY
print(df.nlargest(2, 'QUANTITY'))

FRUITS QUANTITY PRICE


0 Mango 40 80.0
2 Banana 25 50.0

In [22]: # Select Least 2 QUANTITY
print(df.nsmallest(2, 'QUANTITY'))

FRUITS QUANTITY PRICE


3 Orange 10 70.0
1 Apple 20 100.0

In [23]: # Select the price > 50
print(df[df.PRICE > 50])

FRUITS QUANTITY PRICE


0 Mango 40 80.0
1 Apple 20 100.0
3 Orange 10 70.0

In [24]: # Select the FRUITS name


print(df['FRUITS'])

0 Mango
1 Apple
2 Banana
3 Orange
Name: FRUITS, dtype: object

In [25]: # Select the FRUITS name and


# their corresponding PRICE
print(df[['FRUITS', 'PRICE']])

FRUITS PRICE
0 Mango 80.0
1 Apple 100.0
2 Banana 50.0
3 Orange 70.0

In [26]: # Select the columns whose names match
# the regular expression
print(df.filter(regex='F|Q'))

FRUITS QUANTITY
0 Mango 40
1 Apple 20
2 Banana 25
3 Orange 10

In [27]: # Select all the columns between FRUITS and PRICE (both included)
print(df.loc[:, 'FRUITS':'PRICE'])

FRUITS QUANTITY PRICE


0 Mango 40 80.0
1 Apple 20 100.0
2 Banana 25 50.0
3 Orange 10 70.0

In [28]: # Select FRUITS name having PRICE <70
print(df.loc[df['PRICE'] < 70,
             ['FRUITS', 'PRICE']])

FRUITS PRICE
2 Banana 50.0

In [29]: # Select rows at positions 2 to 4 (only 2 and 3 exist here)
print(df.iloc[2:5])

FRUITS QUANTITY PRICE


2 Banana 25 50.0
3 Orange 10 70.0

In [30]: # Select the columns at the 0th & 2nd positions
print(df.iloc[:, [0, 2]])

FRUITS PRICE
0 Mango 80.0
1 Apple 100.0
2 Banana 50.0
3 Orange 70.0

In [31]: # Select the single PRICE value in the 2nd row (index label 1)
df.at[1, 'PRICE']

Out[31]: 100.0

Filter

In [32]: print(df.filter(items=['FRUITS', 'PRICE']))

FRUITS PRICE
0 Mango 80.0
1 Apple 100.0
2 Banana 50.0
3 Orange 70.0

In [33]: # Filter by row index
print(df.filter(items=[3], axis=0))

FRUITS QUANTITY PRICE


3 Orange 10 70.0

Where

In [34]: df['PRICE'].where(df['PRICE'] > 50)

Out[34]: 0 80.0
1 100.0
2 NaN
3 70.0
Name: PRICE, dtype: float64
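
As the output above shows, where() keeps the original shape and puts NaN where the condition is False. A sketch of two common variations follows, using a stand-alone Series with the same prices; the replacement value 0.0 is an arbitrary choice for illustration.

```python
import pandas as pd

price = pd.Series([80.0, 100.0, 50.0, 70.0], name='PRICE')

masked = price.where(price > 50)                # NaN where the condition fails
replaced = price.where(price > 50, other=0.0)   # a chosen value instead of NaN
dropped = price[price > 50]                     # boolean indexing removes the rows

print(masked)
print(replaced)
print(dropped)
```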

Query
The Pandas query() method returns the filtered DataFrame.

In [37]: # QUERY
print(df.query('PRICE>70'))

FRUITS QUANTITY PRICE


0 Mango 40 80.0
1 Apple 20 100.0

In [38]: # Price >50 & QUANTITY <30
print(df.query('PRICE>50 and QUANTITY<30'))

FRUITS QUANTITY PRICE


1 Apple 20 100.0
3 Orange 10 70.0

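In [39]: # FRUITS name starts with 'M'
# (engine='python' is passed so the .str accessor is evaluated by pandas itself)
print(df.query("FRUITS.str.startswith('M')", engine='python'))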

FRUITS QUANTITY PRICE


0 Mango 40 80.0

Combine Two data sets


In [40]: ## First Data Frame
df1 = pd.DataFrame({'Fruits': ['Mango', 'Banana',
                               'Grapes', 'Apple',
                               'Orange'],
                    'Price': [60, 40, 75, 100, 65]})
print(df1)

Fruits Price
0 Mango 60
1 Banana 40
2 Grapes 75
3 Apple 100
4 Orange 65

In [41]: ## Second Data Frame
df2 = pd.DataFrame({'Fruits': ['Apple', 'Orange',
                               'Papaya',
                               'Pineapple', 'Mango', ],
                    'Price': [120, 60, 30, 70, 50]})
print(df2)

Fruits Price
0 Apple 120
1 Orange 60
2 Papaya 30
3 Pineapple 70
4 Mango 50

Merge two dataframe

A. Left Join

In [42]: print(pd.merge(df1, df2, how='left', on='Fruits'))

Fruits Price_x Price_y


0 Mango 60 50.0
1 Banana 40 NaN
2 Grapes 75 NaN
3 Apple 100 120.0
4 Orange 65 60.0

B. Right Join

In [43]: print(pd.merge(df1, df2, how='right', on='Fruits'))

Fruits Price_x Price_y


0 Apple 100.0 120
1 Orange 65.0 60
2 Papaya NaN 30
3 Pineapple NaN 70
4 Mango 60.0 50

C. Inner Join

In [44]: print(pd.merge(df1, df2, how='inner', on='Fruits'))

Fruits Price_x Price_y


0 Mango 60 50
1 Apple 100 120
2 Orange 65 60

D. Outer Join

In [45]: print(pd.merge(df1, df2, how='outer', on='Fruits'))

Fruits Price_x Price_y


0 Mango 60.0 50.0
1 Banana 40.0 NaN
2 Grapes 75.0 NaN
3 Apple 100.0 120.0
4 Orange 65.0 60.0
5 Papaya NaN 30.0
6 Pineapple NaN 70.0
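
Two optional parameters make merged output easier to read: suffixes replaces the default _x and _y column endings, and indicator adds a _merge column recording where each row came from. The sketch below rebuilds the two frames from the cells above; the suffix labels _shop1 and _shop2 are hypothetical names chosen for the example.

```python
import pandas as pd

df1 = pd.DataFrame({'Fruits': ['Mango', 'Banana', 'Grapes', 'Apple', 'Orange'],
                    'Price': [60, 40, 75, 100, 65]})
df2 = pd.DataFrame({'Fruits': ['Apple', 'Orange', 'Papaya', 'Pineapple', 'Mango'],
                    'Price': [120, 60, 30, 70, 50]})

merged = pd.merge(df1, df2, how='outer', on='Fruits',
                  suffixes=('_shop1', '_shop2'),   # hypothetical labels
                  indicator=True)                  # adds the _merge column

# Count how many rows matched in both frames, or only in one of them.
counts = merged['_merge'].value_counts()

print(merged)
print(counts)
```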

Concatenation

A. Row-wise Concatenation having the same column name

In [46]: data = {'FRUITS': ['Grapes', 'Pineapple'],
        'QUANTITY': [23, 17],
        'PRICE': [60, 30]
        }

# Create Pandas Dataframe with dictionary
df1 = pd.DataFrame(data)

# Concatenate df and df1
df2 = pd.concat([df, df1], axis=0,
                ignore_index=True)
print(df2)

FRUITS QUANTITY PRICE


0 Mango 40 80.0
1 Apple 20 100.0
2 Banana 25 50.0
3 Orange 10 70.0
4 Grapes 23 60.0
5 Pineapple 17 30.0

B. Column-wise Concatenation (adding a new column)

In [47]: data = {'DISCOUNT': [5, 7, 10, 8, 6]}

# Create Pandas Dataframe with dictionary
discount = pd.DataFrame(data)

# Concatenate df2 and discount
# (discount has 5 rows and df2 has 6, so the last row gets NaN)
df = pd.concat([df2, discount], axis=1)
print(df)

FRUITS QUANTITY PRICE DISCOUNT


0 Mango 40 80.0 5.0
1 Apple 20 100.0 7.0
2 Banana 25 50.0 10.0
3 Orange 10 70.0 8.0
4 Grapes 23 60.0 6.0
5 Pineapple 17 30.0 NaN

Descriptive Analysis Pandas

Describe dataset

A. For numerical datatype

In [48]: print(df.describe())

QUANTITY PRICE DISCOUNT


count 6.00000 6.000000 5.000000
mean 22.50000 65.000000 7.200000
std 10.05485 24.289916 1.923538
min 10.00000 30.000000 5.000000
25% 17.75000 52.500000 6.000000
50% 21.50000 65.000000 7.000000
75% 24.50000 77.500000 8.000000
max 40.00000 100.000000 10.000000

B. For object datatype

In [49]: print(df.describe(include=['O']))

FRUITS
count 6
unique 6
top Mango
freq 1

Unique values

In [50]: # Check the unique values in the dataset
df.FRUITS.unique()

Out[50]: array(['Mango', 'Apple', 'Banana', 'Orange', 'Grapes', 'Pineapple'],


dtype=object)

In [51]: # Count the occurrences of each unique value
df.FRUITS.value_counts()

Out[51]: FRUITS
Mango 1
Apple 1
Banana 1
Orange 1
Grapes 1
Pineapple 1
Name: count, dtype: int64

In [52]: #Sum Values

print(df['PRICE'].sum())

390.0

In [53]: # Cumulative Sum

print(df['PRICE'].cumsum())

0 80.0
1 180.0
2 230.0
3 300.0
4 360.0
5 390.0
Name: PRICE, dtype: float64

In [54]: # Minimum PRICE


df['PRICE'].min()

Out[54]: 30.0

In [55]: # Maximum PRICE


df['PRICE'].max()

Out[55]: 100.0

In [56]: # Mean PRICE


df['PRICE'].mean()

Out[56]: 65.0

In [57]: # Median PRICE


df['PRICE'].median()

Out[57]: 65.0

In [58]: # Variance
df['PRICE'].var()


Out[58]: 590.0

In [59]: # Standard Deviation


df['PRICE'].std()

Out[59]: 24.289915602982237
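
Pandas and NumPy use different defaults here: var() and std() in pandas compute the sample statistic (ddof=1), while NumPy computes the population statistic (ddof=0). A sketch with the same six prices shows how to make the two agree.

```python
import numpy as np
import pandas as pd

prices = pd.Series([80.0, 100.0, 50.0, 70.0, 60.0, 30.0])

sample_var = prices.var()                # pandas default: ddof=1
population_var = prices.var(ddof=0)      # matches the NumPy default
numpy_var = np.var(prices.to_numpy())    # NumPy default: ddof=0

print(sample_var)        # 590.0, as in Out[58]
print(population_var)
print(numpy_var)
```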

In [60]: # Quantile
df['PRICE'].quantile([0, 0.25, 0.75, 1])

Out[60]: 0.00 30.0


0.25 52.5
0.75 77.5
1.00 100.0
Name: PRICE, dtype: float64

Covariance
In [61]: print(df.cov(numeric_only=True))

QUANTITY PRICE DISCOUNT


QUANTITY 101.1 53.0 -10.4
PRICE 53.0 590.0 -18.0
DISCOUNT -10.4 -18.0 3.7

Correlation
In [62]: print(df.corr(numeric_only=True))

QUANTITY PRICE DISCOUNT


QUANTITY 1.000000 0.217007 -0.499210
PRICE 0.217007 1.000000 -0.486486
DISCOUNT -0.499210 -0.486486 1.000000

Missing Values
Check for null values using isnull() function.

In [63]: # Check for null values
print(df.isnull())

FRUITS QUANTITY PRICE DISCOUNT


0 False False False False
1 False False False False
2 False False False False
3 False False False False
4 False False False False
5 False False False True

In [64]: # Total count of null values per column
print(df.isnull().sum())

FRUITS 0
QUANTITY 0
PRICE 0
DISCOUNT 1
dtype: int64

In [65]: # Fill the null values with mean()
Mean = df.DISCOUNT.mean()

# Fill the null values
df['DISCOUNT'] = df['DISCOUNT'].fillna(Mean)
print(df)
print(df)

FRUITS QUANTITY PRICE DISCOUNT


0 Mango 40 80.0 5.0
1 Apple 20 100.0 7.0
2 Banana 25 50.0 10.0
3 Orange 10 70.0 8.0
4 Grapes 23 60.0 6.0
5 Pineapple 17 30.0 7.2

In [66]: # Drop the rows that still contain null values (none remain after the fill above)
df.dropna(inplace=True)

Add a column to the Existing dataset


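In [93]: # Values to add
Origin = pd.Series(data=['BH', 'J&K',
                         'BH', 'MP',
                         'WB', 'WB'])

# Add a column in dataset
df['Origin'] = Origin
# The TOTAL column in the output below is not created in any cell of this
# notebook export; its values equal QUANTITY + PRICE + DISCOUNT, so that
# definition is assumed here
df['TOTAL'] = df['QUANTITY'] + df['PRICE'] + df['DISCOUNT']
print(df)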

FRUITS QUANTITY PRICE DISCOUNT Origin TOTAL


0 Mango 40 80.0 5.0 BH 125.0
1 Apple 20 100.0 7.0 J&K 127.0
2 Banana 25 50.0 10.0 BH 85.0
3 Orange 10 70.0 8.0 MP 88.0
4 Grapes 23 60.0 6.0 WB 89.0
5 Pineapple 17 30.0 7.2 WB 54.2

Group By
Group the DataFrame by the 'Origin' column using the groupby() method.

In [98]: # keep only the numeric columns for aggregation
numeric_cols = df.select_dtypes(include='number').columns

grouped = df.groupby('Origin')
print(grouped[numeric_cols].agg(['sum', 'mean']))

QUANTITY PRICE DISCOUNT TOTAL


sum mean sum mean sum mean sum mean
Origin
BH 65 32.5 130.0 65.0 15.0 7.5 210.0 105.0
J&K 20 20.0 100.0 100.0 7.0 7.0 127.0 127.0
MP 10 10.0 70.0 70.0 8.0 8.0 88.0 88.0
WB 40 20.0 90.0 45.0 13.2 6.6 143.2 71.6
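
When only a few statistics are needed, named aggregation gives flat, readable column names instead of the two-level header above. This is a sketch on a reduced copy of the data; the output names total_quantity and mean_price are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'Origin': ['BH', 'J&K', 'BH', 'MP', 'WB', 'WB'],
    'QUANTITY': [40, 20, 25, 10, 23, 17],
    'PRICE': [80.0, 100.0, 50.0, 70.0, 60.0, 30.0],
})

# Each keyword is new_column=(source_column, aggregation).
summary = df.groupby('Origin').agg(
    total_quantity=('QUANTITY', 'sum'),
    mean_price=('PRICE', 'mean'),
)

print(summary)
```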

Outlier Detection using Box plot

In [70]: # Box plot
df.boxplot(column='PRICE', grid=False)

Out[70]: <Axes: >

Bar Plot with Pandas

In [81]: import matplotlib.pyplot as plt

df.plot.bar(x='FRUITS', y=['QUANTITY', 'PRICE', 'DISCOUNT'])
plt.show()


Histogram
The plot.hist() method is used to create a histogram.

In [85]: df['QUANTITY'].plot.hist(bins=3)
plt.show()


Scatter Plot with Pandas


The plot.scatter() method is used to create a scatter plot in pandas.

In [88]: df.plot.scatter(x='PRICE', y='DISCOUNT')
plt.show()


Pie Chart
The plot.pie() method is used to create a pie chart.

In [102]: df.groupby('Origin')['TOTAL'].sum().plot.pie(
    autopct='%1.1f%%',  # show percentages
    startangle=90,      # rotate pie for better view
    figsize=(6, 6),     # size of plot
    shadow=True,        # add slight shadow
    legend=True         # show legend
)

plt.ylabel('')  # remove y-axis label
plt.title('TOTAL Distribution by Origin')
plt.show()



11/24/25, 5:42 PM session_1_matplotlib

📌 Matplotlib
#️⃣ 1. What is Matplotlib?
Matplotlib is a Python plotting library used to create static, animated, and interactive
visualizations. It works well with NumPy and Pandas, and is widely used in Data
Analysis and Machine Learning.

#️⃣ 2. Why Use Matplotlib? (Interview Points)


Easy to create line, bar, scatter, histogram, pie charts
Highly customizable visualizations
Integrates smoothly with Jupyter Notebook
Base library for Seaborn and other visualization tools
Supports both ✔️ Pyplot-style (simple) and ✔️ Object-Oriented (OO) style (professional)

#️⃣ 3. Basic Import


import matplotlib.pyplot as plt
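The two interfaces mentioned above (pyplot style and object-oriented style) can be contrasted with a short sketch. The data values are illustrative, and the `Agg` backend is selected only so the example runs without a display; in a Jupyter notebook that line is not needed.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

# Pyplot style: functions act on the implicit "current" figure and axes.
plt.figure()
plt.plot(x, y)
plt.title("pyplot style")

# Object-oriented style: create the Figure and Axes explicitly,
# then call methods on those objects.
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_title("OO style")
ax.set_xlabel("x")
ax.set_ylabel("y")
```

The OO style tends to scale better once a figure has several subplots, because every call names the axes it applies to.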

Types of Data
1. Numerical Data
Data represented in numbers (e.g., age, salary, marks).
Used for mathematical calculations.

2. Categorical Data
Data divided into categories or labels (e.g., gender, city, department).
Used for grouping and classification.

In [2]: # import the libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

2D line plots


Bivariate analysis
Categorical vs numerical and numerical vs numerical
Use case: time series data

📌 2D Line Plots
✔️ What is a 2D Line Plot?
A 2D line plot shows the relationship between two numerical variables using a
continuous line.

✔️ Why use it?


To visualize trends over time
To compare multiple trends
To show continuous data patterns

✔️ Basic Syntax
plt.plot(x, y)
plt.show()

| Parameter | Description |
| ------------------- | ------------------------------------------- |
| `x`, `y` | Data points |
| `color` / `c` | Line color |
| `linestyle` / `ls` | Type of line (`'-'`, `'--'`, `':'`, `'-.'`) |
| `linewidth` / `lw` | Line thickness |
| `marker` | Marker style (`'o'`, `'s'`, `'*'`, `'^'`) |
| `markersize` / `ms` | Size of markers |
| `label` | For legend |
| `alpha` | Transparency level (0 to 1) |
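As a rough illustration of the parameters in the table above, the sketch below passes several of them in a single call and then reads them back from the returned line object. The year and price values are illustrative, and the `Agg` backend is chosen only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt

year = [2015, 2016, 2017, 2018, 2019, 2020]
price = [48000, 54000, 57000, 49000, 47000, 45000]

fig, ax = plt.subplots()
# ax.plot returns a list of Line2D objects; unpack the single line.
(line,) = ax.plot(
    year, price,
    color="green",      # line color
    linestyle="--",     # dashed line
    linewidth=2,        # line thickness
    marker="o",         # circle markers at each data point
    markersize=8,       # marker size
    label="price",      # text shown in the legend
    alpha=0.8,          # slight transparency
)
ax.legend()

print(line.get_linestyle(), line.get_marker(), line.get_label())
```

The short aliases from the table (`c`, `ls`, `lw`, `ms`) are accepted in place of the long names.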


In [5]: # plotting simple data
price=[48000,54000,57000,49000,47000,45000]
year=[2015,2016,2017,2018,2019,2020]

plt.plot(year,price)

Out[5]: [<matplotlib.lines.Line2D at 0x202bf6dad50>]

In [6]: # from a pandas dataframe


batsman=pd.read_csv('[Link]')
batsman

Out[6]: index RG Sharma V Kohli

0 2008 404 165

1 2009 362 246

2 2010 404 307

3 2011 372 557

4 2012 433 364

5 2013 538 639

6 2014 390 359

7 2015 482 505

8 2016 489 973

9 2017 333 308

In [7]: plt.plot(batsman['index'],batsman['V Kohli'])

Out[7]: [<matplotlib.lines.Line2D at 0x202bef31d10>]

In [9]: # plotting multiple lines on the same axes
plt.plot(batsman['index'],batsman['V Kohli'])
plt.plot(batsman['index'],batsman['RG Sharma'])

Out[9]: [<matplotlib.lines.Line2D at 0x202c3237d90>]

In [10]: # title
plt.plot(batsman['index'],batsman['V Kohli'])
plt.plot(batsman['index'],batsman['RG Sharma'])
plt.title('Rohit sharma VS virat kohli career comparison')

Out[10]: Text(0.5, 1.0, 'Rohit sharma VS virat kohli career comparison')

In [11]: # axis labels
plt.plot(batsman['index'],batsman['V Kohli'])
plt.plot(batsman['index'],batsman['RG Sharma'])
plt.title('Rohit sharma VS virat kohli career comparison')
plt.xlabel('Season')
plt.ylabel('runs scored')

Out[11]: Text(0, 0.5, 'runs scored')


In [12]: # colors (hex) and line (width and style) and marker size
plt.plot(batsman['index'],batsman['V Kohli'],color='green')
plt.plot(batsman['index'],batsman['RG Sharma'],color='red')
plt.title('Rohit sharma VS virat kohli career comparison')
plt.xlabel('Season')
plt.ylabel('runs scored')

Out[12]: Text(0, 0.5, 'runs scored')


In [16]: plt.plot(batsman['index'],batsman['V Kohli'],color='green',linestyle='solid')
plt.plot(batsman['index'],batsman['RG Sharma'],color='red',linestyle='dotted')
plt.title('Rohit sharma VS virat kohli career comparison')
plt.xlabel('Season')
plt.ylabel('runs scored')

Out[16]: Text(0, 0.5, 'runs scored')


In [17]: plt.plot(batsman['index'],batsman['V Kohli'],color='green',linestyle='solid',lin
plt.plot(batsman['index'],batsman['RG Sharma'],color='red',linestyle='dotted',li
plt.title('Rohit sharma VS virat kohli career comparison')
plt.xlabel('Season')
plt.ylabel('runs scored')

Out[17]: Text(0, 0.5, 'runs scored')


In [18]: plt.plot(batsman['index'],batsman['V Kohli'],color='green',linestyle='solid',lin
plt.plot(batsman['index'],batsman['RG Sharma'],color='red',linestyle='dotted',li
plt.title('Rohit sharma VS virat kohli career comparison')
plt.xlabel('Season')
plt.ylabel('runs scored')

Out[18]: Text(0, 0.5, 'runs scored')


In [22]: # legend -> location -> label
plt.plot(batsman['index'],batsman['V Kohli'],color='green',linestyle='solid',lin
plt.plot(batsman['index'],batsman['RG Sharma'],color='red',linestyle='dotted',li
plt.title('Rohit sharma VS virat kohli career comparison')
plt.xlabel('Season')
plt.ylabel('runs scored')
plt.legend() # the legend position can be adjusted as needed with the loc argument

Out[22]: <matplotlib.legend.Legend at 0x202c94c0910>


In [26]: # limiting axes (the last price is an outlier)
price=[48000,54000,57000,49000,47000,45000,450000]
year=[2015,2016,2017,2018,2019,2020,2021]

plt.plot(year,price)

Out[26]: [<matplotlib.lines.Line2D at 0x202ca37d310>]


In [29]: price=[48000,54000,57000,49000,47000,45000,450000]
year=[2015,2016,2017,2018,2019,2020,2021]

plt.plot(year,price)
plt.ylim(0,75000)
plt.xlim(2017,2019)

Out[29]: (2017.0, 2019.0)


In [30]: # grid
plt.plot(batsman['index'],batsman['V Kohli'],color='green',linestyle='solid',lin
plt.plot(batsman['index'],batsman['RG Sharma'],color='red',linestyle='dotted',li
plt.title('Rohit sharma VS virat kohli career comparison')
plt.xlabel('Season')
plt.ylabel('runs scored')
plt.legend()
plt.grid()

In [31]: # show
plt.plot(batsman['index'],batsman['V Kohli'],color='green',linestyle='solid',lin
plt.plot(batsman['index'],batsman['RG Sharma'],color='red',linestyle='dotted',li
plt.title('Rohit sharma VS virat kohli career comparison')
plt.xlabel('Season')
plt.ylabel('runs scored')
[Link]()
plt.show()


Scatter Plots

Bivariate analysis
numerical vs numerical
use case- finding correlation

📌 Scatter Plot —
✅ What is a Scatter Plot?

A scatter plot displays individual data points on a 2D plane to show the relationship
between two numerical variables.
Used to identify correlation, patterns, clusters, and outliers.

✅ Why Use Scatter Plots?


Shows relationship between two variables
Useful for correlation analysis
Detects clusters and outliers
Helps in regression and trend understanding

🟦 Basic Syntax
plt.scatter(x, y)
plt.show()

| Parameter | Description |
| ------------- | ----------------------------------------- |
| `x`, `y` | Data points |
| `color` / `c` | Point color |
| `s` | Marker size |
| `marker` | Shape of point (`o`, `s`, `^`, `*`, etc.) |
| `alpha` | Transparency (0 to 1) |
| `label` | Name for legend |
| `edgecolor` | Border color of marker |
| `linewidths` | Border thickness |
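Since scatter plots are mainly used to judge correlation, it can help to put a number next to the picture. The sketch below computes the Pearson correlation coefficient with `np.corrcoef` for a noisy linear relationship; the seed and the variable names `rng` and `r` are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)               # fixed seed for reproducibility
x = np.linspace(-10, 10, 50)
y = 10 * x + 3 + rng.integers(0, 300, 50)    # linear trend plus integer noise

# np.corrcoef returns the 2x2 correlation matrix; [0, 1] is corr(x, y).
r = np.corrcoef(x, y)[0, 1]
print(round(r, 2))

# A noise-free linear relationship has a correlation of exactly 1 (up to rounding).
r_perfect = np.corrcoef(x, 10 * x + 3)[0, 1]
```

A value of `r` near +1 or -1 corresponds to points lying close to a straight line in the scatter plot, while a value near 0 corresponds to a shapeless cloud.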

In [32]: # plt.scatter on a simple function
x=np.linspace(-10,10,50)
y=10*x+3 +np.random.randint(0,300,50)
y

Out[32]: array([170. , 76.08163265, 3.16326531, -56.75510204,
44.32653061, 169.40816327, -62.51020408, 183.57142857,
86.65306122, -40.26530612, 199.81632653, 84.89795918,
244.97959184, 217.06122449, 110.14285714, 91.2244898 ,
43.30612245, 94.3877551 , 32.46938776, 104.55102041,
27.63265306, 96.71428571, 114.79591837, 225.87755102,
76.95918367, 88.04081633, 191.12244898, 101.20408163,
226.28571429, 56.36734694, 194.44897959, 51.53061224,
279.6122449 , 261.69387755, 296.7755102 , 123.85714286,
56.93877551, 118.02040816, 120.10204082, 222.18367347,
365.26530612, 138.34693878, 141.42857143, 116.51020408,
124.59183673, 172.67346939, 318.75510204, 351.83673469,
276.91836735, 114. ])

In [33]: plt.scatter(x,y)

Out[33]: <matplotlib.collections.PathCollection at 0x202c4dee3c0>


In [35]: # plt.scatter on pandas data
df=pd.read_csv('[Link]')
df=df.head(50)
df

Out[35]: batter runs avg strike_rate

0 V Kohli 6634 36.251366 125.977972

1 S Dhawan 6244 34.882682 122.840842

2 DA Warner 5883 41.429577 136.401577

3 RG Sharma 5881 30.314433 126.964594

4 SK Raina 5536 32.374269 132.535312

5 AB de Villiers 5181 39.853846 148.580442

6 CH Gayle 4997 39.658730 142.121729

7 MS Dhoni 4978 39.196850 130.931089

8 RV Uthappa 4954 27.522222 126.152279

9 KD Karthik 4377 26.852761 129.267572

10 G Gambhir 4217 31.007353 119.665153

11 AT Rayudu 4190 28.896552 124.148148

12 AM Rahane 4074 30.863636 117.575758

13 KL Rahul 3895 46.927711 132.799182

14 SR Watson 3880 30.793651 134.163209

15 MK Pandey 3657 29.731707 117.739858

16 SV Samson 3526 29.140496 132.407060

17 KA Pollard 3437 28.404959 140.457703

18 F du Plessis 3403 34.373737 127.167414

19 YK Pathan 3222 29.290909 138.046272

20 BB McCullum 2882 27.711538 126.848592

21 RR Pant 2851 34.768293 142.550000

22 PA Patel 2848 22.603175 116.625717

23 JC Buttler 2832 39.333333 144.859335

24 SS Iyer 2780 31.235955 121.132898

25 Q de Kock 2767 31.804598 130.951254

26 Yuvraj Singh 2754 24.810811 124.784776

27 V Sehwag 2728 27.555556 148.827059

28 SA Yadav 2644 29.707865 134.009123

29 M Vijay 2619 25.930693 118.614130

30 RA Jadeja 2502 26.617021 122.108346

31 SPD Smith 2495 34.652778 124.812406

32 SE Marsh 2489 39.507937 130.109775


33 DA Miller 2455 36.102941 133.569097

34 JH Kallis 2427 28.552941 105.936272

35 WP Saha 2427 25.281250 124.397745

36 DR Smith 2385 28.392857 132.279534

37 MA Agarwal 2335 22.669903 129.506378

38 SR Tendulkar 2334 33.826087 114.187867

39 GJ Maxwell 2320 25.494505 147.676639

40 N Rana 2181 27.961538 130.053667

41 R Dravid 2174 28.233766 113.347237

42 KS Williamson 2105 36.293103 123.315759

43 AJ Finch 2092 24.904762 123.349057

44 AC Gilchrist 2069 27.223684 133.054662

45 AD Russell 2039 29.985294 168.234323

46 JP Duminy 2029 39.784314 120.773810

47 MEK Hussey 1977 38.764706 119.963592

48 HH Pandya 1972 29.878788 140.256046

49 Shubman Gill 1900 32.203390 122.186495

In [36]: plt.scatter(df['avg'],df['strike_rate'],color='red',marker='+')
plt.title('Avg and SR analysis of top 50 batsman')
plt.xlabel('Average')
plt.ylabel('SR')

Out[36]: Text(0, 0.5, 'SR')


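In [38]: # size
tips=sns.load_dataset('tips')

# plt.scatter is slower on large data, but supports a different size per point
plt.scatter(tips['total_bill'],tips['tip'],s=tips['size']*15)

Out[38]: <matplotlib.collections.PathCollection at 0x202c9cb0f50>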


In [39]: # scatter plot using plt.plot
# faster, because every marker has the same size and color
plt.plot(tips['total_bill'],tips['tip'],'o')

Out[39]: [<matplotlib.lines.Line2D at 0x202ca4a6990>]

Bar chart


bivariate analysis
categorical vs numerical
use case- Aggregate analysis of groups

📌 Bar Chart —
✅ What is a Bar Chart?
A bar chart represents categorical data using rectangular bars whose height/length
shows the value.
Used for comparing categories, groups, frequencies, or totals.

✅ Why Use Bar Charts?


Compare categories or groups
Visualize counts, totals, or averages
Easy to read
Works well with categorical data


🟦 Types of Bar Charts


1. Vertical Bar Chart → plt.bar()
2. Horizontal Bar Chart → plt.barh()
3. Grouped Bar Chart
4. Stacked Bar Chart

🟦 Basic Syntax
plt.bar(categories, values)
plt.show()

| Parameter | Description |
| ----------- | ------------------- |
| `x` | Categories (labels) |
| `height` | Values |
| `color` | Bar color |
| `width` | Bar width |
| `label` | Label for legend |
| `edgecolor` | Border color |
| `linewidth` | Border thickness |
| `alpha` | Transparency |

In [40]: # simple bar chart
children=[10,20,40,10,30]
colors=['red','blue','green','yellow','pink']
plt.bar(colors,children,color='blue')

Out[40]: <BarContainer object of 5 artists>


In [42]: # horizontal bar chart
plt.barh(colors,children,color='pink')

Out[42]: <BarContainer object of 5 artists>

In [43]: # color and label


df=pd.read_csv('batsman_season_record.csv')
df


Out[43]: batsman 2015 2016 2017

0 AB de Villiers 513 687 216

1 DA Warner 562 848 641

2 MS Dhoni 372 284 290

3 RG Sharma 482 489 333

4 V Kohli 505 973 308

In [49]: plt.bar(np.arange(df.shape[0]) - 0.2,df['2015'],width=0.2,color='pink')

Out[49]: <BarContainer object of 5 artists>

In [52]: plt.bar(np.arange(df.shape[0]),df['2016'],width=0.2,color='blue')

Out[52]: <BarContainer object of 5 artists>


In [53]: plt.bar(np.arange(df.shape[0]) + 0.2,df['2017'],width=0.2,color='black')

Out[53]: <BarContainer object of 5 artists>


In [54]: # multiple (grouped) bar chart
plt.bar(np.arange(df.shape[0]) - 0.2,df['2015'],width=0.2,color='pink')
plt.bar(np.arange(df.shape[0]),df['2016'],width=0.2,color='blue')
plt.bar(np.arange(df.shape[0]) + 0.2,df['2017'],width=0.2,color='black')

Out[54]: <BarContainer object of 5 artists>

In [55]: # xticks
plt.bar(np.arange(df.shape[0]) - 0.2,df['2015'],width=0.2,color='pink')
plt.bar(np.arange(df.shape[0]),df['2016'],width=0.2,color='blue')
plt.bar(np.arange(df.shape[0]) + 0.2,df['2017'],width=0.2,color='black')

plt.xticks(np.arange(df.shape[0]),df['batsman'])
plt.show()
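The offsets of -0.2, 0 and +0.2 used in the grouped bar chart above follow a simple pattern that generalizes to any number of series. The helper below is a hypothetical sketch (the name `grouped_bar_positions` is not a matplotlib function) that computes the x positions for each series so that every group stays centred on its tick.

```python
def grouped_bar_positions(n_groups, n_series, width):
    """Return one list of x positions per series, centred on ticks 0..n_groups-1."""
    positions = []
    for i in range(n_series):
        # Shift series i left or right of the tick by a multiple of the bar width.
        offset = (i - (n_series - 1) / 2) * width
        positions.append([group + offset for group in range(n_groups)])
    return positions


# Three seasons, five batsmen, bar width 0.2 (the values used above).
pos = grouped_bar_positions(n_groups=5, n_series=3, width=0.2)
for series in pos:
    print([round(p, 2) for p in series])
```

Each inner list can be passed as the first argument of a bar call, and the tick labels go at positions 0 to `n_groups - 1`.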

In [60]: # a problem: long category labels overlap on the x-axis
children=[10,20,40,10,30]
colors=['red red red red red red red','blue blue blue blue blue','green greengre

plt.bar(colors,children,color='pink')

Out[60]: <BarContainer object of 5 artists>


In [59]: # solution: rotate the tick labels
children=[10,20,40,10,30]
colors=['red red red red red red red','blue blue blue blue blue','green greengre
plt.bar(colors,children,color='pink')
plt.xticks(rotation='vertical')
plt.show()


In [62]: # stacked bar chart
plt.bar(df['batsman'],df['2017'],label='2017')
plt.bar(df['batsman'],df['2016'],bottom=df['2017'],label='2016')
plt.bar(df['batsman'],df['2015'],bottom=(df['2016']+df['2017']),label='2015')
plt.legend()

Out[62]: <matplotlib.legend.Legend at 0x202ce7a0a50>
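In the stacked chart above, each layer's `bottom` is the running total of the layers drawn before it. The sketch below reproduces that arithmetic in plain Python using the season figures from the table shown earlier; the helper name `stack_bottoms` is hypothetical.

```python
# Runs per batsman, in the order the layers are stacked (2017 first).
runs = {
    "2017": [216, 641, 290, 333, 308],
    "2016": [687, 848, 284, 489, 973],
    "2015": [513, 562, 372, 482, 505],
}


def stack_bottoms(layers):
    """Return the `bottom` values for each layer, in stacking order."""
    bottoms = [[0] * len(layers[0])]          # the first layer starts at zero
    for layer in layers[:-1]:
        # The next layer starts where the previous one ended.
        bottoms.append([b + v for b, v in zip(bottoms[-1], layer)])
    return bottoms


bottoms = stack_bottoms(list(runs.values()))
for season, bottom in zip(runs, bottoms):
    print(season, bottom)
```

The last list printed equals `df['2016'] + df['2017']`, which is the `bottom` passed for the 2015 layer.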


Histogram

univariate Analysis
Numerical col
use case- Frequency count

📌 Histogram —
✅ What is a Histogram?

A histogram shows the distribution of numerical (continuous) data by grouping values into bins (intervals).

It tells:

How data is spread
Frequency of values
Patterns, skewness, and outliers

✅ Why Use Histograms?


Understand data distribution
Identify skewness, spread, peaks, and gaps
Used in statistics, ML, EDA (exploratory data analysis)
Helps find normal distribution patterns

🟦 Basic Syntax
plt.hist(data)
plt.show()

| Parameter | Description |
| ------------- | ------------------------------------------------- |
| `x` | Data |
| `bins` | Number of bins or list of intervals |
| `color` | Bar color |
| `edgecolor` | Border color |
| `alpha` | Transparency |
| `label` | Legend label |
| `density` | Normalize to probability density |
| `orientation` | Vertical / Horizontal |
| `histtype` | `'bar'`, `'barstacked'`, `'step'`, `'stepfilled'` |

In [63]: # simple data
data=[32,45,56,10,15,27,61]
plt.hist(data)

Out[63]: (array([2., 0., 0., 1., 1., 0., 1., 0., 0., 2.]),
array([10. , 15.1, 20.2, 25.3, 30.4, 35.5, 40.6, 45.7, 50.8, 55.9, 61. ]),
<BarContainer object of 10 artists>)


In [69]: # explicit bin edges
data=[32,45,56,10,15,27,61]
plt.hist(data,bins=[10,25,40,55,70])

Out[69]: (array([2., 2., 1., 2.]),
array([10., 25., 40., 55., 70.]),
<BarContainer object of 4 artists>)
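To see where the counts `[2, 2, 1, 2]` come from, the binning rule can be written out by hand. The sketch below is a plain-Python illustration (the function name `bin_counts` is hypothetical): every bin includes its left edge and excludes its right edge, except the last bin, which includes both.

```python
def bin_counts(data, edges):
    """Count how many values fall into each bin defined by consecutive edges."""
    counts = [0] * (len(edges) - 1)
    last = len(counts) - 1
    for value in data:
        for i in range(len(counts)):
            lo, hi = edges[i], edges[i + 1]
            # Every bin is [lo, hi) except the last, which also includes hi.
            if lo <= value < hi or (i == last and value == hi):
                counts[i] += 1
                break
    return counts


counts = bin_counts([32, 45, 56, 10, 15, 27, 61], [10, 25, 40, 55, 70])
print(counts)  # [2, 2, 1, 2]
```

Values outside the outermost edges are simply not counted, which is worth remembering when choosing custom bins.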

In [70]: # on some data


df=pd.read_csv('[Link]')
df


Out[70]: match_id batsman_runs

0 12 62

1 17 28

2 20 64

3 27 0

4 30 10

... ... ...

136 624 75

137 626 113

138 632 54

139 633 0

140 636 54

141 rows × 2 columns

In [71]: plt.hist(df['batsman_runs'],bins=[0,10,20,30,40,50,60,70,80,90,100,110,120])

Out[71]: (array([32., 29., 17., 21., 7., 14., 6., 7., 2., 2., 3., 1.]),
array([ 0., 10., 20., 30., 40., 50., 60., 70., 80., 90., 100.,
110., 120.]),
<BarContainer object of 12 artists>)

In [72]: # logarithmic scale
arr=[Link]('[Link]')
plt.hist(arr,bins=[10,20,30,40,50,60,70])

Out[72]: (array([ 23., 200., 10000., 400., 1234., 92.]),
array([10., 20., 30., 40., 50., 60., 70.]),
<BarContainer object of 6 artists>)

In [73]: # use a logarithmic y-axis so that small bins such as 10 to 20 remain visible
plt.hist(arr,bins=[10,20,30,40,50,60,70],log=True)

Out[73]: (array([ 23., 200., 10000., 400., 1234., 92.]),
array([10., 20., 30., 40., 50., 60., 70.]),
<BarContainer object of 6 artists>)


Pie Chart

univariate/Bivariate analysis
categorical vs numerical
use case - to find contribution on a standard scale

# 📌 Pie Chart — Complete Notes (Matplotlib)

## ✅ What is a Pie Chart?


A pie chart displays **categorical data** as slices of a circle where
**each slice represents a proportion or percentage** of the whole.

It helps show:
- Part-to-whole relationship
- Category-wise contribution
- Quick understanding of proportions

---

## ✅ Why Use Pie Charts?


- Best for **percentage distribution**
- Easy visual comparison of few categories
- Good for categorical summaries

---

# 🟦 Basic Syntax
```python
plt.pie(data)
plt.show()
```

---

# 🟦 Important Parameters of `plt.pie()`

| Parameter | Description |
| ------------- | --------------------------------------------------- |
| `x` | Data values for slices |
| `labels` | Names of categories |
| `autopct` | Display percentage values on slices |
| `startangle` | Rotate the chart (0–360 degrees) |
| `explode` | Offset a slice to highlight |
| `colors` | List of colors |
| `shadow` | Add shadow effect |
| `radius` | Set pie chart size |
| `counterclock`| Draw slices clockwise or counterclockwise |
| `pctdistance` | Distance of % text from center |

---

# ✅ Summary

- Pie chart shows **percentage distribution**
- Useful for **categorical data**
- Customizable with **explode**, **colors**, **shadow**, **startangle**
- Donut chart = pie chart with a hole
- Best for 3–6 categories

In [75]: # simple data
data=[23,45,100,20,49]
plt.pie(data)
plt.show()

In [76]: # labels
data=[23,45,100,20,49]
subjects=['eng','science','math','sst','hindi']
plt.pie(data,labels=subjects)
plt.show()


In [77]: # dataset
df=pd.read_csv('[Link]')
df

Out[77]: batsman batsman_runs

0 AB de Villiers 31

1 CH Gayle 175

2 R Rampaul 0

3 SS Tiwary 2

4 TM Dilshan 33

5 V Kohli 11

In [78]: # percentage
plt.pie(df['batsman_runs'],labels=df['batsman'],autopct='%0.1f%%')
plt.show()
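The `autopct='%0.1f%%'` format string is applied to each slice's share of the total. The plain-Python sketch below shows roughly what ends up printed on the slices, using the runs from the table above; the dictionary `runs` and the variable `slice_labels` are names chosen for the example.

```python
runs = {
    "AB de Villiers": 31,
    "CH Gayle": 175,
    "R Rampaul": 0,
    "SS Tiwary": 2,
    "TM Dilshan": 33,
    "V Kohli": 11,
}

total = sum(runs.values())

# Apply the same format string to each slice's percentage of the whole.
slice_labels = {
    name: "%0.1f%%" % (100 * value / total)
    for name, value in runs.items()
}

for name, text in slice_labels.items():
    print(f"{name:15s} {text}")
```

In the format string, `%0.1f` prints one decimal place and `%%` is an escaped literal percent sign.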


In [79]: # colors
plt.pie(df['batsman_runs'],labels=df['batsman'],autopct='%0.1f%%',colors=['blue'
plt.show()

In [81]: # explode and shadow
plt.pie(df['batsman_runs'],labels=df['batsman'],autopct='%0.1f%%',colors=['blue'
plt.show()


In [82]: plt.pie(df['batsman_runs'],labels=df['batsman'],autopct='%0.1f%%',colors=['blue'
plt.show()

Changing styles

In [83]: plt.style.available

Out[83]: ['Solarize_Light2',
'_classic_test_patch',
'_mpl-gallery',
'_mpl-gallery-nogrid',
'bmh',
'classic',
'dark_background',
'fast',
'fivethirtyeight',
'ggplot',
'grayscale',
'petroff10',
'seaborn-v0_8',
'seaborn-v0_8-bright',
'seaborn-v0_8-colorblind',
'seaborn-v0_8-dark',
'seaborn-v0_8-dark-palette',
'seaborn-v0_8-darkgrid',
'seaborn-v0_8-deep',
'seaborn-v0_8-muted',
'seaborn-v0_8-notebook',
'seaborn-v0_8-paper',
'seaborn-v0_8-pastel',
'seaborn-v0_8-poster',
'seaborn-v0_8-talk',
'seaborn-v0_8-ticks',
'seaborn-v0_8-white',
'seaborn-v0_8-whitegrid',
'tableau-colorblind10']

In [84]: plt.style.use('dark_background')

Save figure

In [85]: plt.pie(df['batsman_runs'],labels=df['batsman'],autopct='%0.1f%%',colors=['blue'

plt.savefig('[Link]')
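A slightly fuller sketch of saving a figure is given below. The output file name `pie_example.png`, the temporary directory and the `dpi` value are assumptions made for the example; the `Agg` backend is used so the code runs without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.pie(
    [23, 45, 100, 20, 49],
    labels=["eng", "science", "math", "sst", "hindi"],
    autopct="%0.1f%%",
)

# Hypothetical output path; the file format is inferred from the extension.
out_path = os.path.join(tempfile.gettempdir(), "pie_example.png")

# dpi sets the resolution; bbox_inches="tight" trims surrounding whitespace.
fig.savefig(out_path, dpi=150, bbox_inches="tight")
plt.close(fig)

print(os.path.getsize(out_path) > 0)
```

When a script also calls `plt.show()`, it is usually safer to save first, because the figure may be released once the window is closed.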

1. Introduction to Tkinter
Tkinter is Python’s standard library for creating Graphical User Interfaces (GUIs).
It provides tools to design windows, buttons, labels, textboxes, menus, and other GUI elements
easily.

To Import Tkinter:
from tkinter import *

To Create a Basic Window:

from tkinter import *

root = Tk()                   # Create main window
root.title("My First GUI")    # Title of window
root.geometry("400x300")      # Set size of window
root.mainloop()               # Keeps window open

Explanation:

• Tk() – creates the main window (root window).
• title() – sets the title of the window.
• geometry() – defines window size (width × height).
• mainloop() – runs the GUI until the user closes it.

2. Tkinter Widgets
A widget is a graphical element such as a button, label, or textbox.

Below are the most important widgets (with syntax + example + explanation).

(1) Label Widget

Used to display text or image.

from tkinter import *

root = Tk()
label = Label(root, text="Hello, Tkinter!", font=('Arial', 14), fg='blue')
label.pack()
root.mainloop()

Important Options:

• text → text to display
• fg → text color (foreground)
• bg → background color
• font → font style and size
• padx, pady → internal padding

(2) Entry Widget

Used to take single-line text input from user.

from tkinter import *

root = Tk()
entry = Entry(root, width=30, bg='lightyellow', fg='black')
entry.pack()
root.mainloop()

Options:

• show='*' → for password input
• width → input box width

To get value:

data = entry.get()

(3) Button Widget

Used to create clickable buttons that perform actions.

from tkinter import *

def say_hello():
    print("Hello!")

root = Tk()
btn = Button(root, text="Click Me", command=say_hello, bg='skyblue')
btn.pack()
root.mainloop()

Options:

• text – button text
• command – function called when clicked
• fg, bg – colors
• font – font style

(4) Frame Widget

Used as a container to organize multiple widgets together.

from tkinter import *

root = Tk()
frame = Frame(root, bg='lightgray', borderwidth=3, relief='sunken')
frame.pack(padx=10, pady=10)

btn1 = Button(frame, text="Yes")
btn2 = Button(frame, text="No")
btn1.pack(side=LEFT)
btn2.pack(side=RIGHT)

root.mainloop()

(5) Text Widget

Used for multi-line text input.

from tkinter import *

root = Tk()
text = Text(root, height=5, width=40)
text.pack()
root.mainloop()

Get text from Textbox:

data = text.get("1.0", END)

("1.0" means line 1, character 0)

(6) Checkbutton Widget

Used for checkboxes.

from tkinter import *

root = Tk()
var = IntVar()
check = Checkbutton(root, text="I agree", variable=var)
check.pack()
root.mainloop()

Get value:

print(var.get()) # 1 if checked, 0 if not

(7) Radiobutton Widget

Used to select one option from many.

from tkinter import *

root = Tk()
var = IntVar()
r1 = Radiobutton(root, text="Male", variable=var, value=1)
r2 = Radiobutton(root, text="Female", variable=var, value=2)
r1.pack()
r2.pack()
root.mainloop()

Get selected value:

print(var.get())

(8) Listbox Widget

Used to display a list of items.

from tkinter import *

root = Tk()
listbox = Listbox(root)
listbox.insert(1, "Python")
listbox.insert(2, "C++")
listbox.insert(3, "Java")
listbox.pack()
root.mainloop()

(9) Menu Widget

Used to create dropdown menus.

from tkinter import *

root = Tk()
menu = Menu(root)
root.config(menu=menu)

file_menu = Menu(menu, tearoff=0)
menu.add_cascade(label="File", menu=file_menu)
file_menu.add_command(label="Open")
file_menu.add_command(label="Save")
file_menu.add_separator()
file_menu.add_command(label="Exit", command=root.quit)

root.mainloop()

(10) Messagebox

Used to show pop-up messages.

from tkinter import *
from tkinter import messagebox

root = Tk()

def showmsg():
    messagebox.showinfo("Info", "Welcome to Tkinter!")

Button(root, text="Click", command=showmsg).pack()
root.mainloop()

(11) Scale Widget

Used to select numeric values using a slider.

from tkinter import *

root = Tk()
scale = Scale(root, from_=0, to=100, orient=HORIZONTAL)
scale.pack()
root.mainloop()

Get value:

val = scale.get()

(12) Spinbox Widget

Used to select a number or string from a fixed range.

from tkinter import *

root = Tk()
spin = Spinbox(root, from_=0, to=10)
spin.pack()
root.mainloop()

3. Geometry Managers
Tkinter provides three geometry managers for layout:

Method      Description
.pack()     Automatic placement in block format
.grid()     Arranges widgets in rows and columns
.place()    Uses exact x, y coordinates

Example:
label.grid(row=0, column=0)
entry.grid(row=0, column=1)

4. Configuring the Main Window


root.geometry("500x400+200+100")   # width x height + xoffset + yoffset
root.resizable(False, False)       # Disable resizing
root.config(bg='lightblue')        # Background color
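The geometry string format described above (width x height + xoffset + yoffset) can be built programmatically, for instance to centre a window on the screen. The helper below is a hypothetical sketch in plain Python; the function name `centered_geometry` and the 1920x1080 screen size are assumptions for the example.

```python
def centered_geometry(width, height, screen_width, screen_height):
    """Build a 'WxH+X+Y' geometry string that centres the window on the screen."""
    x = (screen_width - width) // 2    # horizontal offset from the left edge
    y = (screen_height - height) // 2  # vertical offset from the top edge
    return f"{width}x{height}+{x}+{y}"


print(centered_geometry(500, 400, 1920, 1080))  # 500x400+710+340
```

In a running application the screen size can be read from the root window with `root.winfo_screenwidth()` and `root.winfo_screenheight()`, and the result passed to `root.geometry(...)`.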

5. Some Useful Methods


Method                         Purpose
root.title("Name")             Set window title
root.destroy()                 Close window
widget.config(option=value)    Change widget property
widget.pack()                  Place widget
root.mainloop()                Start event loop

6. Mini Project Example (Student Info Form)


from tkinter import *
from tkinter import messagebox

root = Tk()
root.title("Student Form")
root.geometry("400x300")

def submit():
    # Read the current values from the widgets and show them in a pop-up
    messagebox.showinfo("Info", f"Name: {name.get()} | Age: {age.get()} | Gender: {gender.get()}")

Label(root, text="Student Information", font=('Arial', 16, 'bold')).pack(pady=10)

Label(root, text="Name:").pack()
name = Entry(root)
name.pack()

Label(root, text="Age:").pack()
age = Entry(root)
age.pack()

Label(root, text="Gender:").pack()
gender = StringVar(value="Male")
Radiobutton(root, text="Male", variable=gender, value="Male").pack()
Radiobutton(root, text="Female", variable=gender, value="Female").pack()

Button(root, text="Submit", bg="lightgreen", command=submit).pack(pady=10)

root.mainloop()
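The form above shows whatever the user typed, even an empty name or a non-numeric age. One possible extension is a small validation step before the pop-up. The helper below is a hypothetical sketch in plain Python; the function name `validate_student` and the 1 to 120 age range are assumptions, not part of the original form.

```python
def validate_student(name, age_text):
    """Return a list of error messages; an empty list means the input is valid."""
    errors = []
    if not name.strip():
        errors.append("Name is required.")

    age_text = age_text.strip()
    if not age_text.isdecimal():
        # Entry.get() always returns a string, so check before converting.
        errors.append("Age must be a whole number.")
    elif not 1 <= int(age_text) <= 120:
        errors.append("Age must be between 1 and 120.")
    return errors


print(validate_student("Asha", "20"))
print(validate_student("", "abc"))
```

Inside `submit()`, the function could be called with `name.get()` and `age.get()`; when the returned list is not empty, `messagebox.showerror` can display the messages instead of the info pop-up.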
