Python Programming Lab Manual CS3107
(CS3107)
LABORATORY MANUAL & RECORD
VISION
To be a premier institute of knowledge sharing, quality research, and development of
technologies towards nation building
MISSION
1. Develop a state-of-the-art environment for high-quality learning
2. Collaborate with industry and academia on training, research, innovation, and
entrepreneurship
3. Create a platform for active participation in co-curricular and extra-curricular activities
MISSION
To impart high standard value-based technical education in all aspects of Computer
Science and Engineering through the state of the art infrastructure and innovative
approach.
To produce ethical, motivated, and skilled engineers through theoretical knowledge
in a team.
To develop globally competent engineers with strong foundations, capable of “out of
the box” thinking so as to adapt to the rapidly changing scenarios requiring socially
conscious green computing solutions.
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
PROGRAMME EDUCATIONAL OBJECTIVES (PEOs)
Graduates of the B.Tech. in Computer Science and Engineering program shall:
PEO1: Have a strong foundation of knowledge and skills in the field of Computer Science and
Engineering.
PEO2: Provide solutions to challenging problems in their profession by applying computer
engineering theory and practices.
PEO3: Demonstrate leadership and work effectively in multidisciplinary environments.
PROGRAM OUTCOMES
1. Engineering Knowledge: Apply the knowledge of mathematics, science, engineering
fundamentals, and an engineering specialization to the solution of complex engineering
problems.
2. Problem Analysis: Identify, formulate, research literature, and analyze complex
engineering problems reaching substantiated conclusions using first principles of
mathematics, natural sciences, and engineering sciences.
3. Design/Development of Solutions: Design solutions for complex engineering
problems and design system components or processes that meet the specified needs with
appropriate consideration for the public health and safety, and the cultural, societal, and
environmental considerations.
4. Conduct Investigations of Complex Problems: Use research-based knowledge and
research methods including design of experiments, analysis and interpretation of data, and
synthesis of the information to provide valid conclusions.
5. Modern Tool Usage: Create, select, and apply appropriate techniques, resources, and
modern engineering and IT tools including prediction and modelling to complex
engineering activities with an understanding of the limitations.
6. The Engineer and Society: Apply reasoning informed by the contextual knowledge to
assess societal, health, safety, legal and cultural issues and the consequent responsibilities
relevant to the professional engineering practice.
7. Environment and Sustainability: Understand the impact of the professional
engineering solutions in societal and environmental contexts, and demonstrate the
knowledge of, and need for sustainable development.
8. Ethics: Apply ethical principles and commit to professional ethics and responsibilities
and norms of the engineering practice.
9. Individual and Team Work: Function effectively as an individual, and as a member
or leader in diverse teams, and in multidisciplinary settings.
10. Communication: Communicate effectively on complex engineering activities with the
engineering community and with society at large, such as, being able to comprehend and
write effective reports and design documentation, make effective presentations, give and
receive clear instructions.
11. Project Management and Finance: Demonstrate knowledge and understanding of
the engineering and management principles and apply these to one’s own work, as a
member and leader in a team, to manage projects and in multidisciplinary environments.
12. Life-Long Learning: Recognize the need for, and have the preparation and ability to
engage in independent and life-long learning in the broadest context of technological
change.
Course Objectives:
● Familiarize students with key data structures in Python, including lists and
dictionaries, and apply them in the context of searching, sorting, text handling, and
file handling
● Introduce students to the calculation of statistical measures using Python,
such as measures of central tendency and correlation
● Familiarize students with important Python data-related libraries such as
NumPy and Pandas, and use them to manipulate arrays and DataFrames
● Introduce students to data visualization in Python through the creation of line
plots, histograms, scatter plots, box plots, and others
● Implement basic machine learning tasks in Python, including preprocessing
data, dimensionality reduction using PCA, clustering, classification, and
cross-validation
Course Outcomes:
Syllabus:
5. Python Programs for calculating Mean, Mode, Median, Variance, Standard Deviation
8. Python Programs for creation and manipulation of Data Frames using Pandas Library
o Adjusting the Plot: Line Colours and Styles, Axes Limits, Labelling Plots,
o Histograms,
o Boxplot
o Multiple Legends,
o Customizing Colorbars,
o Multiple Subplots,
o Customizing Ticks
Reference Books:
TABLE OF CONTENTS
Sl. No.   Content                                                                    Page No.
1         Write a Python program to perform operations on lists and dictionaries.    1-5
2         Write a Python program to implement searching and sorting algorithms.      6-13
Output
i+=1;
print(a[i+1])
Output
Output
Output
Program to add a key-value pair to a dictionary
d = {'A': 1, 'B': 2, 'C': 3}
entry = input("Enter the key and value you want to enter separated by a whitespace: ")
key, value = entry.split()  # e.g. "D 4" gives key "D" and value "4"
d.update({key: int(value)})
print("updated dictionary is :", d)
Output
Output
if key in d:  # key holds the key to delete (read earlier in the program)
    del d[key]
    print("updated dictionary is ", d)
else:
    print("key not found")
Output
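The dictionary programs above add a key with update() and delete one with del. The sketch below gathers a few more common list and dictionary operations in one place. The sample values are illustrative only and are not taken from the original programs.

```python
# Common list operations
nums = [10, 20, 30]
nums.append(40)        # add to the end            -> [10, 20, 30, 40]
nums.insert(1, 15)     # insert at index 1         -> [10, 15, 20, 30, 40]
nums.remove(30)        # remove first occurrence   -> [10, 15, 20, 40]
last = nums.pop()      # remove and return the last item (40)
print("List:", nums, "popped:", last)

# Common dictionary operations
d = {'A': 1, 'B': 2, 'C': 3}
d.update({'D': 4})     # add (or overwrite) a key-value pair
d['E'] = 5             # the same thing with subscript assignment
if 'B' in d:           # check membership before deleting
    del d['B']
print("Keys:", list(d.keys()))
print("Values:", list(d.values()))
for key, value in d.items():
    print(key, "->", value)
```

Checking membership with `in` before `del` avoids the KeyError that `del` raises for a missing key.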
2. Python program on searching and sorting
Linear search function
def search(arr, N, x):
    for i in range(0, N):
        if arr[i] == x:  # Compare the current element with the target
            return i
    return -1  # Return -1 if element is not found

# Driver code
if __name__ == '__main__':
    arr = [2, 3, 4, 10, 40]
    x = 10
    N = len(arr)
    result = search(arr, N, x)
    if result == -1:
        print("Element is not present")
    else:
        print("Element is present at index", result)
Output:
Binary Search
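def binary_search(V, To_Find):
    lo = 0
    hi = len(V) - 1
    while lo <= hi:  # The search range [lo, hi] is not empty yet
        mid = (lo + hi) // 2
        if V[mid] == To_Find:  # If the mid value is the one we are looking for
            print("Found at index", mid)
            return
        elif V[mid] < To_Find:  # The target can only be in the right half
            lo = mid + 1
        else:  # The target can only be in the left half
            hi = mid - 1
    print("Not found")
V = [1, 3, 4, 5, 6]
To_Find = 1
binary_search(V, To_Find)
To_Find = 6
binary_search(V, To_Find)
To_Find = 10
binary_search(V, To_Find)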
Output:
Bubble Sort
def bubbleSort(arr):
    n = len(arr)
    for i in range(n):
        swapped = False  # Flag to detect if any swap was made during this pass
        for j in range(0, n - i - 1):  # Traverse the array from 0 to n-i-1
            if arr[j] > arr[j + 1]:  # Swap if the element is greater than the next element
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
                swapped = True
        # If no two elements were swapped by inner loop, then the array is sorted
        if not swapped:
            break
Output:
Selection Sort
def selectionsort(array):
    size = len(array)  # Get the size of the array
    for s in range(size):
        min_idx = s
        for i in range(s + 1, size):
            if array[i] < array[min_idx]:  # Find the smallest element
                min_idx = i
        # Swap the found minimum element with the first element of the unsorted part
        array[s], array[min_idx] = array[min_idx], array[s]
Output:
Insertion Sort
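def insertionSort(arr):
    for i in range(1, len(arr)):
        key = arr[i]
        j = i - 1
        while j >= 0 and key < arr[j]:  # Shift larger elements one place right
            arr[j + 1] = arr[j]
            j -= 1
        arr[j + 1] = key  # Insert key into its correct position

if __name__ == '__main__':
    arr = [12, 11, 13, 5, 6]
    insertionSort(arr)
    print("The sorted array is:", arr)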
Output:
Quick Sort:
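The Quick Sort program is not included in this copy of the manual. Below is a minimal sketch of one common in-place version that uses Lomuto partitioning with the last element as the pivot. The partitioning scheme and the sample array are assumptions, so the original program may have been written differently.

```python
def partition(arr, low, high):
    pivot = arr[high]      # choose the last element as the pivot
    i = low - 1            # end of the region holding elements <= pivot
    for j in range(low, high):
        if arr[j] <= pivot:
            i += 1
            arr[i], arr[j] = arr[j], arr[i]
    # Place the pivot just after the "<= pivot" region
    arr[i + 1], arr[high] = arr[high], arr[i + 1]
    return i + 1

def quickSort(arr, low, high):
    if low < high:
        p = partition(arr, low, high)  # the pivot is now at its final index p
        quickSort(arr, low, p - 1)     # sort the elements left of the pivot
        quickSort(arr, p + 1, high)    # sort the elements right of the pivot

if __name__ == '__main__':
    arr = [10, 7, 8, 9, 1, 5]
    quickSort(arr, 0, len(arr) - 1)
    print("Sorted array:", arr)
```

Taking the last element as the pivot keeps the code short. The drawback is that already-sorted input then needs O(n²) time, and a random pivot is a common way to avoid that.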
Output:
Merge Sort
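def mergesort(arr):
    if len(arr) > 1:
        mid = len(arr) // 2
        l = arr[:mid]  # left half
        r = arr[mid:]  # right half
        mergesort(l)
        mergesort(r)
        i = j = k = 0
        # Merge the two sorted halves back into arr
        while i < len(l) and j < len(r):
            if l[i] <= r[j]:
                arr[k] = l[i]
                i += 1
            else:
                arr[k] = r[j]
                j += 1
            k += 1
        # Copy whatever remains in either half
        arr[k:] = l[i:] + r[j:]

def printlist(arr):
    for i in range(len(arr)):
        print(arr[i], end=" ")
    print()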
Output:
Heap Sort
    if largest != i:
        arr[i], arr[largest] = arr[largest], arr[i]
        heapify(arr, N, largest)

def heapsort(arr):
    N = len(arr)
Output:
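Only part of the Heap Sort program survives above. The sketch below is a complete version of the standard approach: build a max-heap, then repeatedly swap the root (the current maximum) to the end and re-heapify the remaining elements. The driver values are illustrative.

```python
def heapify(arr, N, i):
    # Sift arr[i] down so the subtree rooted at i is a max-heap (first N items only)
    largest = i
    l = 2 * i + 1  # left child
    r = 2 * i + 2  # right child
    if l < N and arr[l] > arr[largest]:
        largest = l
    if r < N and arr[r] > arr[largest]:
        largest = r
    if largest != i:
        arr[i], arr[largest] = arr[largest], arr[i]
        heapify(arr, N, largest)

def heapsort(arr):
    N = len(arr)
    # Build a max-heap, starting from the last parent node and moving up
    for i in range(N // 2 - 1, -1, -1):
        heapify(arr, N, i)
    # Move the current maximum to the end, then restore the heap on the rest
    for i in range(N - 1, 0, -1):
        arr[0], arr[i] = arr[i], arr[0]
        heapify(arr, i, 0)

arr = [12, 11, 13, 5, 6, 7]
heapsort(arr)
print("Sorted array is:", arr)
```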
3. Python programs on text handling
Text handling
a = "Hello world good morning"
print(len(a))
print([Link]("l"))
print([Link]("H"))
print([Link]("l"))
print([Link]("l"))
print([Link]("l"))
print([Link]())
print([Link]())
print([Link]())
print([Link]())
Output
Checking in a string
text = "Python is a high-level, general-purpose programming language"
print("programming" in text)
print("programming" not in text)
if "programming" in text:
    print('Yes,"programming" is present')
Output
Concatenation in strings
a = "good"
b = "morning"
c = a + " " + b
print(c)
Output
Format strings
age = 18
txt = "My name is John, and I am {}"
print(txt.format(age))
quantity = 3
itemno = 567
price = 49.95
myorder = "I want {} pieces of item {} for {} dollars."
print(myorder.format(quantity, itemno, price))
myorder = "I want to pay {2} dollars for {0} pieces of item {1}."
print(myorder.format(quantity, itemno, price))
Output
Looping in strings
a = "Hello"
for x in a:
    print(x)
for x in "Hello":
    print(x)
for x in a:
    print(x, end="")
Output
Modify strings
a = "hello world"
print([Link]())
b = "HelLO woRlD"
print([Link]())
print([Link]())
c = " Hello world! "
print([Link]())
print([Link]())
print([Link]())
d = "Hello World"
print(d.replace("He", "J"))
e = "hello, world"
print(e.split(","))
f = "hello world"
print([Link]())
print([Link]())
g = "Hello My Name Is PETER"
print([Link]())
Output
Slicing a string
string = "Hello World"
print(string[2:5])
print(string[:5])
print(string[2:])
print(string[-5:-2])
Output
Align a string
txt = "hello"
a = [Link](20)
print(a)
a = [Link](20, "0")
print(a)
print(len(a))
a = [Link](20, "*")
print(a, "world")
a = [Link](20, "*")
print(a)
Output
# The partition() method searches for a specified string and splits the string into a
# tuple containing three elements:
# The first element contains the part before the specified string.
# The second element contains the specified string.
# The third element contains the part after the string.
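The partition() program itself is not reproduced in this copy. Here is a short example of the behaviour described above, using an illustrative string. It also covers the case where the separator is missing.

```python
txt = "I could eat bananas all day"

x = txt.partition("bananas")
print(x)   # ('I could eat ', 'bananas', ' all day')

# If the separator is not found, the whole string comes first,
# followed by two empty strings
y = txt.partition("apples")
print(y)   # ('I could eat bananas all day', '', '')
```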
Output
Output
Some other string functions
txt = "My name is Ståle"
x = txt.encode()
print(x)
txt = 'h\te\tl\tl\t0'
x = txt.expandtabs(1)
print(x)
txt = 'Demo'
print([Link]())
print([Link]())
txt = " "
print(txt.isspace())
myTuple = ("John", "Peter", "Vicky")
x = "#".join(myTuple)
print(x)
txt = "50"
print(txt.zfill(10))
txt = "Thank you for the music\nWelcome to the jungle"
print(txt.splitlines())
Output
4. Python programs on file handling
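The programs for the write and read modes are not reproduced in this copy. Below is a minimal sketch of the three basic modes. It assumes a hypothetical file named demo.txt in the current directory, and the text written to the file is illustrative.

```python
# 'w' creates the file (or truncates an existing one) and writes to it
with open("demo.txt", "w") as file:
    file.write("Hello file\n")
    file.write("Second line\n")

# 'a' appends to the end without erasing the existing content
with open("demo.txt", "a") as file:
    file.write("This will add this line\n")

# 'r' (the default mode) reads the content back, here line by line
with open("demo.txt", "r") as file:
    for line in file:
        print(line, end="")

# readlines() returns a list of lines, each still ending in '\n'
with open("demo.txt") as file:
    lines = file.readlines()
print(len(lines))   # 3
```

The with block closes the file automatically, even if an error occurs, so no explicit close() call is needed.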
Output
Output
Working of append() mode
file = open('[Link]','a')
file.write('This will add this line')
file.close()
Output
Output
with block
with open('[Link]') as file:
    data = file.read()
    print(data)
The text in the file is
Output
5. Python programs for calculating Mean, Mode, Median,
Variance, Standard Deviation
Calculating mean
lst = [19,22,34,26,32,30,24,24]
def mean(dataset):
    return sum(dataset) / len(dataset)
print(mean(lst))
Output
Calculating median
ls2 = [187, 187, 196, 196, 198, 203, 207, 211, 215]
ls3 = [181, 187, 196, 198, 203, 207, 211, 215]
def median(dataset):
    data = sorted(dataset)
    index = len(data) // 2
    if len(dataset) % 2 != 0:
        return data[index]
    return (data[index - 1] + data[index]) / 2
print(median(ls2))
print(median(ls3))
Output
Output
Calculating mode
ls2 = [3,15,23,42,30,10,10,12]
ls3 = ['nike','adidas','nike','jordan','jordan','rebook','under_amour','adidas']
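def mode(dataset):
    frequency = {}
    for value in dataset:
        frequency[value] = frequency.get(value, 0) + 1  # count each value
    most_frequent = max(frequency.values())
    # Every value that reaches the highest count is a mode
    modes = [key for key, value in frequency.items() if value == most_frequent]
    return modes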
print(mode(ls2))
print(mode(ls3))
Output
Output
Calculating variance
import statistics
print(statistics.variance([1, 3, 5, 7, 9, 11]))
print(statistics.variance([2, 2.5, 1.25, 3.1, 1.75, 2.8]))
print(statistics.variance([-11, 5.5, -3.4, 7.1]))
print(statistics.variance([1, 30, 50, 100]))
Output
Output
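The section title also covers standard deviation, which is the square root of the variance. The statistics module provides stdev() for sample data and pstdev() for a whole population, and pvariance() is the matching population variance. A short comparison using the first dataset above:

```python
import math
import statistics

data = [1, 3, 5, 7, 9, 11]

var = statistics.variance(data)     # sample variance (divides by n - 1)
sd = statistics.stdev(data)         # sample standard deviation
pvar = statistics.pvariance(data)   # population variance (divides by n)
psd = statistics.pstdev(data)       # population standard deviation

print(var, sd)
print(pvar, psd)
# The standard deviation is the square root of the variance
print(math.isclose(sd, math.sqrt(var)))
```

For this data the squared deviations from the mean (6) sum to 70. The sample variance is therefore 70 / 5 = 14, and the population variance is 70 / 6.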
6. Python programs for Karl Pearson Coefficient of
Correlation and Rank Correlation
import numpy as np
x_simple = np.array([-2, -1, 0, 1, 2])
y_simple = np.array([4, 1, 3, 2, 0])
my_rho = np.corrcoef(x_simple, y_simple)  # 2 x 2 correlation matrix
print(my_rho)
Output:
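np.corrcoef() returns the full 2 × 2 correlation matrix, and the Karl Pearson coefficient is the off-diagonal entry. As a cross-check, r can also be computed directly from its definition, r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²). The helper name pearson_r below is hypothetical.

```python
import numpy as np

def pearson_r(x, y):
    # Karl Pearson's coefficient computed from its definition
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()   # deviations from the mean of x
    dy = y - y.mean()   # deviations from the mean of y
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

x_simple = np.array([-2, -1, 0, 1, 2])
y_simple = np.array([4, 1, 3, 2, 0])

r = pearson_r(x_simple, y_simple)
print(r)                                      # approximately -0.7
print(np.corrcoef(x_simple, y_simple)[0, 1])  # the same value from NumPy
```

Here x̄ = 0 and ȳ = 2, the cross-product sum is −7, and each sum of squares is 10, so r = −7 / 10 = −0.7.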
import pandas as pd
import scipy.stats
x = [15, 18, 21, 15, 21]
y = [25, 25, 27, 27, 27]
def spearmans_rank_correlation(x, y):
    # Rank each variable (tied values receive the average rank)
    xranks = pd.Series(x).rank()
    print("Rankings of X:")
    print(xranks)
    yranks = pd.Series(y).rank()
    print("Rankings of Y:")
    print(yranks)
    # Spearman's rank correlation is Pearson's r computed on the ranks
    correlation, _ = scipy.stats.spearmanr(xranks, yranks)
    print("Spearman's Rank correlation:", correlation)
spearmans_rank_correlation(x, y)
Output:
Output:
7. Python programs on NumPy arrays and linear algebra
with NumPy
Python Numpy
NumPy is a general-purpose array-processing package. It provides a high-performance
multidimensional array object and tools for working with these arrays. It is the
fundamental package for scientific computing with Python, and it is open-source software.
Besides its obvious scientific uses, NumPy can also be used as an efficient
multidimensional container of generic data.
Features of NumPy
NumPy has various features that make it popular over Python lists.
Some of these important features include:
A powerful N-dimensional array object
Sophisticated (broadcasting) functions
Tools for integrating C/C++ and Fortran code
Useful linear algebra, Fourier transform, and random number capabilities
Arbitrary data types can be defined using NumPy, which allows NumPy to integrate
seamlessly and speedily with a wide variety of databases.
Arrays in Numpy
An array in NumPy is a table of elements (usually numbers), all of the same type, indexed
by a tuple of non-negative integers. In NumPy, the number of dimensions of the array is
called the rank of the array. A tuple of integers giving the size of the array along each
dimension is known as the shape of the array. The array class in NumPy is called ndarray.
Elements in NumPy arrays are accessed using square brackets, and arrays can be initialized
from nested Python lists.
Program on numpy array
import numpy as np
arr = np.array([[1, 2, 3],
                [4, 2, 5]])
print("Array is of type: ", type(arr))
print("No. of dimensions: ", arr.ndim)
print("Shape of array: ", arr.shape)
print("Size of array: ", arr.size)
print("Array stores elements of type: ", arr.dtype)
Output
print("Array with Rank 2: \n", arr)
arr = np.array((1, 3, 2))
print("\nArray created using "
"passed tuple:\n", arr)
Output
print ("Array with first 2 rows and"
" alternate columns(0 and 2):\n", sliced_arr)
Index_arr = arr[[1, 1, 0, 3],
[3, 2, 1, 0]]
print ("\nElements at indices (1, 3), "
"(1, 2), (0, 1), (3, 0):\n", Index_arr)
Output
import numpy as np
a = np.array([[1, 2],
              [3, 4]])
b = np.array([[4, 3],
              [2, 1]])
print("Adding 1 to every element:", a + 1)
print("\nSubtracting 2 from each element:", b - 2)
print("\nSum of all array "
      "elements: ", a.sum())
print("\nArray sum:\n", a + b)
Output
print ("An array initialized with all zeros:\n", c)
Output
f = np.arange(0, 30, 5)
print ("A sequential array with steps of 5:\n", f)
Output
Output
Program on Reshaping Array using Reshape Method
import numpy as np
arr = np.array([[1, 2, 3, 4],
                [5, 2, 4, 2],
                [1, 2, 0, 1]])
newarr = arr.reshape(2, 2, 3)  # the 3 x 4 = 12 elements regrouped as 2 x 2 x 3
print ("Original array:\n", arr)
print(" ")
print ("Reshaped array:\n", newarr)
Output
Integer array indexing: In this method, lists are passed for indexing for each
dimension. One-to-one mapping of corresponding elements is done to construct
a new arbitrary array.
Boolean array indexing: This method is used when we want to pick elements
from the array which satisfy some condition.
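The program for these two indexing styles is not included in this copy. Here is a short sketch of both, using an illustrative 3 × 3 array.

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])

# Integer array indexing: element k of the result is arr[rows[k], cols[k]]
rows = [0, 1, 2]
cols = [2, 1, 0]
picked = arr[rows, cols]
print(picked)      # [3 5 7]

# Boolean array indexing: keep only the elements where the condition is True
mask = arr > 4
print(mask)
big = arr[mask]
print(big)         # [5 6 7 8 9]
```

Boolean indexing always returns a one-dimensional array of the selected elements, in row-major order.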
Output
NumPy – Unary Operators
Many unary operations are provided as methods of the ndarray class. These include sum,
min, max, etc. These functions can also be applied row-wise or column-wise by setting
an axis parameter.
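A short sketch of these methods on an illustrative array. With axis=1 the operation runs along each row, and with axis=0 it runs down each column.

```python
import numpy as np

arr = np.array([[1, 5, 6],
                [4, 7, 2],
                [3, 1, 9]])

print("Largest element:", arr.max())             # 9
print("Row-wise maximum:", arr.max(axis=1))      # [6 7 9]
print("Column-wise minimum:", arr.min(axis=0))   # [1 1 2]
print("Sum of all elements:", arr.sum())         # 38
print("Cumulative sum along each row:\n", arr.cumsum(axis=1))
```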
Output
NumPy – Binary Operators
These operations apply to the array elementwise, and a new array is created. You can
use all basic arithmetic operators like +, -, /, etc. With the +=, -=, and *= operators,
the existing array is modified in place.
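The sketch below shows elementwise binary operations and an in-place update, using the same two arrays as the earlier arithmetic program. The matrix product is included for contrast, because * is elementwise and does not perform matrix multiplication.

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[4, 3],
              [2, 1]])

print("Array sum:\n", a + b)        # a new array
print("Array product:\n", a * b)    # elementwise product
print("Matrix product:\n", a @ b)   # use @ (or a.dot(b)) for the matrix product

c = a.copy()
c += 10                             # in place: modifies c itself, a is unchanged
print("c after += 10:\n", c)
```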
Output
the array. The values of an ndarray are stored in a buffer which can be thought of as a
contiguous block of memory bytes which can be interpreted by the dtype object.
Numpy provides a large set of numeric datatypes that can be used to construct arrays.
At the time of Array creation, Numpy tries to guess a datatype, but functions that
construct arrays usually also include an optional argument to explicitly specify the
datatype.
Constructing a Datatype Object
In NumPy, the datatype of an array need not be defined unless a specific datatype is
required. NumPy guesses the datatype for arrays whose datatype is not specified in the
constructor function.
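A short sketch of datatype inference, an explicit dtype, and a standalone dtype object. The exact default integer type depends on the platform.

```python
import numpy as np

# NumPy guesses the dtype from the data when none is given
ints = np.array([1, 2, 3])
floats = np.array([1.0, 2.5, 3.0])
print(ints.dtype, floats.dtype)   # typically int64 float64 (int32 on some platforms)

# An explicit dtype overrides the guess
as_float = np.array([1, 2, 3], dtype=np.float64)
print(as_float)                   # [1. 2. 3.]

# A datatype object can also be constructed on its own
dt = np.dtype('int32')
print(dt, dt.itemsize)            # int32 4  (bytes per element)
```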
Output
Math Operations on DataType array
In Numpy arrays, basic mathematical operations are performed element-wise on the
array. These operations are applied both as operator overloads and as functions. Many
useful functions are provided in Numpy for performing computations on Arrays such
as sum: for addition of Array elements, T: for Transpose of elements, etc.
Python Program to create a data type object
import numpy as np
# First Array
arr1 = np.array([[4, 7], [2, 6]],
                dtype=np.float64)
# Second Array
arr2 = np.array([[3, 6], [2, 8]],
                dtype=np.float64)
Sum = np.add(arr1, arr2)
print("Addition of Two Arrays: ")
print(Sum)
Sum1 = np.sum(arr1)
print("\nAddition of Array elements: ")
print(Sum1)
Sqrt = np.sqrt(arr1)
print("\nSquare root of Array1 elements: ")
print(Sqrt)
Trans_arr = arr1.T
print("\nTranspose of Array: ")
print(Trans_arr)
Output
# Importing numpy as np
import numpy as np
A = np.array([[6, 1, 1],
              [4, -2, 5],
              [2, 8, 7]])
print("Rank of A:", np.linalg.matrix_rank(A))
print("\nTrace of A:", np.trace(A))
print("\nDeterminant of A:", np.linalg.det(A))
print("\nInverse of A:\n", np.linalg.inv(A))
print("\nMatrix A raised to power 3:\n",
      np.linalg.matrix_power(A, 3))
Output
Python Program for Matrix Eigenvalue Functions
import numpy as np
from numpy import linalg as geek
a = np.array([[1, -2j], [2j, 5]])
print("Array is:", a)
# a equals its conjugate transpose (it is Hermitian), so eigh() can be used
c, d = geek.eigh(a)
print("Eigenvalues are:", c)
print("Eigenvectors are:", d)
Output
Output
8. Python Programs for creation and manipulation of
DataFrames using Pandas Library
Creating DataFrame
A DataFrame can be created by loading a dataset from existing storage, such as an SQL
database, a CSV file, or an Excel file. A Pandas DataFrame can also be created from
lists, a dictionary, a list of dictionaries, and so on.
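The program for this step is not included in this copy. The sketch below builds the same small table in the three in-memory ways just listed; the names and ages are illustrative. Loading from storage works the same way through readers such as pd.read_csv(path) or pd.read_excel(path).

```python
import pandas as pd

# From a list of lists (each inner list is a row)
df1 = pd.DataFrame([['Tom', 21], ['Asha', 19]], columns=['Name', 'Age'])

# From a dictionary of lists (each key is a column)
df2 = pd.DataFrame({'Name': ['Tom', 'Asha'], 'Age': [21, 19]})

# From a list of dictionaries (each dictionary is a row)
df3 = pd.DataFrame([{'Name': 'Tom', 'Age': 21}, {'Name': 'Asha', 'Age': 19}])

print(df1)
print(df2.equals(df1), df3.equals(df1))   # True True
```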
Output:
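Adding Data to a DataFrame (append and concat)
The DataFrame.append() method added the rows of another DataFrame to the end of an
existing DataFrame and returned a new DataFrame object. Columns not present in the original
DataFrame were added as new columns, and the new cells were filled with NaN. append() was
deprecated in pandas 1.4 and removed in pandas 2.0, so the program below uses pd.concat(),
which combines DataFrames in the same way.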
Program
import pandas as pd
student_register = pd.DataFrame({
    'Name': ['John', 'Alice'],
    'Age': [20, 22],
    'Student': [True, False]
})
# The new row as a one-row DataFrame, so that the column dtypes are preserved
new_person = pd.DataFrame([{'Name': 'Mansi', 'Age': 19, 'Student': True}])
student_register = pd.concat([student_register, new_person],
                             ignore_index=True)
print(student_register)
Output:
print(' ')
# showing info about the data
print('Info: ')
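student_register.info()  # info() prints its summary itself and returns None
print(' ')
# correlation between the numeric columns ('Name' holds text, so it is excluded)
print('Correlation: ')
print(student_register.corr(numeric_only=True))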
Output:
Output:
Output:
Output:
9. Write a Python program for the following.
Topics to be covered
Simple Line Plots
Adjusting the Plot: Line Colours and Styles, Axes Limits, Labelling Plots
Simple Scatter Plots
Histograms
Customizing Plot Legends
Choosing Elements for the Legend
Boxplot
Multiple Legends
Customizing Colorbars
Multiple Subplots
Text and Annotation
Customizing Ticks
import matplotlib.pyplot as plt
x = [0, 1, 2, 3, 4]
y = [0, 1, 4, 9, 16]
plt.plot(x, y)
plt.show()
Output
Perhaps the simplest of all plots is the visualization of a single function y = f(x).
Here we will take a first look at creating a simple plot of this type. As with all the
following sections, we'll start by setting up the notebook for plotting and importing
the packages we will use:
For all Matplotlib plots, we start by creating a figure and an axes. In their simplest
form, a figure and axes can be created as follows:
fig = plt.figure()
ax = plt.axes()
In Matplotlib, the figure (an instance of the class plt.Figure) can be thought of as a
single container that contains all the objects representing axes, graphics, text, and
labels. The axes (an instance of the class plt.Axes) is what we see above: a bounding
box with ticks and labels, which will eventually contain the plot elements that make
up our visualization. Throughout this manual, we'll commonly use the variable
name fig to refer to a figure instance, and ax to refer to an axes instance or group of
axes instances.
Once we have created an axes, we can use the ax.plot function to plot some data. Let's
start with a simple sinusoid:
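The code for this step is not included in this copy. Below is a minimal sketch that assumes x is an array of evenly spaced points. np.linspace(0, 10, 1000) is an illustrative choice here.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 1000)   # 1000 evenly spaced points from 0 to 10

fig = plt.figure()             # the container for everything in the plot
ax = plt.axes()                # one set of axes inside the figure
ax.plot(x, np.sin(x))          # draw the sine curve on those axes
plt.show()
```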
Output
Alternatively, we can use the pylab interface and let the figure and axes be created for us in
the background:
Output
If we want to create a single figure with multiple lines, we can simply call
the plot function multiple times:
plt.plot(x, np.sin(x))
plt.plot(x, np.cos(x))
Output
That's all there is to plotting simple functions in Matplotlib! We'll now dive into some
more details about how to control the appearance of the axes and lines.
Adjusting the Plot: Line Colors and Styles
The first adjustment you might wish to make to a plot is to control the line colors and
styles. The plt.plot() function takes additional arguments that can be used to specify
these. To adjust the color, you can use the color keyword, which accepts a string
argument representing virtually any imaginable color. The color can be specified in a
variety of ways:
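The code for this step is not reproduced in this copy. The sketch below shows six common ways of giving the color keyword a value, assuming the same x as in the earlier examples.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 1000)

plt.plot(x, np.sin(x - 0), color='blue')             # color by name
plt.plot(x, np.sin(x - 1), color='g')                # short color code (rgbcmyk)
plt.plot(x, np.sin(x - 2), color='0.75')             # grayscale between 0 and 1
plt.plot(x, np.sin(x - 3), color='#FFDD44')          # hex code (RRGGBB)
plt.plot(x, np.sin(x - 4), color=(1.0, 0.2, 0.3))    # RGB tuple, values 0 to 1
plt.plot(x, np.sin(x - 5), color='chartreuse')       # any HTML/CSS color name
plt.show()
```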
Output
Similarly, the line style can be adjusted using the linestyle keyword:
plt.plot(x, x + 0, linestyle='solid')
plt.plot(x, x + 1, linestyle='dashed')
plt.plot(x, x + 2, linestyle='dashdot')
plt.plot(x, x + 3, linestyle='dotted')
# The same styles with short codes:
plt.plot(x, x + 4, linestyle='-')   # solid
plt.plot(x, x + 5, linestyle='--')  # dashed
plt.plot(x, x + 6, linestyle='-.')  # dashdot
plt.plot(x, x + 7, linestyle=':')   # dotted
Output
If you would like to be extremely terse, these linestyle and color codes can be
combined into a single non-keyword argument to the plt.plot() function:
plt.plot(x, x + 0, '-g')   # solid green
plt.plot(x, x + 1, '--c')  # dashed cyan
plt.plot(x, x + 2, '-.k')  # dashdot black
plt.plot(x, x + 3, ':r')   # dotted red
Output
Adjusting the Plot: Axes Limits
Matplotlib does a decent job of choosing default axes limits for your plot, but
sometimes it's nice to have finer control. The most basic way to adjust axis limits is to
use the plt.xlim() and plt.ylim() methods. This assumes the setup from the earlier
examples (Matplotlib and NumPy imported, and x defined):
plt.plot(x, np.sin(x))
plt.xlim(-1, 11)
plt.ylim(-1.5, 1.5)
plt.show()
Output
If for some reason you'd like either axis to be displayed in reverse, you can simply
reverse the order of the arguments:
plt.plot(x, np.sin(x))
plt.xlim(10, 0)
plt.ylim(1.2, -1.2)
Output
Output
The plt.axis() method goes even beyond this, allowing you to do things like
automatically tighten the bounds around the current plot:
Output
It allows even higher-level specifications, such as ensuring an equal aspect ratio so that on
your screen, one unit in x is equal to one unit in y:
Output
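The code for these plt.axis() steps is not included in this copy. The sketch below shows the list form [xmin, xmax, ymin, ymax] together with the 'tight' and 'equal' options. Each plot starts a new figure with plt.figure() so that the three settings do not pile up on one set of axes.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 1000)

plt.figure()
plt.plot(x, np.sin(x))
plt.axis([-1, 11, -1.5, 1.5])   # [xmin, xmax, ymin, ymax] in a single call
plt.show()

plt.figure()
plt.plot(x, np.sin(x))
plt.axis('tight')               # shrink the limits to fit the data
plt.show()

plt.figure()
plt.plot(x, np.sin(x))
plt.axis('equal')               # one unit in x has the same length as one unit in y
plt.show()
```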
Labeling Plots
As the last piece of this section, we'll briefly look at the labeling of plots: titles, axis
labels, and simple legends.
Titles and axis labels are the simplest such labels—there are methods that can be used
to quickly set them:
This assumes the setup from the earlier examples (Matplotlib and NumPy imported, and x defined):
plt.plot(x, np.sin(x))
plt.title("A Sine Curve")
plt.xlabel("x")
plt.ylabel("sin(x)")
Output
The position, size, and style of these labels can be adjusted using optional arguments
to the function. For more information, see the Matplotlib documentation and the
docstrings of each of these functions.
When multiple lines are being shown within a single axes, it can be useful to create a
plot legend that labels each line type. Again, Matplotlib has a built-in way of quickly
creating such a legend. It is done via the (you guessed it) plt.legend() method. Though
there are several valid ways of using this, I find it easiest to specify the label of each
line using the label keyword of the plot function:
plt.plot(x, np.sin(x), '-g', label='sin(x)')
plt.plot(x, np.cos(x), ':b', label='cos(x)')
plt.axis('equal')
plt.legend()
Output
As you can see, the plt.legend() function keeps track of the line style and color, and
matches these with the correct label. More information on specifying and formatting
plot legends can be found in the plt.legend docstring; some more advanced legend options
are covered in Customizing Plot Legends.
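When switching from these MATLAB-style plt functions to the object-oriented ax methods,
most names stay the same (plt.plot() becomes ax.plot(), and so on), but the following change:
plt.xlabel() → ax.set_xlabel()
plt.ylabel() → ax.set_ylabel()
plt.xlim() → ax.set_xlim()
plt.ylim() → ax.set_ylim()
plt.title() → ax.set_title()
In the object-oriented interface, it is often more convenient to set all of these at once
with the ax.set() method: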
ax = plt.axes()
ax.plot(x, np.sin(x))
ax.set(xlim=(0, 10), ylim=(-2, 2),
       xlabel='x', ylabel='sin(x)',
       title='A Simple Plot')
Output
A python program on Scatter Plot
import matplotlib.pyplot as plt
x = [5, 7, 8, 7, 2, 17, 2, 9,
     4, 11, 12, 9, 6]
y = [99, 86, 87, 88, 100, 86,
     103, 87, 94, 78, 77, 85, 86]
plt.scatter(x, y, c="blue")
# To show the plot
plt.show()
Output
y1 = [21, 46, 3, 35, 67, 95,
53, 72, 58, 10]
x2 = [26, 29, 48, 64, 6, 5,
36, 66, 72, 40]
y2 = [26, 34, 90, 33, 38,
20, 56, 2, 47, 15]
plt.scatter(x1, y1, c="pink",
            linewidths=2,
            marker="s",
            edgecolor="green",
            s=50)
plt.scatter(x2, y2, c="yellow",
            linewidths=2,
            marker="^",
            edgecolor="red",
            s=200)
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.show()
Output
Bubble Plots in Matplotlib
This code generates a bubble chart using Matplotlib. It plots points with specified x
and y coordinates, each represented by a bubble with a size determined by the
bubble_sizes list. The chart has customization for transparency, edge color, and
linewidth. Finally, it displays the plot with a title and axis labels.
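The bubble-chart program is not reproduced in this copy. Below is a minimal sketch matching that description. The coordinates and the bubble_sizes values are illustrative assumptions; s sets each marker's area in points².

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]
bubble_sizes = [30, 80, 150, 200, 300]   # one marker area per point

plt.scatter(x, y, s=bubble_sizes,
            alpha=0.5,            # transparency, so overlapping bubbles stay visible
            edgecolors="blue",    # outline color
            linewidths=2)         # outline width
plt.title("Bubble Chart")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.show()
```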
Output
Custom a Matplotlib Scatterplot
This program uses Matplotlib and NumPy to create a customized scatter plot. It
generates random data for the x and y coordinates, colors, and sizes. The scatter
plot is then created with customized properties such as color, size, transparency, and
colormap. The plot includes a title, axis labels, and a color intensity scale (colorbar).
Finally, the plot is displayed.
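The program itself is not included in this copy. Below is a minimal sketch of the steps just described. The seeded generator, the 50 points, and the "viridis" colormap are illustrative assumptions.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)   # seeded so the plot is reproducible
x = rng.random(50)
y = rng.random(50)
colors = rng.random(50)          # one value per point, mapped through the colormap
sizes = 1000 * rng.random(50)    # marker areas

sc = plt.scatter(x, y, c=colors, s=sizes, alpha=0.6, cmap="viridis")
plt.colorbar(sc, label="Color Intensity")   # scale showing how values map to colors
plt.title("Customized Scatter Plot")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.show()
```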
Output
Histogram
A histogram is a graph showing frequency distributions.
It is a graph showing the number of observations within each given interval.
Example: Say you ask for the height of 250 people, you might end up with a
histogram like this:
You can read from the histogram that there are approximately:
2 people from 140 to 145cm
5 people from 145 to 150cm
15 people from 151 to 156cm
31 people from 157 to 162cm
46 people from 163 to 168cm
53 people from 168 to 173cm
45 people from 173 to 178cm
28 people from 179 to 184cm
21 people from 185 to 190cm
4 people from 190 to 195cm
A histogram in Python is a graphical representation of the distribution of a set of data.
It divides the data into bins (intervals), and for each bin, the histogram shows how
many data points fall into that interval. Histograms are useful for understanding the
distribution, frequency, and spread of numerical data.
In Python, you can create histograms using libraries like Matplotlib and Seaborn.
Key Components:
Distribution: It shows how data points are spread across different intervals (e.g.,
if they are normally distributed).
Identifying patterns: You can easily identify if the data follows any particular
shape like normal distribution, skewed, etc.
Outliers: Large spikes or gaps in the histogram can help detect outliers in your
data.
plt.hist(): This function creates the histogram. The bins argument defines the
number of intervals to divide the data into.
plt.show(): This displays the histogram.
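To see what the bins argument does, it helps to look at the values plt.hist() returns: the count in each bin, the bin edges, and the drawn bars. The sketch uses a small illustrative dataset.

```python
import matplotlib.pyplot as plt

data = [1, 2, 2, 3, 3, 3, 4, 4, 5]

# 4 equal-width bins between the smallest (1) and largest (5) values
counts, edges, bars = plt.hist(data, bins=4, edgecolor="black")
print(counts)   # [1. 2. 3. 3.]
print(edges)    # [1. 2. 3. 4. 5.]
plt.show()
```

Each bin includes its left edge but not its right edge, except the last bin, which includes both. That is why 4, 4, and 5 all fall in the final bin.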
Create Histogram
In Matplotlib, we use the hist() function to create histograms.
The hist() function will use an array of numbers to create a histogram, the array is sent
into the function as an argument.
For simplicity we use NumPy to randomly generate an array with 250 values, where
the values will concentrate around 170, and the standard deviation is 10.
A python program to create Histogram
import numpy as np
x = np.random.normal(170, 10, 250)
print(x)
Output
The hist() function will read the array and produce a histogram:
Example
A simple histogram:
import matplotlib.pyplot as plt
import numpy as np
x = np.random.normal(170, 10, 250)
plt.hist(x)
plt.show()
Output
Customizing Plot Legends
Creating custom legends in Python is a common requirement when using plotting
libraries like Matplotlib. Legends provide important context to interpret a plot
effectively. Here’s a guide to composing custom legends:
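Only the last lines of the custom-legend program survive in this copy. Below is a sketch of one way to build the legend_elements list they use, with proxy artists from matplotlib.lines.Line2D and matplotlib.patches.Patch. The plotted data and the three labels are illustrative assumptions.

```python
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
from matplotlib.patches import Patch

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [1, 4, 9], color="blue")
ax.bar([1, 2, 3], [2, 3, 1], color="orange", alpha=0.5)

# Proxy artists: they are never drawn in the axes, only in the legend
legend_elements = [
    Line2D([0], [0], color="blue", lw=2, label="Line"),
    Line2D([0], [0], marker="o", color="w", markerfacecolor="green",
           markersize=10, label="Marker"),
    Patch(facecolor="orange", edgecolor="black", label="Bar"),
]

plt.legend(handles=legend_elements, loc='upper left')
plt.title('Custom Legend Handles')
plt.show()
```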
plt.legend(handles=legend_elements, loc='upper left')
plt.title('Custom Legend Handles')
plt.show()
Output
Output
Output
Output
Choosing Elements for the Legend
When choosing elements for the legend in Python using Matplotlib, you can selectively
include or exclude specific elements, customize their appearance, or define their order
in the legend. Below are various approaches to control and refine the elements of the
legend.
1. Use label and plt.legend()
Use the label parameter in plotting commands to name the items you want in the
legend.
To exclude elements from the legend, simply omit the label or set it as
_nolegend_.
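A short sketch of this approach with illustrative curves. Matplotlib skips any artist whose label starts with an underscore, which covers both unlabeled lines and the explicit "_nolegend_" label.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), label="cos(x)")
ax.plot(x, np.zeros_like(x), color="gray")              # no label: left out
ax.plot(x, 0.5 * np.ones_like(x), label="_nolegend_")   # underscore label: left out
ax.legend()
plt.show()
```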
Output
2. Include Only Specific Elements Using handles
You can explicitly define which elements should appear in the legend by using handles
and labels.
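A short sketch with illustrative curves. Only the handles passed to legend() appear, in the given order and with the given labels, even though a third line is plotted.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
fig, ax = plt.subplots()
line1, = ax.plot(x, np.sin(x), label="sin(x)")
line2, = ax.plot(x, np.cos(x), label="cos(x)")
line3, = ax.plot(x, np.sin(x) * np.cos(x), label="sin(x)cos(x)")

# Only line3 and line1 appear, in this order, with the labels given here
ax.legend(handles=[line3, line1], labels=["product", "sine"])
plt.show()
```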
Output
Output
4. Using get_legend_handles_labels()
Retrieve all plot elements and their labels and customize the legend.
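A short sketch with illustrative data. ax.get_legend_handles_labels() returns the labelled artists and their labels as two lists, which can then be filtered or reordered before building the legend.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [1, 2, 3], label="Linear")
ax.plot([1, 2, 3], [1, 4, 9], label="Quadratic")
ax.plot([1, 2, 3], [1, 8, 27], label="Cubic")

handles, labels = ax.get_legend_handles_labels()
# Drop "Linear", then reverse the remaining order before building the legend
pairs = [(h, l) for h, l in zip(handles, labels) if l != "Linear"][::-1]
ax.legend([h for h, _ in pairs], [l for _, l in pairs])
plt.show()
```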
Output
Boxplot
Creating a boxplot in Python is straightforward using libraries like Matplotlib and Seaborn. A
boxplot (or box-and-whisker plot) is used to display the distribution of data based on a five-
number summary: minimum, first quartile (Q1), median, third quartile (Q3), and maximum.
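The five-number summary can be computed directly with np.percentile(). The sketch below uses an illustrative dataset. By default, Matplotlib stops the whiskers at the most extreme points within 1.5 × IQR of the box and draws anything beyond as an outlier (here 7 and 15). Passing whis=(0, 100) makes the whiskers reach the minimum and maximum, as described above.

```python
import matplotlib.pyplot as plt
import numpy as np

data = [7, 15, 36, 39, 40, 41, 42, 43, 47, 49]

# Five-number summary (NumPy's default linear interpolation for the quartiles)
minimum, q1, median, q3, maximum = np.percentile(data, [0, 25, 50, 75, 100])
print(minimum, q1, median, q3, maximum)   # 7.0 36.75 40.5 42.75 49.0

plt.boxplot(data, whis=(0, 100))   # whiskers at the 0th and 100th percentiles
plt.title("Boxplot of the sample data")
plt.show()
```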
Program
import [Link] as plt
import numpy as np
plt.title("Boxplot Example")
plt.xlabel("Dataset")
plt.ylabel("Values")
plt.show()
Output
2. Adding Customizations to a Matplotlib Boxplot
You can customize the appearance of a boxplot using the parameters of plt.boxplot():
Program
plt.boxplot(
    data,
    patch_artist=True,   # fill the boxes so that facecolor takes effect
    notch=True,          # draw a notch around the median
    vert=True,
    showmeans=True,      # mark the mean as well as the median
    boxprops=dict(facecolor="lightblue", color="blue"),
    whiskerprops=dict(color="red"),
    capprops=dict(color="green"),
    medianprops=dict(color="black")
)
plt.title("Customized Boxplot")
plt.show()
Output
3. Creating a Boxplot with Seaborn
Seaborn provides a simpler and more aesthetic way to create boxplots:
Program
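import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
df = pd.DataFrame({
    "Group": np.repeat(["A", "B", "C"], 100),  # 100 labels per group, 300 in total
    "Values": np.random.randn(300)
})
sns.boxplot(x="Group", y="Values", data=df, palette="pastel")
plt.title("Boxplot with Seaborn")
plt.show()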
Output
Program
sns.boxplot(x="Group", y="Values", data=df, palette="pastel")
sns.swarmplot(x="Group", y="Values", data=df, color=".25", alpha=0.7)
plt.title("Boxplot with Swarmplot")
plt.show()
Output
5. Horizontal Boxplot
To display a horizontal boxplot, use the orient parameter in Seaborn or set vert=False
in Matplotlib.
Program
sns.boxplot(x="Values", y="Group", data=df, palette="coolwarm", orient="h")
plt.title("Horizontal Boxplot")
plt.show()
Output
Multiple Legends
Creating multiple legends in a single plot in Python can be achieved using Matplotlib.
By default, each axes holds only one legend, so calling legend() again replaces the previous
one. To keep extra legends, add them back to the axes with ax.add_artist(). Here's how to
create multiple legends in different scenarios:
1. Adding Multiple Legends Manually
Program
import [Link] as plt
x = [1, 2, 3, 4, 5]
y1 = [1, 4, 9, 16, 25]
y2 = [1, 2, 3, 4, 5]
fig, ax = plt.subplots()
line1, = ax.plot(x, y1, label='Dataset 1', color='blue')
line2, = ax.plot(x, y2, label='Dataset 2', color='green')
legend1 = ax.legend(handles=[line1], loc='upper left', title="Legend 1")
ax.add_artist(legend1)  # Keep the first legend; the next ax.legend() call would replace it
ax.legend(handles=[line2], loc='lower right', title="Legend 2")
plt.title("Multiple Legends Example")
plt.show()
Output
Program
y3 = [25, 20, 15, 10, 5]
line3, = ax.plot(x, y3, label='Dataset 3', color='red')
legend1 = ax.legend(handles=[line1, line2], loc='upper center', title="Group 1")
ax.add_artist(legend1)
legend2 = ax.legend(handles=[line3], loc='lower center', title="Group 2")
ax.add_artist(legend2)
plt.show()
Output
Program
import matplotlib.pyplot as plt
import numpy as np
fig, ax1 = plt.subplots()
x = np.linspace(0, 10, 100)
y1 = np.sin(x)
line1, = ax1.plot(x, y1, label="Sine Wave", color='blue')
ax2 = ax1.twinx() # second y-axis sharing the same x-axis
y2 = np.cos(x)
line2, = ax2.plot(x, y2, label="Cosine Wave", color='red')
# Each Axes keeps its own legend, so add_artist() is not needed here
ax1.legend(handles=[line1], loc='upper left', title="Axis 1")
ax2.legend(handles=[line2], loc='upper right', title="Axis 2")
ax1.set_title("Multiple Legends with Twin Axes")
plt.show()
Output
Customized Colorbars
Basic continuous colorbar
import matplotlib.pyplot as plt
import matplotlib as mpl
import numpy as np # Importing numpy to create some dummy data
fig, ax = plt.subplots(figsize=(6, 1), layout='constrained')
cmap = mpl.cm.cool # colormap choice assumed; any Matplotlib colormap works
norm = mpl.colors.Normalize(vmin=5, vmax=10)
dummy_data = np.array([[5, 10]])
sm = mpl.cm.ScalarMappable(cmap=cmap, norm=norm)
sm.set_array(dummy_data)
fig.colorbar(sm, cax=ax, orientation='horizontal', label='Some Units')
plt.show()
Output:
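Discrete colorbar with extensions
import matplotlib.pyplot as plt
import matplotlib as mpl
fig, ax = plt.subplots(figsize=(6, 1), layout='constrained')
cmap = mpl.cm.viridis # colormap choice assumed
bounds = [-1, 2, 5, 7, 12, 15]
norm = mpl.colors.BoundaryNorm(bounds, cmap.N, extend='both')
fig.colorbar(mpl.cm.ScalarMappable(norm=norm, cmap=cmap),
             cax=ax, orientation='horizontal',
             label="Discrete intervals with extend='both' keyword")
plt.show()
Output: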
Colorbar with custom extension lengths
import matplotlib.pyplot as plt
import matplotlib as mpl
fig, ax = plt.subplots(figsize=(6, 1), constrained_layout=True)
cmap = (mpl.colors.ListedColormap(['royalblue', 'cyan', 'yellow', 'orange'])
        .with_extremes(over='red', under='blue'))
bounds = [-1.0, -0.5, 0.0, 0.5, 1.0]
norm = mpl.colors.BoundaryNorm(bounds, cmap.N)
fig.colorbar(
    mpl.cm.ScalarMappable(cmap=cmap, norm=norm),
    cax=ax, orientation='horizontal',
    extend='both',
    extendfrac='auto',
    spacing='uniform',
    label='Custom extension lengths, some other units',
)
plt.show()
Output:
Multiple Subplots
Creating multiple subplots using plt.subplots
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 2 * np.pi, 400)
y = np.sin(x ** 2)
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_title('A single plot')
plt.show()
Output:
Horizontally Stacked Subplots
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 10, 100)
y = np.sin(x)
fig, (ax1, ax2) = plt.subplots(1, 2)
fig.suptitle('Horizontally stacked subplots')
ax1.plot(x, y, color='blue', label='y = sin(x)')
ax1.set_title('First Plot')
ax1.legend()
ax2.plot(x, -y, color='red', label='y = -sin(x)')
ax2.set_title('Second Plot')
ax2.legend()
plt.tight_layout(rect=[0, 0, 1, 0.95])
plt.show()
Output:
Stacking subplots in two directions
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 10, 100)
y = np.sin(x)
fig, axs = plt.subplots(2, 2)
fig.suptitle('2x2 Grid of Subplots')
axs[0, 0].plot(x, y)
axs[0, 0].set_title('Axis [0, 0]')
axs[0, 1].plot(x, y, 'tab:orange')
axs[0, 1].set_title('Axis [0, 1]')
axs[1, 0].plot(x, -y, 'tab:green')
axs[1, 0].set_title('Axis [1, 0]')
axs[1, 1].plot(x, -y, 'tab:red')
axs[1, 1].set_title('Axis [1, 1]')
for ax in axs.flat:
    ax.set(xlabel='x-label', ylabel='y-label')
# Hide x labels and tick labels for the top plots and y ticks for the right plots
for ax in axs.flat:
    ax.label_outer()
plt.tight_layout(rect=[0, 0, 1, 0.95])
plt.show()
Output:
Sharing axes
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 10, 100)
y = np.sin(x)
fig, (ax1, ax2) = plt.subplots(2)
fig.suptitle('Axes values are scaled individually by default')
ax1.plot(x, y, label='y = sin(x)', color='blue')
ax1.legend()
ax1.set_title('First Plot')
ax2.plot(x + 1, -y, label='y = -sin(x), x shifted', color='red')
ax2.legend()
ax2.set_title('Second Plot')
ax1.set_xlabel('x-axis')
ax1.set_ylabel('y-axis')
ax2.set_xlabel('x-axis')
ax2.set_ylabel('y-axis')
plt.tight_layout(rect=[0, 0, 1, 0.95])
plt.show()
Output:
Sharing both axes
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 10, 100)
y = np.sin(x)
fig, axs = plt.subplots(3, sharex=True, sharey=True)
fig.suptitle('Sharing both axes')
axs[0].plot(x, y ** 2, label='y^2', color='blue')
axs[0].legend()
axs[0].set_title('First Plot')
axs[1].plot(x, 0.3 * y, 'o', label='0.3*y', color='orange')
axs[1].legend()
axs[1].set_title('Second Plot')
axs[2].plot(x, y, '+', label='y', color='green')
axs[2].legend()
axs[2].set_title('Third Plot')
for ax in axs:
    ax.set_ylabel('y-label')
axs[-1].set_xlabel('x-label')
plt.tight_layout(rect=[0, 0, 1, 0.95])
plt.show()
Output:
Sharing x per column, y per row
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 10, 100)
y = np.sin(x)
fig = plt.figure()
gs = fig.add_gridspec(2, 2, hspace=0, wspace=0)
(ax1, ax2), (ax3, ax4) = gs.subplots(sharex='col', sharey='row')
fig.suptitle('Sharing x per column, y per row')
ax1.plot(x, y, label='y = sin(x)', color='blue')
ax1.legend()
ax1.set_title('Top Left')
ax2.plot(x, y**2, 'tab:orange', label='y^2')
ax2.legend()
ax2.set_title('Top Right')
ax3.plot(x + 1, -y, 'tab:green', label='y = -sin(x)')
ax3.legend()
ax3.set_title('Bottom Left')
ax4.plot(x + 2, -y**2, 'tab:red', label='-y^2')
ax4.legend()
ax4.set_title('Bottom Right')
# Hide inner tick labels so only the outer edges are labelled
for ax in fig.get_axes():
    ax.label_outer()
plt.show()
Output:
Polar Axes
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(2 * x)
fig, (ax1, ax2) = plt.subplots(1, 2, subplot_kw=dict(projection='polar'))
fig.suptitle('Polar Subplots Example')
ax1.plot(x, y, label='y = sin(2x)', color='blue')
ax1.set_title('First Polar Plot')
ax1.legend()
ax2.plot(x, y ** 2, label='y^2 = sin^2(2x)', color='orange')
ax2.set_title('Second Polar Plot')
ax2.legend()
plt.show()
Output:
Customizing Ticks
Axis ticks:
Axis ticks refer to the markers on the axes that indicate specific data values. These
ticks, along with their labels, provide context for the plot's scale.
import matplotlib.pyplot as plt
import numpy as np
fig, axs = plt.subplots(2, 1, figsize=(5.4, 5.4), layout='constrained')
x = np.arange(100)
for nn, ax in enumerate(axs):
    ax.plot(x, x, label='y = x')
    ax.legend()
    if nn == 1:
        ax.set_title('Manual ticks')
        ax.set_yticks(np.arange(0, 100.1, 100/3))
        xticks = np.arange(0.50, 101, 20)
        # '\\$' escapes the dollar sign so it is not read as mathtext
        xlabels = [f'\\${t:1.2f}' for t in xticks]
        ax.set_xticks(xticks, labels=xlabels)
    else:
        ax.set_title('Automatic ticks')
plt.show()
Output:
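Minor ticks:
import matplotlib.pyplot as plt
import numpy as np
fig, axs = plt.subplots(2, 1, figsize=(5.4, 5.4), layout='constrained')
x = np.arange(100)
for nn, ax in enumerate(axs):
    ax.plot(x, x)
    if nn == 1:
        ax.set_title('Manual ticks')
        ax.set_yticks(np.arange(0, 100.1, 100/3))
        ax.set_yticks(np.arange(0, 100.1, 100/30), minor=True)  # minor ticks
        ax.grid(True, which='minor', linestyle=':', linewidth=0.5)
    else:
        ax.set_title('Automatic ticks')
plt.show()
Output: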
Locators Demonstration
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
def setup(ax, title):
    """Set up common parameters for the Axes in the example."""
    # only show the bottom spine
    ax.yaxis.set_major_locator(ticker.NullLocator())
    ax.spines[['left', 'right', 'top']].set_visible(False)
    ax.xaxis.set_ticks_position('bottom')
    ax.tick_params(which='major', width=1.00, length=5)
    ax.tick_params(which='minor', width=0.75, length=2.5)
    ax.set_xlim(0, 5)
    ax.set_ylim(0, 1)
    ax.text(0.0, 0.2, title, transform=ax.transAxes,
            fontsize=14, fontname='Monospace', color='tab:blue')
fig, axs = plt.subplots(8, 1, layout='constrained')
# Null Locator
setup(axs[0], title="NullLocator()")
axs[0].xaxis.set_major_locator(ticker.NullLocator())
axs[0].xaxis.set_minor_locator(ticker.NullLocator())
# Multiple Locator
setup(axs[1], title="MultipleLocator(0.5)")
axs[1].xaxis.set_major_locator(ticker.MultipleLocator(0.5))
axs[1].xaxis.set_minor_locator(ticker.MultipleLocator(0.1))
# Fixed Locator
setup(axs[2], title="FixedLocator([0, 1, 5])")
axs[2].xaxis.set_major_locator(ticker.FixedLocator([0, 1, 5]))
axs[2].xaxis.set_minor_locator(ticker.FixedLocator(np.linspace(0.2, 0.8, 4)))
# Linear Locator
setup(axs[3], title="LinearLocator(numticks=3)")
axs[3].xaxis.set_major_locator(ticker.LinearLocator(3))
axs[3].xaxis.set_minor_locator(ticker.LinearLocator(31))
# Index Locator
setup(axs[4], title="IndexLocator(base=0.5, offset=0.25)")
axs[4].plot(range(0, 5), [0]*5, color='white')
axs[4].xaxis.set_major_locator(ticker.IndexLocator(base=0.5, offset=0.25))
# Auto Locator
setup(axs[5], title="AutoLocator()")
axs[5].xaxis.set_major_locator(ticker.AutoLocator())
axs[5].xaxis.set_minor_locator(ticker.AutoMinorLocator())
# MaxN Locator
setup(axs[6], title="MaxNLocator(n=4)")
axs[6].xaxis.set_major_locator(ticker.MaxNLocator(4))
axs[6].xaxis.set_minor_locator(ticker.MaxNLocator(40))
# Log Locator
setup(axs[7], title="LogLocator(base=10, numticks=15)")
axs[7].set_xlim(10**3, 10**10)
axs[7].set_xscale('log')
axs[7].xaxis.set_major_locator(ticker.LogLocator(base=10, numticks=15))
plt.show()
Output:
Tick Formatting
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
def setup(ax, title):
    """Set up common parameters for the Axes in the example."""
    # only show the bottom spine
    ax.yaxis.set_major_locator(ticker.NullLocator())
    ax.spines[['left', 'right', 'top']].set_visible(False)
    # define tick positions
    ax.xaxis.set_major_locator(ticker.MultipleLocator(1.00))
    ax.xaxis.set_minor_locator(ticker.MultipleLocator(0.25))
    ax.xaxis.set_ticks_position('bottom')
    ax.tick_params(which='major', width=1.00, length=5)
    ax.tick_params(which='minor', width=0.75, length=2.5, labelsize=10)
    ax.set_xlim(0, 5)
    ax.set_ylim(0, 1)
    ax.text(0.0, 0.2, title, transform=ax.transAxes,
            fontsize=14, fontname='Monospace', color='tab:blue')
fig = plt.figure(figsize=(8, 8), constrained_layout=True)
fig0, fig1, fig2 = fig.subfigures(3, height_ratios=[1.5, 1.5, 7.5])
fig0.suptitle('String Formatting', fontsize=16, x=0, ha='left')
ax0 = fig0.subplots()
setup(ax0, title="'{x} km'")
ax0.xaxis.set_major_formatter(ticker.StrMethodFormatter("{x} km"))
fig1.suptitle('Function Formatting', fontsize=16, x=0, ha='left')
ax1 = fig1.subplots()
setup(ax1, title="def(x, pos): return str(x-5)")
ax1.xaxis.set_major_formatter(lambda x, pos: str(x-5))
fig2.suptitle('Formatter Object Formatting', fontsize=16, x=0, ha='left')
axs2 = fig2.subplots(7, 1)
setup(axs2[0], title="NullFormatter()")
axs2[0].xaxis.set_major_formatter(ticker.NullFormatter())
setup(axs2[1], title="StrMethodFormatter('{x:.3f}')")
axs2[1].xaxis.set_major_formatter(ticker.StrMethodFormatter("{x:.3f}"))
setup(axs2[2], title="FormatStrFormatter('#%d')")
axs2[2].xaxis.set_major_formatter(ticker.FormatStrFormatter("#%d"))
def fmt_two_digits(x, pos):
    return f'[{x:.2f}]'
setup(axs2[3], title='FuncFormatter("[{:.2f}]".format)')
axs2[3].xaxis.set_major_formatter(ticker.FuncFormatter(fmt_two_digits))
setup(axs2[4], title="FixedFormatter(['A', 'B', 'C', 'D', 'E', 'F'])")
# FixedFormatter is used together with a FixedLocator at the same positions
positions = [0, 1, 2, 3, 4, 5]
labels = ['A', 'B', 'C', 'D', 'E', 'F']
axs2[4].xaxis.set_major_locator(ticker.FixedLocator(positions))
axs2[4].xaxis.set_major_formatter(ticker.FixedFormatter(labels))
setup(axs2[5], title="ScalarFormatter()")
axs2[5].xaxis.set_major_formatter(ticker.ScalarFormatter(useMathText=True))
setup(axs2[6], title="PercentFormatter(xmax=5)")
axs2[6].xaxis.set_major_formatter(ticker.PercentFormatter(xmax=5))
plt.show()
Output:
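The tick_params() method changes the properties of ticks: their length, direction (in or out of the frame), color, width, and whether the ticks are drawn at the bottom, top, left, or right of the Axes.
It can also control the tick labels:
labelsize (fontsize)
labelcolor (color of the label)
labelrotation
labelbottom, labeltop, labelleft, labelright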
import matplotlib.pyplot as plt
import numpy as np
fig, axs = plt.subplots(1, 2, figsize=(6.4, 3.2))
for nn, ax in enumerate(axs):
    ax.plot(np.arange(100))
    if nn == 1:
        ax.grid(True)
        # y ticks on the right only, red and long; y gridlines hidden
        ax.tick_params(right=True, left=False, axis='y', color='r', length=16,
                       grid_color='none')
        # x ticks pointing inward, magenta and wide, with green labels
        ax.tick_params(axis='x', color='m', length=4, direction='in', width=4,
                       labelcolor='g')
plt.tight_layout()
plt.show()
Output:
Text And Annotations
Text in Matplotlib
Matplotlib has extensive text support, including support for mathematical expressions, TrueType font support for raster and vector outputs, newline-separated text with arbitrary rotations, and Unicode support.
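A minimal sketch of these text features, using only standard Matplotlib calls (ax.text for free text and ax.annotate for text with an arrow); the coordinates and strings are arbitrary illustration values.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.set_xlim(0, 10)
ax.set_ylim(0, 10)

# Plain text placed in data coordinates and rotated by 30 degrees
rotated = ax.text(1, 8, "Rotated text", rotation=30)
# Newline-separated text
multiline = ax.text(1, 5, "First line\nSecond line")
# Mathematical expression written with Matplotlib's mathtext ($...$)
formula = ax.text(5, 8, r"$\alpha^2 + \beta^2 = \gamma^2$", fontsize=14)
# Unicode characters
ax.text(5, 5, "Temperature: 25 °C, Δt = 3 s")
# Annotation: text at xytext with an arrow pointing to xy
note = ax.annotate("Point of interest", xy=(7, 2), xytext=(3, 1),
                   arrowprops=dict(arrowstyle="->"))
ax.set_title("Text and annotations")
plt.show()
```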
10. Python Programs for Data preprocessing: Handling missing
values, handling categorical data, bringing features to same
scale, selecting meaningful features.
Importing the libraries:
# libraries
import numpy as np # used for handling numbers
import pandas as pd # used for handling the dataset
from sklearn.impute import SimpleImputer # used for handling missing data
from sklearn.preprocessing import LabelEncoder, OneHotEncoder # used for encoding categorical data
from sklearn.model_selection import train_test_split # used for splitting training and testing data
from sklearn.preprocessing import StandardScaler # used for feature scaling
If you select and run the above code in Spyder, you should see similar output in your IPython console:
import numpy as np # used for handling numbers
...: import pandas as pd # used for handling the dataset
...: from sklearn.impute import SimpleImputer # used for handling missing data
...: from sklearn.preprocessing import LabelEncoder, OneHotEncoder # used for encoding categorical data
...: from sklearn.model_selection import train_test_split # used for splitting training and testing data
...: from sklearn.preprocessing import StandardScaler # used for feature scaling
If you see any import errors, install the missing packages explicitly with pip as follows (the package that provides sklearn is named scikit-learn).
pip install <package-name>
To import the dataset into our script, we use pandas as follows.
dataset = pd.read_csv('[Link]') # import the dataset into a variable
# Splitting the attributes into independent and dependent attributes
X = dataset.iloc[:, :-1].values # attributes that determine the dependent variable / Class
Y = dataset.iloc[:, -1].values # dependent variable / Class
When you run this section, you should not see any errors. If you do, make sure the script and the [Link] file are in the same folder. Once it runs successfully, open the Variable Explorer in the Spyder UI and you will see the following three variables (dataset, X and Y).
When you double-click on each of these variables, you should see something similar.
If you cannot see these data variables, try upgrading Spyder to version 4.
Here you can see that the missing values have been replaced by the average values of the respective columns.
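The step that fills missing values with column averages can be sketched with scikit-learn's SimpleImputer. This is a minimal sketch on a small hypothetical DataFrame; the column names Age and Income and their values are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical data; np.nan marks the missing entries
df = pd.DataFrame({
    "Age": [44.0, 27.0, np.nan, 38.0],
    "Income": [72000.0, np.nan, 54000.0, 61000.0],
})

# Replace each missing value with the mean of its column
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
filled = imputer.fit_transform(df)

print("Column means used:", imputer.statistics_)
print(filled)
```

The mean is only defined for numeric columns, so on the manual's X array the imputer would be fitted on the numeric columns only (for example X[:, 1:3], assuming Region is the first column).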
X = onehotencoder.fit_transform(X).toarray()
labelencoder_Y = LabelEncoder()
Y = labelencoder_Y.fit_transform(Y)
After execution of this code, the independent variable X and dependent variable Y will
transform into the following.
Here, you can see that the Region variable is now made up of three binary (0/1) columns, one for each country: India, Brazil and USA. A 1 in a column means the row holds data for that country; otherwise the value is 0. (scikit-learn orders the columns alphabetically by category, which can be checked with the encoder's categories_ attribute.) For the Online Shopper variable, 1 represents Yes and 0 represents No.
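A self-contained sketch of this encoding step, using hypothetical rows in the [Region, Age, Income] layout described above. It one-hot encodes only the Region column by passing that column to OneHotEncoder, and label-encodes the Online Shopper target; the variable names mirror the fragment above.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Hypothetical rows: [Region, Age, Income]; Y is the Online Shopper column
X = np.array([
    ["India", 49, 86400],
    ["Brazil", 32, 57600],
    ["USA", 35, 64800],
    ["Brazil", 43, 73200],
], dtype=object)
Y = np.array(["No", "Yes", "No", "Yes"])

# One-hot encode the Region column (column 0): one 0/1 column per country
onehotencoder = OneHotEncoder()
region = onehotencoder.fit_transform(X[:, [0]]).toarray()
print("Column order:", onehotencoder.categories_[0])

# Put the encoded columns in front of the remaining numeric columns
X_encoded = np.hstack([region, X[:, 1:].astype(float)])

# Encode the target labels alphabetically: "No" -> 0, "Yes" -> 1
labelencoder_Y = LabelEncoder()
Y_encoded = labelencoder_Y.fit_transform(Y)
print(X_encoded)
print(Y_encoded)
```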
Here, we take 80% of the original dataset as the training set and the remaining 20% as the testing set. This is the most common ratio, although 70–30% or 75–25% splits are also used. A 50–50% split is usually avoided, because the smaller training set makes the model more prone to overfitting. For now, we split the data in an 80–20% ratio.
After the split, our training set and testing set look like this.
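A minimal sketch of the 80–20 split with scikit-learn's train_test_split, using small hypothetical arrays in place of X and Y; random_state only fixes the shuffle so the split is reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 10 samples with 2 features each, plus 10 labels
X = np.arange(20).reshape(10, 2)
Y = np.arange(10)

# test_size=0.2 keeps 80% of the rows for training and 20% for testing
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=0)
print(X_train.shape, X_test.shape)
```

The rows are shuffled before splitting, but each row of X stays paired with its label in Y.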
Feature Scaling
The two numerical columns, Age and Income, are not on the same scale: age ranges from 32 to 55, while income ranges from about 57.6K to 99.6K.
Because these two variables do not share the same scale, they cause problems for many machine learning models. The reason is that many of these models are based on the Euclidean distance between data points, and the feature with the larger range (here, income) dominates that distance.
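To see why the larger-range feature dominates, here is a small numeric sketch with three hypothetical customers (the values are made up within the ranges quoted above). It compares how much of the squared Euclidean distance comes from age before and after standardizing each column.

```python
import numpy as np

# Three hypothetical customers: [age, income]
X = np.array([
    [32.0, 57600.0],
    [55.0, 58000.0],
    [33.0, 99600.0],
])

def euclidean(p, q):
    return np.sqrt(np.sum((p - q) ** 2))

# Raw features: customers 0 and 1 differ by 23 years but only 400 in income,
# yet the income term still makes up almost all of the squared distance
diff = X[0] - X[1]
age_share_raw = diff[0] ** 2 / np.sum(diff ** 2)

# Standardize each column (zero mean, unit variance) and compare again
Z = (X - X.mean(axis=0)) / X.std(axis=0)
diff_z = Z[0] - Z[1]
age_share_scaled = diff_z[0] ** 2 / np.sum(diff_z ** 2)

print("Distance (raw):", euclidean(X[0], X[1]))
print("Age share of squared distance, raw:", age_share_raw)
print("Age share of squared distance, scaled:", age_share_scaled)
```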
We use feature scaling to bring features measured on different scales onto a standard scale, which makes them easier for machine learning algorithms to work with. We do this in Python as follows:
# feature scaling
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train) # learn the mean and standard deviation from the training set
X_test = sc_X.transform(X_test) # apply the same training statistics to the test set
After running this code, our scaled training and testing independent variables (X_train and X_test) look like this.
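A minimal sketch of what StandardScaler computes, using hypothetical [age, income] rows: each column is transformed with z = (x - mean) / std, where the mean and the (population) standard deviation are learned from the training set only and then reused for the test set.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical [age, income] rows
X_train = np.array([[32.0, 57600.0], [55.0, 99600.0], [40.0, 70000.0], [48.0, 83000.0]])
X_test = np.array([[36.0, 64000.0]])

sc_X = StandardScaler()
X_train_scaled = sc_X.fit_transform(X_train)  # learns mean_ and scale_ from the training rows
X_test_scaled = sc_X.transform(X_test)        # reuses the training statistics

# Same result by hand: z = (x - mean) / std, with the population std (ddof=0)
manual = (X_test - X_train.mean(axis=0)) / X_train.std(axis=0)
print(X_test_scaled)
print(manual)
```

Fitting the scaler on the test set as well would leak information about the test data into training, which is why only transform() is called on X_test.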
11. Python program for compressing data via
dimensionality reduction: PCA
Dimensionality reduction is a process used to reduce the number of features (or
dimensions) in a dataset while retaining most of the important information. This can be
achieved through techniques like Principal Component Analysis (PCA), t-SNE, or
UMAP. Here’s an example of compressing data using PCA in Python.
The following Python program compresses data using Principal Component Analysis (PCA). It includes both compression and reconstruction of the data, showing how PCA reduces dimensionality while retaining most of the important information.
Program
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
# Load sample data (Digits dataset: 8x8 images flattened to 64 features)
digits = load_digits()
X = digits.data # Feature matrix
y = digits.target # Labels
# Check original data shape
print("Original data shape:", X.shape)
# Apply PCA for dimensionality reduction
n_components = 20 # Compress to 20 dimensions
pca = PCA(n_components=n_components)
X_compressed = pca.fit_transform(X)
# Check compressed data shape
print("Compressed data shape:", X_compressed.shape)
# Reconstruct the data from the compressed form
X_reconstructed = pca.inverse_transform(X_compressed)
# Compare original and reconstructed data
print("Reconstructed data shape:", X_reconstructed.shape)
# Visualize one original and one reconstructed digit
plt.figure(figsize=(8, 4))
# Original
plt.subplot(1, 2, 1)
plt.imshow(X[0].reshape(8, 8), cmap='gray')
plt.title("Original Digit")
plt.axis('off')
# Reconstructed
plt.subplot(1, 2, 2)
plt.imshow(X_reconstructed[0].reshape(8, 8), cmap='gray')
plt.title(f"Reconstructed (n={n_components})")
plt.axis('off')
plt.tight_layout()
plt.show()
Output
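PCA keeps the directions of largest variance, so the number of components is usually chosen from the explained variance. A minimal sketch on the same Digits data; the 95% threshold is only an illustrative choice, and svd_solver="full" is set so every fit uses the exact solver.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X = load_digits().data

# Fit PCA with all components to see how much variance each one explains
pca_full = PCA(svd_solver="full").fit(X)
cumulative = np.cumsum(pca_full.explained_variance_ratio_)

# Smallest number of components whose cumulative explained variance exceeds 95%
n_95 = int(np.argmax(cumulative > 0.95)) + 1
print("Components needed for 95% of the variance:", n_95)

# PCA can also choose this number itself when n_components is a fraction
pca_95 = PCA(n_components=0.95, svd_solver="full").fit(X)
print("Chosen automatically:", pca_95.n_components_)

# Reconstruction error shrinks as more components are kept
ks = sorted({5, 20, n_95})
errors = []
for k in ks:
    pca_k = PCA(n_components=k, svd_solver="full").fit(X)
    X_back = pca_k.inverse_transform(pca_k.transform(X))
    errors.append(np.mean((X - X_back) ** 2))
    print(k, "components -> mean squared error:", errors[-1])
```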
12. Python program for data clustering
Data clustering is an unsupervised machine learning technique that groups data points into clusters so that points within the same cluster are more similar to each other than to points in other clusters. Clustering is widely used for exploratory data analysis, pattern recognition, and feature engineering.
Here's a comprehensive example of data clustering in Python using K-Means, DBSCAN, and Agglomerative Clustering.
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans, DBSCAN, AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
# Sample data (assumed for this example): 300 two-dimensional points around 4 centres
n_clusters = 4
random_state = 0
X, _ = make_blobs(n_samples=300, centers=n_clusters, cluster_std=0.6,
                  random_state=random_state)
# 1. K-Means Clustering
kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
y_kmeans = kmeans.fit_predict(X)
kmeans_centroids = kmeans.cluster_centers_
# Visualize K-Means Clustering
plt.scatter(X[:, 0], X[:, 1], c=y_kmeans, cmap='viridis', s=50, alpha=0.6)
plt.scatter(kmeans_centroids[:, 0], kmeans_centroids[:, 1], c='red', s=200, marker='X',
            label='Centroids')
plt.title("K-Means Clustering")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.legend()
plt.show()
# 2. DBSCAN Clustering
dbscan = DBSCAN(eps=0.5, min_samples=5)
y_dbscan = dbscan.fit_predict(X) # noise points get the label -1
# Visualize DBSCAN Clustering
plt.scatter(X[:, 0], X[:, 1], c=y_dbscan, cmap='viridis', s=50, alpha=0.6)
plt.title("DBSCAN Clustering")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.show()
# 3. Agglomerative Clustering
agglo = AgglomerativeClustering(n_clusters=n_clusters)
y_agglo = agglo.fit_predict(X)
# Visualize Agglomerative Clustering
plt.scatter(X[:, 0], X[:, 1], c=y_agglo, cmap='viridis', s=50, alpha=0.6)
plt.title("Agglomerative Clustering")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.show()
# 4. Evaluate Clustering Results with Silhouette Score
print("Silhouette Score (K-Means):", silhouette_score(X, y_kmeans))
# For DBSCAN, score only the points that were not labelled as noise
print("Silhouette Score (DBSCAN):", silhouette_score(X[y_dbscan != -1],
      y_dbscan[y_dbscan != -1]))
print("Silhouette Score (Agglomerative):", silhouette_score(X, y_agglo))
Output
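To show what KMeans does internally, here is a minimal NumPy sketch of Lloyd's algorithm, the classic k-means procedure. scikit-learn's KMeans uses an optimized variant with smarter (k-means++) initialization, so this is illustrative only; the helper kmeans_lloyd and the toy data are hypothetical.

```python
import numpy as np

def kmeans_lloyd(X, k, n_iter=100, seed=0):
    """Hypothetical helper: plain k-means (Lloyd's algorithm) with random initial centroids."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its points
        # (an empty cluster keeps its previous centroid)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged: the centroids stopped moving
        centroids = new_centroids
    return labels, centroids

# Toy data: two well-separated groups of three points
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
labels, centroids = kmeans_lloyd(X, k=2)
print(labels)
print(centroids)
```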
13. Python programs for classification
Classification is a large domain in the field of statistics and machine learning.
Generally, classification can be broken down into two areas:
1. Binary classification, where we wish to group an outcome into one of two groups.
2. Multi-class classification, where we wish to group an outcome into one of
multiple (more than two) groups.
In this section, the main focus is on applying a variety of classification algorithms across both of these domains; less emphasis is placed on the theory behind them.
We can use Python libraries such as Scikit-Learn for machine learning models and Pandas to import data as data frames.
These can easily be installed with pip:
pip install scikit-learn pandas
Logistic Regression
Logistic regression is a widely used statistical technique for binary classification
problems in machine learning. It predicts the probability of an event occurring (1) or
not occurring (0) based on a set of input features. In Python, you can implement
logistic regression using popular libraries such as Scikit-learn, Statsmodels, and
TensorFlow.
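from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
# Load dataset
data = load_iris()
X = data.data
y = (data.target == 0).astype(int) # Binary classification: Class 0 vs. others
# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=42)
# Train Logistic Regression model
model = LogisticRegression()
model.fit(X_train, y_train)
# Make predictions
y_pred = model.predict(X_test)
# Evaluate the model
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))
Output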
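Logistic regression turns a linear score z into a probability with the sigmoid (logistic) function, p = 1 / (1 + e^(-z)). A minimal sketch checking this on the same binary iris task, assuming scikit-learn's LogisticRegression, whose decision_function returns the linear score z; max_iter=1000 is an illustrative choice to avoid convergence warnings.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

data = load_iris()
X = data.data
y = (data.target == 0).astype(int)  # Class 0 vs. others, as above

model = LogisticRegression(max_iter=1000).fit(X, y)

# Linear score z = w . x + b for each sample
z = model.decision_function(X)
# The sigmoid turns the score into the probability of class 1
p_manual = 1 / (1 + np.exp(-z))
p_sklearn = model.predict_proba(X)[:, 1]

# A sample is predicted as class 1 when z > 0, i.e. when p > 0.5
pred_manual = (z > 0).astype(int)
print(p_manual[:5])
print(p_sklearn[:5])
```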
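Decision Tree
A decision tree is a supervised learning model with a tree-like structure, in which each internal node represents a test on a feature or attribute, each branch represents a decision rule, and each leaf node represents a class label or predicted outcome.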
Key Characteristics:
1. Hierarchical structure: The decision tree is composed of nodes and branches,
forming a hierarchical structure.
2. Feature selection: The algorithm selects the most informative features at each
node to split the data.
3. Splitting criteria: The algorithm uses a splitting criterion, such as Gini impurity
or information gain, to determine the best feature and split point.
4. Pruning: Techniques like pre-pruning or post-pruning can be used to reduce the
tree’s complexity and prevent overfitting.
Output
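A minimal decision tree sketch on the iris data, assuming scikit-learn's DecisionTreeClassifier; criterion="gini" illustrates the splitting criterion and max_depth=3 a simple form of pre-pruning (both values are illustrative choices). It also checks the root node's Gini impurity, 1 minus the sum of the squared class proportions, by hand.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=42)

# Gini criterion for choosing splits; max_depth=3 limits the tree's growth
clf = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=42)
clf.fit(X_train, y_train)

# Gini impurity of the root node computed by hand: 1 - sum(p_k^2)
p = np.bincount(y_train) / len(y_train)
root_gini = 1 - np.sum(p ** 2)

acc = accuracy_score(y_test, clf.predict(X_test))
print("Root Gini impurity:", root_gini, clf.tree_.impurity[0])
print("Accuracy:", acc)
# Text view of the learned decision rules
print(export_text(clf, feature_names=list(data.feature_names)))
```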
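Random Forest
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, accuracy_score
from sklearn.model_selection import train_test_split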
# Load dataset
data = load_iris()
X = [Link]
y = [Link]
# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
random_state=42)
# Train Random Forest Classifier
rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
rf_model.fit(X_train, y_train)
# Make predictions
y_pred = rf_model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))
Output
Support Vector Machine
A Support Vector Machine (SVM) is a powerful machine learning algorithm widely
used for both linear and nonlinear classification, as well as regression and outlier
detection tasks. SVMs are highly adaptable, making them suitable for various
applications such as text classification, image classification, spam detection,
handwriting identification, gene expression analysis, face detection, and anomaly
detection.
Key Concepts
1. Max-Margin Classifier: SVMs look for the hyperplane that separates the classes with the largest possible margin. Multiclass problems are handled by combining several binary SVMs (for example, one-vs-one), so SVMs can be used for both binary and multiclass classification.
2. Kernel Trick: kernel functions let SVMs work as if the data had been mapped into a higher-dimensional feature space, without computing that mapping explicitly, so data that is not linearly separable in the original space can be separated linearly there.
3. Support Vectors: Data points that are closest to the optimal hyperplane are
called support vectors, and they determine the position and orientation of the
hyperplane.
y_pred = svm_model.predict(X_test)
# Evaluate the model
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))
Output
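A self-contained SVM sketch on the iris data, assuming scikit-learn's SVC with kernel="rbf" and C=1.0 (scikit-learn's defaults, chosen here for illustration). The fitted model exposes the support vectors that define the decision boundary.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=42)

# kernel="rbf" applies the kernel trick; C trades margin width against training errors
svm_model = SVC(kernel="rbf", C=1.0)
svm_model.fit(X_train, y_train)

# Only the support vectors determine the position of the decision boundary
print("Support vectors per class:", svm_model.n_support_)
acc = accuracy_score(y_test, svm_model.predict(X_test))
print("Accuracy:", acc)
```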
K-Nearest Neighbors (KNN)
Key Concepts
1. K: The number of nearest neighbors to consider for prediction. A small value of
K (e.g., 1 or 3) can lead to overfitting, while a larger value (e.g., 5 or 10) can
result in smoother predictions.
2. Distance metric: The algorithm calculates the distance between the new data
point and existing data points in the training set. Common distance metrics
include Euclidean distance, Manhattan distance, and Minkowski distance.
3. Majority voting: In classification, the algorithm assigns the class label of the
majority of the K nearest neighbors to the new data point.
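These three ideas fit in a few lines of NumPy. A minimal from-scratch sketch using Euclidean distance and majority voting; the helper knn_predict and the toy data are hypothetical.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Hypothetical helper: classify x_new by majority vote of its k nearest neighbours."""
    # Euclidean distance from x_new to every training point
    dists = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # Indices of the k closest training points
    nearest = np.argsort(dists)[:k]
    # Majority vote among their labels
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]

X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.2, 0.3])))
print(knn_predict(X_train, y_train, np.array([5.4, 5.2])))
```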
A Python program on K-Nearest Neighbors (KNN)
from [Link] import load_iris
from sklearn.model_selection import train_test_split
from [Link] import KNeighborsClassifier
from [Link] import accuracy_score
# Load dataset
data = load_iris()
X = [Link]
y = [Link]
# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
random_state=42)
# Train KNN Classifier
knn_model = KNeighborsClassifier(n_neighbors=3)
knn_model.fit(X_train, y_train)
# Make predictions
y_pred = knn_model.predict(X_test)
# Evaluate the model
print("Accuracy:", accuracy_score(y_test, y_pred))
Output
14. Python program for model evaluation: k-fold cross-validation
A Python program that demonstrates how to evaluate a model using k-fold cross-validation with scikit-learn. This method assesses the model's performance by splitting the dataset into k folds and using each fold once as the test set while the remaining folds are used for training.
Output
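A minimal sketch of k-fold cross-validation with scikit-learn, assuming the iris data, 5 folds and the KNN classifier from the previous program (these choices are illustrative). It also confirms that every sample appears in exactly one test fold.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# 5 folds; shuffle because the iris rows are ordered by class
kf = KFold(n_splits=5, shuffle=True, random_state=42)
model = KNeighborsClassifier(n_neighbors=3)

# Each fold is used once as the test set while the other 4 folds train the model
scores = cross_val_score(model, X, y, cv=kf)
print("Fold accuracies:", scores)
print("Mean accuracy: %.3f (+/- %.3f)" % (scores.mean(), scores.std()))

# The test folds together cover every sample exactly once
test_indices = np.concatenate([test for _, test in kf.split(X)])
```

Reporting the mean together with the spread across folds gives a more reliable estimate than a single train/test split.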
Python is a high-level, interpreted programming language known for its easy-to-read syntax and
dynamic typing. It supports multiple programming paradigms, including procedural, object-oriented,
and functional programming.
Easy-to-read syntax.
Interpreted language.
Dynamically typed.
Supports multiple programming paradigms.
Large standard library.
Automatic memory management (garbage collection).
Extensible and embeddable.
A list is mutable, meaning its elements can be changed, whereas a tuple is immutable, meaning its
elements cannot be modified after creation.
A decorator is a function that modifies the behavior of another function or method. It is typically used to add functionality to existing code without modifying that code itself.
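A short example of a decorator that adds behaviour without editing the decorated function; count_calls is a hypothetical name used for illustration.

```python
import functools

def count_calls(func):
    """Hypothetical decorator: counts how often the wrapped function is called."""
    @functools.wraps(func)  # keep the original name and docstring
    def wrapper(*args, **kwargs):
        wrapper.calls += 1             # extra behaviour added around the call
        return func(*args, **kwargs)   # original behaviour unchanged
    wrapper.calls = 0
    return wrapper

@count_calls          # same as: add = count_calls(add)
def add(a, b):
    return a + b

print(add(2, 3))      # 5
print(add.calls)      # 1
```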
6. What is the difference between deepcopy() and copy() in Python?
The copy() function (copy.copy()) creates a shallow copy of an object: a new outer object whose elements still refer to the same nested objects, so changes made to nested objects through the copy also affect the original. deepcopy() (copy.deepcopy()), on the other hand, creates a deep copy by recursively copying all nested objects, so changes to nested objects do not affect the original.
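A short example using the standard copy module, showing that a shallow copy shares nested lists while a deep copy does not:

```python
import copy

original = [[1, 2], [3, 4]]
shallow = copy.copy(original)     # new outer list, same inner lists
deep = copy.deepcopy(original)    # new outer list and new inner lists

shallow[0].append(99)             # modify a nested list through the shallow copy

print(original)  # [[1, 2, 99], [3, 4]]  (the shared inner list changed)
print(deep)      # [[1, 2], [3, 4]]      (the deep copy is unaffected)
```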
A lambda function is a small, anonymous function defined with the lambda keyword. It can have any
number of arguments but only one expression. It is often used when a simple function is needed for a
short period of time.
Example:
square = lambda x: x ** 2
print(square(5)) # Output: 25
is checks for object identity, i.e., whether two variables refer to the same object in memory. == checks for
equality, i.e., whether two variables have the same value.
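A short example of the difference:

```python
a = [1, 2, 3]
b = [1, 2, 3]   # equal value, but a separate list object
c = a           # another name for the same object

print(a == b)   # True  (same value)
print(a is b)   # False (different objects)
print(a is c)   # True  (same object)
```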
Python uses try, except, else, and finally to handle exceptions. The try block contains code that may raise
an exception. The except block handles the exception. The else block runs if no exception is raised, and
finally runs no matter what.
Example:
try:
    x = 10 / 0
except ZeroDivisionError:
    print("Cannot divide by zero")
finally:
    print("This will always execute")
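The example above does not use else. A sketch with all four blocks; safe_divide is a hypothetical helper that records which blocks ran.

```python
def safe_divide(a, b):
    """Hypothetical helper that records which blocks ran."""
    blocks = []
    try:
        result = a / b            # may raise ZeroDivisionError
    except ZeroDivisionError:
        blocks.append("except")   # runs only if the exception occurred
        result = None
    else:
        blocks.append("else")     # runs only if no exception occurred
    finally:
        blocks.append("finally")  # always runs
    return result, blocks

print(safe_divide(10, 2))  # (5.0, ['else', 'finally'])
print(safe_divide(1, 0))   # (None, ['except', 'finally'])
```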
A generator is a special type of iterator in Python that uses the yield keyword to produce values one at
a time. Generators are memory efficient because they generate values on-the-fly instead of storing
them in memory.
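A short example of a generator function and a generator expression (countdown is a hypothetical name):

```python
def countdown(n):
    """Yield n, n-1, ..., 1 one value at a time."""
    while n > 0:
        yield n       # pause here and hand back one value
        n -= 1

gen = countdown(3)
print(next(gen))      # 3
print(list(gen))      # [2, 1]  (the generator resumes where it paused)

# A generator expression: squares are produced on the fly, no list is stored
total = sum(x * x for x in range(1_000_000))
```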
self refers to the instance of the class and allows you to access instance variables and methods within
the class. It must be the first parameter in instance methods.
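A short example (the BankAccount class is hypothetical) showing self inside instance methods:

```python
class BankAccount:
    """Hypothetical example class."""

    def __init__(self, owner, balance=0):
        self.owner = owner        # instance variables are set through self
        self.balance = balance

    def deposit(self, amount):
        self.balance += amount    # self gives access to this instance's data
        return self.balance

acct = BankAccount("Asha")
acct.deposit(100)
# acct.deposit(50) is equivalent to BankAccount.deposit(acct, 50)
BankAccount.deposit(acct, 50)
print(acct.balance)  # 150
```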
13. How is Python interpreted?
Python is an interpreted language: a Python program runs directly from its source code. The interpreter first compiles the source code written by the programmer into an intermediate form called bytecode, which the Python virtual machine then executes.
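In CPython, this bytecode can be inspected with the standard dis module. A short sketch; the exact instruction names vary between Python versions, so the disassembly itself is not shown.

```python
import dis

def add(a, b):
    return a + b

# The function's source has already been compiled into a code object holding bytecode
print(type(add.__code__).__name__)  # code
# Print the bytecode instructions that the Python virtual machine executes
dis.dis(add)
```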
Python memory is managed in a private heap. All Python objects and data structures are located in this private heap. The programmer does not have direct access to it; the interpreter takes care of it.
The allocation of heap space for Python objects is done by the Python memory manager. Python's core API gives programmers access to some of these memory-management tools.
Python also has a built-in garbage collector. In CPython, an object's memory is freed as soon as its reference count drops to zero, and the cyclic garbage collector reclaims groups of objects that refer to each other but are no longer reachable, making that memory available to the heap again.
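A short CPython-oriented sketch of both mechanisms: sys.getrefcount reports an object's reference count (the exact number is an implementation detail), and gc.collect() reclaims a reference cycle that reference counting alone cannot free. A weak reference is used to check whether the object still exists without keeping it alive.

```python
import gc
import sys
import weakref

class Node:
    pass

# Reference counting: the count includes the temporary reference held by the call
x = Node()
print("Reference count:", sys.getrefcount(x))

# A reference cycle keeps both counts above zero after the names are deleted
a, b = Node(), Node()
a.partner, b.partner = b, a
probe = weakref.ref(a)   # lets us check whether 'a' still exists
del a, b
gc.collect()             # the cyclic garbage collector reclaims the cycle
print("Cycle collected:", probe() is None)
```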
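Everything in Python is an object, and all variables hold references to objects. Arguments are passed to functions by object reference: the function receives a reference to the same object, not a copy. Rebinding the parameter inside the function therefore does not change the caller's variable. However, if the object is mutable, changes made to it inside the function are visible to the caller.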
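A short example of the difference between rebinding a parameter and mutating the object it refers to (rebind and mutate are hypothetical names):

```python
def rebind(items):
    items = [0]          # rebinds the local name only; the caller is unaffected

def mutate(items):
    items.append(4)      # changes the shared (mutable) object; the caller sees it

nums = [1, 2, 3]
rebind(nums)
print(nums)  # [1, 2, 3]
mutate(nums)
print(nums)  # [1, 2, 3, 4]
```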
In Python, every name introduced has a place where it lives and where it can be looked up. This is known as a namespace. It is like a box in which a variable name is mapped to the object it refers to. Whenever the variable is looked up, this box is searched to get the corresponding object.
A lambda form in Python does not contain statements, because it is used to create a new function object that returns the value of a single expression at runtime.
pass is a no-operation Python statement. In other words, it is a placeholder in a compound statement where a statement is syntactically required but nothing has to be done.
In Python, a module is the way to structure a program. Each Python program file is a module, which can import other modules and use their objects and attributes.
A folder of Python modules (traditionally containing an __init__.py file) is a package. A package can contain modules or subfolders (subpackages).
20. What are the rules for local and global variables in Python?
Here are the rules for local and global variables in Python:
Local variables: If a variable is assigned a new value anywhere within the function’s body, it’s
assumed to be local.
Global variables: Those variables that are only referenced inside a function are implicitly global.
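A short example of these rules, including the global keyword for assigning to a global variable and the UnboundLocalError raised when a variable that is assigned inside a function is read before that assignment:

```python
counter = 0

def read_counter():
    return counter + 1        # only referenced, so 'counter' is the global one

def increment():
    global counter            # declare that the assignment targets the global
    counter += 1

def shadow():
    counter = 100             # assigned, so this 'counter' is local
    return counter

def broken():
    counter += 1              # assigned -> local, but read before assignment
    return counter

print(read_counter())     # 1
increment()
print(counter)            # 1
print(shadow(), counter)  # 100 1
try:
    broken()
except UnboundLocalError as exc:
    print("UnboundLocalError:", exc)
```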
SAMPLE PROGRAMS