Python Programs for Data Analysis

The document contains Python programs for performing central tendency measures (mean, median, mode) and measures of dispersion (range, variance, standard deviation, IQR) both with and without using built-in functions. It also includes programs to read multiple files from single and multiple folders, and to read and display various types of data (image, text, numeric, audio, video) using different libraries. Additionally, it provides examples of linear regression using single and multiple variables.

Uploaded by

pavansabaloor

1. Write a program to perform central tendency (mean, median, mode) with and
without using built-in functions on the data.
Date:07/07/2025
----------------------------------------------------------------------------------------------------------------

WITH BUILT-IN:

import numpy as np
import statistics as stat
data=12,12,23,24,56,32,23,23
print(".....................USING BUILT IN FUNCTION.................")
mean=np.mean(data)
median=np.median(data)
mod=stat.mode(data)
print("The mean value is:",mean)
print("The median value is:",median)
print("The mod value is:",mod)

Output:
.....................USING BUILT IN FUNCTION.................
The mean value is: 25.625
The median value is: 23.0
The mod value is: 23

WITHOUT-BUILT IN:

data = []
n = int(input("Enter number of elements: "))
for i in range(n):
    ele = int(input(f"Enter element {i+1}: "))
    data.append(ele)
print("The created list is:", data)
# mean: sum of the values divided by their count
mean = sum(data) / n
print("Mean without built in function:", mean)
sorted_data = sorted(data)
print("The sorted data is:", sorted_data)
# median: middle value, or the average of the two middle values when n is even
if n % 2 == 0:
    mid_index1 = n // 2
    mid_index2 = mid_index1 - 1
    median = (sorted_data[mid_index1] + sorted_data[mid_index2]) / 2
else:
    mid_index = n // 2
    median = sorted_data[mid_index]
print("Median is:", median)
# mode: count each value and keep every value that reaches the highest count
count = {}
for element in data:
    if element in count:
        count[element] += 1
    else:
        count[element] = 1
max_count = max(count.values())
mode = [element for element, c in count.items() if c == max_count]
print("Mode is", mode)

Output:
Enter number of elements: 5
Enter element 1: 6
Enter element 2: 3
Enter element 3: 2
Enter element 4: 8
Enter element 5: 6
The created list is: [6, 3, 2, 8, 6]
Mean without built in function: 5.0
The sorted data is: [2, 3, 6, 6, 8]
Median is: 6
Mode is [6]
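
The three manual measures can also be wrapped in small functions and cross-checked against Python's standard statistics module. This is only a minimal sketch: the helper names manual_mean, manual_median and manual_modes are illustrative and do not appear in the original program, and the sample list is the one used in the built-in version.

```python
import statistics

def manual_mean(values):
    # arithmetic mean: total divided by count
    return sum(values) / len(values)

def manual_median(values):
    # middle value of the sorted data (average of the two middle values if even)
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 0:
        return (ordered[mid - 1] + ordered[mid]) / 2
    return ordered[mid]

def manual_modes(values):
    # every value that occurs the maximum number of times
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    highest = max(counts.values())
    return [v for v, c in counts.items() if c == highest]

sample = [12, 12, 23, 24, 56, 32, 23, 23]
print(manual_mean(sample), statistics.mean(sample))
print(manual_median(sample), statistics.median(sample))
print(manual_modes(sample), statistics.multimode(sample))
```

Returning a list of modes keeps the function well defined when several values tie for the highest count.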

2. Write a python program to:
i) read multiple files from single folder
ii) read multiple files from multiple folders.
Date:
----------------------------------------------------------------------------------------------------------------
i) read multiple files from single folder

import os
# use the current working directory as the folder to scan
path=os.getcwd()
main_folder_name=os.path.basename(path)
print(f"main folder name:{main_folder_name}")
for file in os.listdir(path):
    if file.endswith(".txt"):
        file_path=os.path.join(path,file)
        print(f"file path:{file_path}")
        with open(file_path,'r') as f:
            print(f.read())

Output:
main folder name:mca
file path:C:\Users\DELL\mca\[Link]
Hello.............!Good Morning

ii) read multiple files from multiple folders.

import os
def read_text_files_from_folders(root_folder):
    texts = []
    # os.walk visits root_folder and every folder below it
    for folder_name, subfolders, filenames in os.walk(root_folder):
        for filename in filenames:
            if filename.endswith(".txt"):
                file_path = os.path.join(folder_name, filename)
                try:
                    with open(file_path, 'r', encoding='utf-8') as file:
                        content = file.read()
                        print(folder_name)
                        print(filename)
                        print(content)
                        texts.append(content)
                except Exception as e:
                    print(f"Error reading file {file_path}: {e}")
    return texts
root_folder = "C:/"
texts = read_text_files_from_folders(root_folder)

Output :
C:/
[Link]
Deployment Image Servicing and Management tool
Version: 10.0.10240.16384
Image Version: 10.0.10240.16384
Packages listing:
Package Identity : Microsoft-Windows-Client-LanguagePack-
Package~31bf3856ad364e35~amd64~en-US~10.0.10240.16384
State : Installed
Release Type : Language Pack
Install Time : 7/10/2015 1:13 PM
Package Identity : Microsoft-Windows-DiagTrack-Internal-
Package~31bf3856ad364e35~amd64~~10.0.10240.16384

State : Installed
Release Type : Feature Pack
Install Time : 7/10/2015 12:20 PM
Package Identity : Microsoft-Windows-Foundation-
Package~31bf3856ad364e35~amd64~~10.0.10240.16384
State : Installed
Release Type : Foundation
Install Time : 7/10/2015 12:20 PM
Package Identity : Microsoft-Windows-LanguageFeatures-Basic-en-us-
Package~31bf3856ad364e35~amd64~~10.0.10240.16384
State : Installed
Release Type : OnDemand Pack
Install Time : 7/10/2015 1:13 PM
Package Identity : Microsoft-Windows-LanguageFeatures-Handwriting-en-us-
Package~31bf3856ad364e35~amd64~~10.0.10240.16384
State : Installed
Release Type : OnDemand Pack
Install Time : 7/10/2015 1:13 PM
Package Identity : Microsoft-Windows-LanguageFeatures-OCR-en-us-
Package~31bf3856ad364e35~amd64~~10.0.10240.16384
State : Installed
Release Type : OnDemand Pack
Install Time : 7/10/2015 1:14 PM
Package Identity : Microsoft-Windows-LanguageFeatures-Speech-en-us-
Package~31bf3856ad364e35~amd64~~10.0.10240.16384
State : Installed
Release Type : OnDemand Pack
Install Time : 7/10/2015 1:14 PM
Package Identity : Microsoft-Windows-LanguageFeatures-TextToSpeech-en-us-
Package~31bf3856ad364e35~amd64~~10.0.10240.16384
State : Installed
Release Type : OnDemand Pack
Install Time : 7/10/2015 1:14 PM
Package Identity : Microsoft-Windows-Prerelease-Client-
Package~31bf3856ad364e35~amd64~en-US~10.0.10240.16384

State : Installed
Release Type : Language Pack
Install Time : 7/10/2015 1:13 PM
Package Identity : Microsoft-Windows-Prerelease-Client-
Package~31bf3856ad364e35~amd64~~10.0.10240.16384
State : Installed
Release Type : Feature Pack
Install Time : 7/10/2015 12:20 PM
Package Identity : Microsoft-Windows-RetailDemo-OfflineContent-Content-en-us-
Package~31bf3856ad364e35~amd64~~10.0.10240.16384
State : Installed
Release Type : OnDemand Pack
Install Time : 7/10/2015 1:16 PM
Package Identity : Microsoft-Windows-RetailDemo-OfflineContent-Content-
Package~31bf3856ad364e35~amd64~~10.0.10240.16384
State : Installed
Release Type : OnDemand Pack
Install Time : 7/10/2015 1:16 PM
Package Identity : Package_for_KB3074667~31bf3856ad364e35~amd64~~[Link]
State : Installed
Release Type : Security Update
Install Time : 7/24/2015 3:15 AM
Package Identity : Package_for_KB3081444~31bf3856ad364e35~amd64~~[Link]
State : Install Pending
Release Type : Security Update
Install Time : 8/25/2015 8:38 AM

The operation completed successfully.
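
Walking an entire drive such as "C:/" can be slow and may hit unreadable files, so a narrower root folder is usually preferable. As a hedged alternative, the same recursive search can be sketched with pathlib. The function name read_text_files and the demo file names below are hypothetical and are created in a temporary folder only for illustration.

```python
import tempfile
from pathlib import Path

def read_text_files(root_folder):
    """Return {path: content} for every .txt file under root_folder."""
    texts = {}
    # rglob searches root_folder and all of its subfolders
    for file_path in sorted(Path(root_folder).rglob("*.txt")):
        try:
            texts[str(file_path)] = file_path.read_text(encoding="utf-8")
        except (OSError, UnicodeDecodeError) as exc:
            print(f"Error reading file {file_path}: {exc}")
    return texts

# Demonstration on a throwaway folder tree (hypothetical file names)
with tempfile.TemporaryDirectory() as root:
    (Path(root) / "sub").mkdir()
    (Path(root) / "a.txt").write_text("first", encoding="utf-8")
    (Path(root) / "sub" / "b.txt").write_text("second", encoding="utf-8")
    (Path(root) / "sub" / "skip.csv").write_text("x,y", encoding="utf-8")
    demo = read_text_files(root)
    print(sorted(demo.values()))
```

Returning a dictionary keyed by path keeps each text associated with the file it came from, which the list-based version does not.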

3. Write a Python program to read and display various kinds of data (image, text,
numeric, audio, video) saved in different formats using various Python libraries.
Date:
----------------------------------------------------------------------------------------------------------------
#jpg
from PIL import Image
image='[Link]'
image=Image.open(image)
image.show()

Output:

#png
from PIL import Image
image='[Link]'
image=Image.open(image)
image.show()

Output:

#gif
import cv2
video = '[Link]'
cap = cv2.VideoCapture(video)
if not cap.isOpened():
    print("Error: could not open video")
    exit()
# show the frames one by one until the file ends or 'q' is pressed
while True:
    ret, frame = cap.read()
    if not ret:
        break
    cv2.imshow('Frame', frame)
    if cv2.waitKey(25) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()

Output:

# audio
from IPython.display import Audio
import librosa
import numpy as np
import matplotlib.pyplot as plt
audio='computer-keyboard-typing-290582.mp3'
# sr=None keeps the file's own sampling rate
y,sr=librosa.load(audio,sr=None)
print(f'sampling rate:{sr}Hz')
print(f'Number of sample:{len(y)}')
plt.figure(figsize=(14,5))
plt.subplot(3,3,3)
plt.plot(np.arange(len(y))/sr,y)
plt.title('waveform')
plt.xlabel('Time(s)')
plt.ylabel('Amplitude')
plt.show()
Audio(data=y,rate=sr)

Output:
sampling rate:48000Hz
Number of sample:1793664

# TEXT DATA
#.txt .json .excel
#.txt
file_path ='[Link]'
try:
    with open(file_path, 'r') as file:
        print(file.read())
except FileNotFoundError:
    print(f"Error: The file '{file_path}' was not found.")
except IOError:
    print(f"Error: Could not read file '{file_path}'.")

Output:
Hello.............!Good Morning

# json
import json
data={
    "name":"john doe",
    "age":30,
    "city":"newyork",
    "interests":["python","data science","Reading"]
}
file_path='[Link]'
try:
    with open(file_path,'w') as file:
        json.dump(data,file,indent=4)
    print("Json data has been written successfully")
except IOError:
    print("Error:Could not write to file")
file_path='[Link]'
with open(file_path,'r') as file:
    print(file.read())

Output:
Json data has been written successfully
{
    "name": "john doe",
    "age": 30,
    "city": "newyork",
    "interests": [
        "python",
        "data science",
        "Reading"
    ]
}

#xls
import pandas as pd
file_path='[Link]'
df=pd.read_excel(file_path)
print(df.head())

Output:

   First Name   Last Name  Gender  Country  Age        Date    Id
0       Dulce       Abril  Female       US   32  2017-10-15  1562
1        Mara  Hashmimoto  Female  Britain   25  2016-08-16  1582
2      Philip        Gent    male      Goa   15  2015-05-21  2587
3    Kathleen      Hanner  Female       US   18  2017-10-15  3549
4     Nereida     Magwood  Female       US   58  2016-08-16  2468

#CSV
import pandas as pd
file_path='[Link]'
df=pd.read_csv(file_path)
print(df.head())

Output:
   area  bedroom  age   price
0  2600      3.0   20  550000
1  3000      4.0   15  565000
2  3200      NaN   18  580000
3  3500      3.0   30  595000
4  4000      5.0    8  610000

#tsv
def read_tsv_file(file_path):
    with open(file_path,'r') as file:
        lines=file.readlines()
        for line in lines:
            # split each line into fields on the tab character
            fields=line.strip().split('\t')
            print(fields)
file_path='[Link]'
read_tsv_file(file_path)

Output:
["[' ']"]
["['Annual budget tracker']"]
["['Plan and track your monthly spending for the entire year']"]
["[' ']"]
["[' ']"]
["['How to use this temple']"]
['']

4. Write a program to compute measures of dispersion (range, variance, standard
deviation, IQR) with and without using built-in functions on the data set.
Date:09-07-2025
----------------------------------------------------------------------------------------------------------------
WITH BUILT IN :

import numpy as np
data=[]
n=int(input("Enter the number do you want to enter:"))
for i in range(n):
    ele=int(input("Enter the elements:"))
    data.append(ele)
print("The numbers are:",data)
range_builtin = np.ptp(data)        # max - min
variance_builtin = np.var(data)     # population variance (divides by n)
std_deviation_builtin = np.std(data)
iqr_builtin = np.percentile(data, 75) - np.percentile(data,25)
print(".....................Using Built-in Functions............................")
print(f"Range: {range_builtin}")
print(f"Variance: {variance_builtin}")
print(f"Standard Deviation: {std_deviation_builtin}")
print(f"IQR: {iqr_builtin}")

Output:
Enter the number do you want to enter: 14
Enter the elements: 8
Enter the elements: 9
Enter the elements: 4
Enter the elements: 2
Enter the elements: 3
Enter the elements: 5
Enter the elements: 4
Enter the elements: 12
Enter the elements: 78
Enter the elements: 71
Enter the elements: 61
Enter the elements: 36
Enter the elements: 78
Enter the elements: 90
The numbers are: [8, 9, 4, 2, 3, 5, 4, 12, 78, 71, 61, 36, 78, 90]

.....................Using Built-in Functions............................
Range: 88
Variance: 1107.4948979591838
Standard Deviation: 33.27904592922074
IQR: 64.25

WITHOUT BUILT IN FUNCTION:

data=[]
n=int(input("Enter the number do you want to enter:"))
for i in range(n):
    ele=int(input("Enter the elements:"))
    data.append(ele)
print("The numbers are:",data)
range_custom = max(data) - min(data)
mean = sum(data) / len(data)
# sample variance: divides by n - 1
variance_custom = sum((x - mean) ** 2 for x in data) / (len(data) - 1)
std_dev_custom = variance_custom ** 0.5
sorted_data = sorted(data)
# quartiles taken directly at the 75% and 25% positions of the sorted data
q75_custom = sorted_data[int(len(data) * 0.75)]
q25_custom = sorted_data[int(len(data) * 0.25)]
iqr_custom = q75_custom - q25_custom
print(".......................Without Using Built-in Functions................................")
print(f"Range: {range_custom}")
print(f"Variance: {variance_custom}")
print(f"Standard Deviation: {std_dev_custom}")
print(f"IQR: {iqr_custom}")

Output:
Enter the number do you want to enter: 10
Enter the elements: 10
Enter the elements: 20
Enter the elements: 30
Enter the elements: 40
Enter the elements: 50
Enter the elements: 60
Enter the elements: 70
Enter the elements: 80
Enter the elements: 90
Enter the elements: 100
The numbers are: [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
.......................Without Using Built-in Functions................................
Range: 90
Variance: 916.6666666666666
Standard Deviation: 30.276503540974915
IQR: 50
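
The two versions above do not compute exactly the same quantities: NumPy's var and std divide by n by default (population variance), whereas the manual version divides by n - 1 (sample variance), and NumPy's percentile interpolates between neighbouring values while the manual version picks a single element by index. The sketch below illustrates the difference on the 14 values from the built-in run, using only the standard statistics module. It is an illustration of the distinction, not a replacement for either program.

```python
import statistics

data = [8, 9, 4, 2, 3, 5, 4, 12, 78, 71, 61, 36, 78, 90]

pop_var = statistics.pvariance(data)     # divides by n, like numpy.var's default
sample_var = statistics.variance(data)   # divides by n - 1, like the manual version

# "inclusive" quartiles interpolate linearly, like numpy.percentile's default
q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
interp_iqr = q3 - q1

# index-based quartiles, as in the manual version
ordered = sorted(data)
index_iqr = ordered[int(len(data) * 0.75)] - ordered[int(len(data) * 0.25)]

print("population variance:", pop_var)
print("sample variance:", sample_var)
print("interpolated IQR:", interp_iqr)
print("index-based IQR:", index_iqr)
```

When the built-in and manual results need to agree, one option is to pass ddof=1 to NumPy's var and std, or to divide by n in the manual formula.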

5. Write a program to perform linear regression using
i) Single variable
ii) Multiple variables.
Date:11-07-2025
----------------------------------------------------------------------------------------------------------------
i) Single variable

import pandas as pd
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plot
df=pd.read_csv('[Link]')
df.head()

Output:

df.keys()

Output:
Index(['area', 'bedrooms', 'age ', ' price'], dtype='object')

# remove the stray spaces from the 'age ' and ' price' column names
df=df.rename(columns={'age ':'age',' price':'price'})
print(df.columns)

Output:
Index(['area', 'bedrooms', 'age', 'price'], dtype='object')

plot.xlabel('area')
plot.ylabel('price')
plot.scatter(df.area,df.price,color='green',marker='+')
plot.show()

Output:

x=df[['area']]
y=df[['price']]
# take the first column (area) and the fourth column (price) as NumPy arrays
x=df.iloc[:,0].values
y=df.iloc[:,3].values
print(x)
print(y)

Output:
[2600 3000 3200 3500 4000]
[550000 556500 580000 595000 610000]

reg=LinearRegression()
# scikit-learn expects a 2-D feature array: one row per sample, one column per feature
x = x.reshape(-1, 1)
reg.fit(x,y)

Output:

reg.coef_
Output:
array([46.50179856])

reg.predict([[3000]])

Output:
array([566209.5323741])
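
For a single variable, the fitted line can be checked by hand with the closed-form least-squares formulas: slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x). The sketch below applies them to the five (area, price) pairs printed above. The function name fit_line is illustrative and not part of the original program.

```python
def fit_line(xs, ys):
    # closed-form least squares for y = intercept + slope * x
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

area = [2600, 3000, 3200, 3500, 4000]
price = [550000, 556500, 580000, 595000, 610000]

slope, intercept = fit_line(area, price)
prediction = intercept + slope * 3000
print(round(slope, 4), round(prediction, 2))
```

The slope and the prediction for an area of 3000 agree with reg.coef_ and reg.predict([[3000]]) shown above.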

ii) Multiple variables.

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
df=pd.read_csv('homeprices_multiple.csv')
df.head()
Output:

   area  bedrooms  age   price
0  3000       4.0   15  565000
1  3200       NaN   18  610000
2  3600       3.0   30  595000
3  4000       5.0    8  760000
4  4100       6.0    8  810000

plt.xlabel('age')
plt.ylabel('price')
plt.scatter(df.age,df.price,color='green',marker='+')
plt.show()

Output:

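# LinearRegression cannot fit rows with missing values, so the row with
# the missing 'bedrooms' entry is dropped before fitting
df=df.dropna()
reg=LinearRegression()
reg.fit(df[['area', 'bedrooms', 'age']], df['price'])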

Output:

print("coefficient:",reg.coef_)
print("Intercept:",reg.intercept_)

Output:
coefficient: [ 148.64130435 35135.86956522 -1603.26086957]
Intercept: 2581.5217391273472

input_data = pd.DataFrame([[3000, 3, 15]], columns=['area', 'bedrooms', 'age'])

predicted_price = reg.predict(input_data)
print(f"Predicted price: {predicted_price[0]}")

Output:
Predicted price: 529864.130434782
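
The prediction is simply the intercept plus the sum of each coefficient multiplied by its feature value. The sketch below reproduces it from the coefficients and intercept printed above. Because the printed coefficients are rounded to eight decimals, the result matches the model's prediction only to a few decimal places.

```python
# values copied from the printed output above (coefficients are rounded)
coefficients = [148.64130435, 35135.86956522, -1603.26086957]
intercept = 2581.5217391273472

features = [3000, 3, 15]  # area, bedrooms, age

# price = intercept + b1*area + b2*bedrooms + b3*age
predicted = intercept + sum(c * x for c, x in zip(coefficients, features))
print(round(predicted, 2))
```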

6. Program to fit a Multiple Linear Regression model on the House_prices dataset. Consider
the table below, containing house prices in Monroe, New Jersey (USA):
area    bedrooms    age    price
2600    3           20     550000
3000    4           15     565000
3200                18     610000
3600    3           30     595000
4000    5           8      760000
4100    5           8      810000
Here price depends on the area (square feet), the number of bedrooms and the age of the house (in years).
Predict the prices of new homes based on the following area, bedrooms and age.
Date:
----------------------------------------------------------------------------------------------------------------

import pandas as pd
import numpy as np
from sklearn import linear_model
import warnings
warnings.filterwarnings('ignore')
df=pd.read_excel('[Link]')
print("The home price dataset is\n",df)

Output:
The home price dataset is
area bedrooms age price
0 2600 3.0 20 550000
1 3000 4.0 15 565000
2 3200 NaN 18 610000
3 3600 3.0 30 595000
4 4000 5.0 8 760000
5 4100 5.0 8 810000

print("The description of the dataset\n",df.describe())

Output:
The description of the dataset
              area  bedrooms        age          price
count     6.000000       5.0   6.000000       6.000000
mean   3416.666667       4.0  16.500000  648333.333333
std     587.934237       1.0   8.288546  109117.673484
min    2600.000000       3.0   8.000000  550000.000000
25%    3050.000000       3.0   9.750000  572500.000000
50%    3400.000000       4.0  16.500000  602500.000000
75%    3900.000000       5.0  19.500000  722500.000000
max    4100.000000       5.0  30.000000  810000.000000

print("To check if there is any missing value\n",df.isnull().any())

Output:
To check if there is any missing value
area False
bedrooms True
age False
price False
dtype: bool

print("The median of bedrooms=",df.bedrooms.median())

Output:
The median of bedrooms= 4.0

df.bedrooms=df.bedrooms.fillna(df.bedrooms.median())
print("Data set After replacing the missing values with median")
print(df)

Output:

Data set After replacing the missing values with median

area bedrooms age price
0 2600 3.0 20 550000
1 3000 4.0 15 565000
2 3200 4.0 18 610000
3 3600 3.0 30 595000
4 4000 5.0 8 760000
5 4100 5.0 8 810000

import matplotlib.pyplot as plt
plt.scatter(df.age,df.price)
plt.xlabel('Age of home(in years)')
plt.ylabel('price')
plt.show()

Output:

x=df.drop('price',axis='columns')
print("The dataset after dropping price is\n")
print(x)

Output:
The dataset after dropping price is
area bedrooms age
0 2600 3.0 20
1 3000 4.0 15
2 3200 4.0 18
3 3600 3.0 30
4 4000 5.0 8
5 4100 5.0 8

y=df['price']
print("The dataset having price is\n")
print(y)

Output:
The dataset having price is

0 550000
1 565000
2 610000
3 595000
4 760000
5 810000
Name: price, dtype: int64

reg=linear_model.LinearRegression()
reg.fit(x,y)

Output:

coef=reg.coef_
print(coef)

Output:
[ 189.57096766 -94877.34896436 -13068.36933232]

b1=coef[0]
b2=coef[1]
b3=coef[2]
print(b1)
print(b2)
print(b3)

Output:
189.57096766248824
-94877.34896436246
-13068.369332315371

a=reg.intercept_
print(a)

Output:
595770.0169938187

reg.predict([[3000,3,40]])
new_area=3000
new_bedrooms=3
new_age=40
# price = intercept + b1*area + b2*bedrooms + b3*age
predicted_price=a+(b1*new_area)+(b2*new_bedrooms)+(b3*new_age)
print("The predicted_price\n",round(predicted_price,2))

Output:
The predicted_price
357116.1

7. Write a program to perform various data visualization techniques on a sample
dataset.
Date:
----------------------------------------------------------------------------------------------------------------
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
data=pd.read_csv('fruit_data_with_colours.csv')
print("First few rows of the dataset:")
print(data.head())

Output:
First few rows of the dataset:
fruit_label fruit_name fruit_subtype mass width height color_score
0 1 apple granny_smith 192 8.4 7.3 0.55
1 1 apple granny_smith 180 8.0 6.8 0.59
2 1 apple granny_smith 176 7.4 7.2 0.60
3 2 mandarin mandarin 86 6.2 4.7 0.80
4 2 mandarin mandarin 84 6.0 4.6 0.79

plt.subplot(2,2,1)
sns.histplot(data['width'],bins=10,kde=True)
plt.title('Histogram of fruit width')

Output:
Text(0.5, 1.0, 'Histogram of fruit width')

plt.subplot(2,2,2)
sns.scatterplot(x='width',y='height',data=data)
plt.title('scatter plot of width vs Height')

Output:
Text(0.5, 1.0, 'scatter plot of width vs Height')

plt.subplot(2,2,3)
sns.boxplot(x='mass',y='color_score',data=data)
plt.title('box plot of mass level color_score')

Output:
Text(0.5, 1.0, 'box plot of mass level color_score')

plt.subplot(2,2,4)
sns.countplot(x='fruit_name',data=data)
plt.title('count of fruit_name')

Output:
Text(0.5, 1.0, 'count of fruit_name')

8. Write a program to perform numeric data preprocessing with and without using built-in
functions.
Date:17/07/2025
----------------------------------------------------------------------------------------------------------------
WITH BUILT-IN FUNCTION:

import numpy as np
import pandas as pd
df=pd.read_csv("[Link]")
df.head()

Output:

df.shape

Output:
(768, 9)

[Link]()

Output:

# all columns except the last are features; the last column is the class label
features = df.iloc[:, :-1]
class_label = df.iloc[:, -1]
duplicate_values = df[df.duplicated()]
print("\nDuplicate values:")
print(duplicate_values)

Output:
Duplicate values:
Empty DataFrame
Columns: [Pregnancies, Glucose, BloodPressure, SkinThickness, Insulin, BMI,
DiabetesPedigreeFunction, Age, Outcome]
Index: []

df.drop_duplicates(inplace=True)
print("DataFrame after removing duplicates:")
df.head(5)

Output:
DataFrame after removing duplicates:

missing_values=df.isnull().sum()
print("Missing values:\n",missing_values)

Output:
Missing values:
Pregnancies 0
Glucose 0
BloodPressure 0
SkinThickness 0
Insulin 0
BMI 0
DiabetesPedigreeFunction 0
Age 0
Outcome 0
dtype: int64

WITHOUT BUILT-IN FUNCTION:
import csv
with open('[Link]','r') as file:
    reader=csv.reader(file)
    data=list(reader)

header=data[0]
rows=data[1:]

# convert every field from text to a number
for i in range(len(rows)):
    rows[i]=[float(x) for x in rows[i]]

# columns in which a value of 0 is treated as missing
cols_to_fix=[1,2,3,4,5]

for col in cols_to_fix:
    non_zero_vals=[row[col] for row in rows if row[col]!=0]
    non_zero_vals.sort()
    n=len(non_zero_vals)
    median=non_zero_vals[n//2] if n%2!=0 else (non_zero_vals[n//2-1]+non_zero_vals[n//2])/2
    # replace each 0 with the median of the non-zero values in that column
    for row in rows:
        if row[col]==0:
            row[col]=median

print("\nPreprocessed Data(without built_in functions):")

print(header)
for row in rows[:5]:
    print(row)

Output:
Preprocessed Data(without built_in functions):
['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness', 'Insulin', 'BMI',
'DiabetesPedigreeFunction', 'Age', 'Outcome']
[6.0, 148.0, 72.0, 35.0, 125.0, 33.6, 0.627, 50.0, 1.0]
[1.0, 85.0, 66.0, 29.0, 125.0, 26.6, 0.351, 31.0, 0.0]
[8.0, 183.0, 64.0, 29.0, 125.0, 23.3, 0.672, 32.0, 1.0]
[1.0, 89.0, 66.0, 23.0, 94.0, 28.1, 0.167, 21.0, 0.0]
[0.0, 137.0, 40.0, 35.0, 168.0, 43.1, 2.288, 33.0, 1.0]
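
The zero-replacement step can also be written as a small reusable function. This is a minimal sketch under the same assumption as the program above, namely that a 0 in the chosen column marks a missing value. The names median_of and impute_zeros and the toy rows are illustrative only.

```python
def median_of(values):
    # median of a non-empty list without library helpers
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def impute_zeros(rows, col):
    """Return new rows in which zeros in column `col` are replaced by the
    median of that column's non-zero values."""
    non_zero = [row[col] for row in rows if row[col] != 0]
    if not non_zero:
        return [list(row) for row in rows]  # nothing to impute from
    fill = median_of(non_zero)
    return [row[:col] + [fill if row[col] == 0 else row[col]] + row[col + 1:]
            for row in rows]

# toy data (hypothetical): second column uses 0 for "missing"
toy_rows = [[1.0, 0.0], [2.0, 10.0], [3.0, 30.0], [4.0, 0.0]]
cleaned = impute_zeros(toy_rows, col=1)
print(cleaned)
```

Unlike the loop in the program, this version leaves the input rows untouched and returns a cleaned copy, which makes it easier to compare the data before and after imputation.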

9. Write a Python program to perform the following using employees_data.csv file:
1. Load the dataset and display the first 5 rows.
2. Clean the data:
* Check for and handle any missing values.
* Convert Joining_Date to datetime format.
3. Feature Engineering:
* Create a new column Years_of_Service calculated from today's date.
4. Data Analysis:
* Calculate the average salary per department.
* Find the number of employees in each gender category.
* Identify the department with the highest average Years_of_Service.
5. Data Visualization:
* Create a bar plot of average salary by department.
* Plot a pie chart showing the gender distribution.
6. Export the cleaned and enriched dataset to a new CSV file called
employees_cleaned.csv.
Date:08/08/2025
----------------------------------------------------------------------------------------------------------------

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
df = pd.read_csv('employees .csv')
print("First 5 rows of the dataset:")
print(df.head())

Output:

First 5 rows of the dataset:

EMPLOYEE_ID FIRST_NAME LAST_NAME EMAIL \
0 1 Megan Chang [Link]@[Link]
1 2 Vanessa Patel [Link]@[Link]
2 3 Tammy Woods [Link]@[Link]
3 4 John Ponce [Link]@[Link]
4 5 Amy Olsen [Link]@[Link]

PHONE_NUMBER JOINING_DATE GENDER JOB_ID SALARY \

0 (048)764-7593x82421 11-11-2019 Female PU_MAN 110121.95
1 001-924-115-7815x659 29-01-2019 Other SA_REP 66444.07
2 408.016.0975x35139 16-05-2019 Female MK_MAN 110249.46
3 +1-711-587-1484 21-09-2017 Male PU_MAN 38534.77
4 +1-398-947-1965 23-05-2025 Other PU_MAN 84171.19

COMMISSION_PCT MANAGER_ID DEPARTMENT

0 NaN 3 Finance
1 0.24 20 Sales
2 0.07 9 Finance
3 NaN 17 Marketing
4 NaN 10 Finance

# 2. Clean the data

print("\nMissing values before cleaning:")
print(df.isnull().sum())

Output:

Missing values before cleaning:

EMPLOYEE_ID 0
FIRST_NAME 0
LAST_NAME 0
EMAIL 0
PHONE_NUMBER 0
JOINING_DATE 0
GENDER 0
JOB_ID 0
SALARY 0
COMMISSION_PCT 718
MANAGER_ID 0
DEPARTMENT 0

dtype: int64

# Drop every row that has a missing value
df.dropna(inplace=True)
# Convert Joining_Date to datetime format
df['JOINING_DATE'] = pd.to_datetime(df['JOINING_DATE'],dayfirst=True,errors='coerce')
# Drop rows where Joining_Date couldn't be parsed
df.dropna(subset=['JOINING_DATE'], inplace=True)
print(df.head())

Output:
EMPLOYEE_ID FIRST_NAME LAST_NAME EMAIL \
1 2 Vanessa Patel [Link]@[Link]
2 3 Tammy Woods [Link]@[Link]
6 7 Frances Massey [Link]@[Link]
7 8 Brenda Rogers [Link]@[Link]
13 14 Joanne Stephens [Link]@[Link]

PHONE_NUMBER JOINING_DATE GENDER JOB_ID SALARY \

1 001-924-115-7815x659 2019-01-29 Other SA_REP 66444.07
2 408.016.0975x35139 2019-05-16 Female MK_MAN 110249.46
6 483.396.9477x51591 2023-11-16 Female MK_MAN 39063.11
7 304.135.2560 2017-04-24 Male MK_MAN 72930.88
13 (375)945-9924 2022-03-15 Female MK_MAN 119817.45

COMMISSION_PCT MANAGER_ID DEPARTMENT

1 0.24 20 Sales
2 0.07 9 Finance
6 0.13 40 Accounting
7 0.26 17 IT
13 0.15 20 Marketing

# 3. Feature Engineering

# Create a new column 'Years_of_Service'

today = pd.to_datetime('today')
# whole years of service, approximated as elapsed days divided by 365
df['Years_of_Service'] = (today - df['JOINING_DATE']).dt.days // 365
print(df[['EMPLOYEE_ID', 'FIRST_NAME', 'JOINING_DATE', 'Years_of_Service']].head())

Output:

EMPLOYEE_ID FIRST_NAME JOINING_DATE Years_of_Service

1 2 Vanessa 2019-01-29 6
2 3 Tammy 2019-05-16 6
6 7 Frances 2023-11-16 1
7 8 Brenda 2017-04-24 8
13 14 Joanne 2022-03-15 3
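
Dividing the elapsed days by 365 ignores leap days, so over long periods it can count a year slightly early. A hedged alternative is to compare calendar dates directly. In the sketch below the function name completed_years is illustrative, and the "today" value of 8 August 2025 is an assumption taken from the date written on this exercise.

```python
from datetime import date

def completed_years(start, today):
    # count full calendar years between start and today
    years = today.year - start.year
    # subtract one if this year's anniversary has not been reached yet
    if (today.month, today.day) < (start.month, start.day):
        years -= 1
    return years

assumed_today = date(2025, 8, 8)  # assumption: the date written on the exercise
print(completed_years(date(2019, 1, 29), assumed_today))
print(completed_years(date(2023, 11, 16), assumed_today))

# a case where the two methods disagree because of accumulated leap days
start, end = date(2000, 1, 1), date(2024, 12, 26)
print((end - start).days // 365, completed_years(start, end))
```

For a pandas column, the same function could be applied row by row with apply, at the cost of some speed compared with the vectorised days // 365 expression.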

# 4. Data Analysis
print("\nAverage salary per department:")
print(df.groupby('DEPARTMENT')['SALARY'].mean())

print("\nEmployee count by gender:")
print(df['GENDER'].value_counts())

print("\nDepartment with highest average Years_of_Service:")
print(df.groupby('DEPARTMENT')['Years_of_Service'].mean().idxmax())

Output:

Average salary per department:

DEPARTMENT
Accounting 75289.210000
Finance 72732.191081
HR 66857.482766
IT 79238.097692
Marketing 71059.762258
Purchasing 82529.937895
Sales 75067.825000
Name: SALARY, dtype: float64

Employee count by gender:

GENDER
Male 98
Female 96
Other 88
Name: count, dtype: int64
Department with highest average Years_of_Service:
Marketing

# Bar plot of average salary by department

avg_salary_dept_df = df.groupby('DEPARTMENT')['SALARY'].mean().reset_index()
avg_salary_dept_df.rename(columns={'SALARY': 'Average_Salary'}, inplace=True)
plt.figure(figsize=(4,4))
sns.barplot(data=avg_salary_dept_df, x='DEPARTMENT', y='Average_Salary') # no palette here
plt.title('Average Salary by Department')
plt.xlabel('')
plt.ylabel('Average Salary')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

Output:

# Pie chart of gender distribution

gender_count = df['GENDER'].value_counts()
plt.figure(figsize=(4,4))
plt.pie(gender_count, labels=gender_count.index, autopct='%1.1f%%',
        colors=sns.color_palette('pastel'))
plt.title('Gender Distribution')
plt.tight_layout()
plt.show()

Output:

df.to_csv('employees_cleaned.csv', index=False)
print("\nCleaned and enriched dataset exported as 'employees_cleaned.csv'.")

Output:
Cleaned and enriched dataset exported as 'employees_cleaned.csv'.

10. Write a program to perform text data preprocessing for the dataset [Link] with and
without using built-in functions.
Date:
----------------------------------------------------------------------------------------------------------------

import numpy as np
import pandas as pd
df=pd.read_csv("[Link]")
df.head(5)
#expanding the display of text sms columns
pd.set_option('display.max_colwidth',None)
df=df[['Tweets','Retweets']]
df.head()

Output:

df['Retweets'].value_counts()

Output:
Retweets
88 9
102 8
137 8
89 8
133 7
..
2575 1
5276 1
868 1
6886 1
11302 1
Name: count, Length: 1993, dtype: int64

import pandas as pd
import string
import re
from nltk.tokenize import WhitespaceTokenizer

def remove_urls(text):
    if isinstance(text,str):
        return re.sub(r'http\S+|www\S+','',text,flags=re.MULTILINE)
    else:
        return text
def remove_punctuation(text):
    if isinstance(text,str):
        return "".join([char for char in text if char not in string.punctuation])
    else:
        return text

def tokenization(text):
    if isinstance(text,str):
        tk=WhitespaceTokenizer()
        return tk.tokenize(text)
    else:
        return text

#expand contractions
import contractions

# Define the function to expand contractions

def conc(text):
    expanded_text = contractions.fix(text)
    return expanded_text
df['lower_case'] = df['Tweets'].apply(lambda x: x.lower()) #lower case
df['rem_url'] = df['lower_case'].apply(remove_urls) # Remove URLs first
df['rem_conct'] = df['rem_url'].apply(conc) #expand contractions while the apostrophes are still present
df['rem_punct'] = df['rem_conct'].apply(lambda x: remove_punctuation(x)) #remove punctuation
df['tokenised_msg'] = df['rem_punct'].apply(lambda x: tokenization(x)) #apply tokenization
df.head(10)


Output:

from nltk.corpus import stopwords

",".join(stopwords.words('english'))

Output:
"a,about,above,after,again,against,ain,all,am,an,and,any,are,aren,aren't,as,at,be,because,been,
before,being,below,between,both,but,by,can,couldn,couldn't,d,did,didn,didn't,do,does,doesn,d
oesn't,doing,don,don't,down,during,each,few,for,from,further,had,hadn,hadn't,has,hasn,hasn't,
have,haven,haven't,having,he,he'd,he'll,her,here,hers,herself,he's,him,himself,his,how,i,i'd,if,i'
ll,i'm,in,into,is,isn,isn't,it,it'd,it'll,it's,its,itself,i've,just,ll,m,ma,me,mightn,mightn't,more,most,
mustn,mustn't,my,myself,needn,needn't,no,nor,not,now,o,of,off,on,once,only,or,other,our,our
s,ourselves,out,over,own,re,s,same,shan,shan't,she,she'd,she'll,she's,should,shouldn,shouldn't,
should've,so,some,such,t,than,that,that'll,the,their,theirs,them,themselves,then,there,these,they
,they'd,they'll,they're,they've,this,those,through,to,too,under,until,up,ve,very,was,wasn,wasn't,
we,we'd,we'll,we're,were,weren,weren't,we've,what,when,where,which,while,who,whom,why
,will,with,won,won't,wouldn,wouldn't,y,you,you'd,you'll,your,you're,yours,yourself,yourselve
s,you've"

import nltk
stopwords=nltk.corpus.stopwords.words('english')
def remove_stopwords(text):
    # keep only the tokens that are not in the stop-word list
    output=[i for i in text if i not in stopwords]
    return output
df['rem_stopwords']=df['tokenised_msg'].apply(lambda x:remove_stopwords(x))
df

Output:

#stemming and lemmatizing

from nltk.stem import PorterStemmer
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize
import nltk
porter_stemmer=PorterStemmer()
wordnet_lemmatizer=WordNetLemmatizer()
#defining a function for stemming

def stemming(text):
    stem_text=[porter_stemmer.stem(word) for word in text]
    return stem_text
#defining a function for lemmatizing; accepts a raw string or a list of tokens
def lemmatizer(text):
    if isinstance(text,str):
        text=word_tokenize(text)
    if isinstance(text,list):
        return [wordnet_lemmatizer.lemmatize(word) for word in text]
    return text

#applying function for stemming

df['Stemmed_msg']=df['rem_stopwords'].apply(lambda x:stemming(x))

#apply lemmatizer function to the dataframe column

df['msg_lemmatized']=df['rem_stopwords'].apply(lemmatizer)
df.head(10)

Output:

#replace emojis with their text names

import emoji

#import demoji
#demoji.download_codes()

def emo(text):
    temp=emoji.demojize(text,delimiters=(" "," "))
    temp=temp.replace("_"," ")
    return temp
df['rem_emo']=df["rem_punct"].apply(lambda x:emo(x))
df.head(5)

Output:
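
The exercise asks for preprocessing both with and without built-in (library) functions, while the steps above all rely on NLTK and related packages. A rough library-free counterpart is sketched below using only the standard re and string modules. The function name preprocess, the tiny STOP_WORDS set and the example sentence are illustrative assumptions, and whitespace splitting is a much cruder tokenizer than NLTK's.

```python
import re
import string

# tiny illustrative stop-word list; a real one would be much longer
STOP_WORDS = {"a", "an", "the", "is", "to", "and", "of", "in"}

def preprocess(text):
    text = text.lower()                                   # lower case
    text = re.sub(r"http\S+|www\S+", "", text)            # remove URLs
    text = "".join(ch for ch in text
                   if ch not in string.punctuation)       # remove punctuation
    tokens = text.split()                                 # whitespace tokenization
    return [tok for tok in tokens if tok not in STOP_WORDS]  # drop stop words

print(preprocess("Check the NEW phone at https://example.com, it is great!"))
```

On a DataFrame, the same function could be applied to the tweet column with apply, for example df['Tweets'].apply(preprocess).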

11. Write a Python program to implement the chi-square test for feature selection to train
an SVM classifier using a suitable dataset.
Date:14/08/2025
----------------------------------------------------------------------------------------------------------------
# 1. Import Libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report

# 2. Load Dataset
data = pd.read_csv('fruit_data_with_colours.csv')
print("First 5 rows of the dataset:")
print(data.head(5))

Output:
First 5 rows of the dataset:
fruit_label fruit_name fruit_subtype mass width height color_score
0 1 apple granny_smith 192 8.4 7.3 0.55
1 1 apple granny_smith 180 8.0 6.8 0.59
2 1 apple granny_smith 176 7.4 7.2 0.60
3 2 mandarin mandarin 86 6.2 4.7 0.80
4 2 mandarin mandarin 84 6.0 4.6 0.79

# 3. Define Column Names for Target and Non-numeric Features

fruit_label = 'fruit_label'
fruit_subtype = 'fruit_subtype'
fruit_name = 'fruit_name'

# 4. Define Features (X) and Target (y)

X = data.drop([fruit_label, fruit_subtype, fruit_name], axis=1)
y = data[fruit_label]

# 5. Feature Selection using Chi-Square Test

k_selected_features = 4 # Select top 4 features (X has only 4 numeric columns, so all are kept)
chi2_selector = SelectKBest(chi2, k=k_selected_features)
X_selected = chi2_selector.fit_transform(X, y)

print("\nSelected top features (Chi-Square):")

selected_feature_indices = chi2_selector.get_support(indices=True)
print(X.columns[selected_feature_indices])

Output:
Selected top features (Chi-Square):
Index(['mass', 'width', 'height', 'color_score'], dtype='object')

# 6. Train-Test Split
X_train, X_test, y_train, y_test = train_test_split(
X_selected, y, test_size=0.2, random_state=42
)
# 7. Train the SVM Classifier
svm_classifier = SVC(kernel='linear')
svm_classifier.fit(X_train, y_train)

Output:

# 8. Predictions
y_pred = svm_classifier.predict(X_test)

# 9. Evaluation
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)

print("\nAccuracy:", accuracy)
print("\nClassification Report:")
print(report)

Output:
Accuracy: 0.75

Classification Report:
              precision    recall  f1-score   support

           1       0.67      0.67      0.67         3
           2       1.00      1.00      1.00         2
           3       0.33      0.50      0.40         2
           4       1.00      0.80      0.89         5

    accuracy                           0.75        12
   macro avg       0.75      0.74      0.74        12
weighted avg       0.81      0.75      0.77        12
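The last two rows of the report differ only in how the per-class values are averaged. As a small check, the sketch below recomputes both averages for the precision column from the rounded figures printed above: the macro average treats every class equally, while the weighted average weights each class by its support. The variable names are illustrative.

```python
# Per-class precision and support, copied from the report above
precision = {1: 0.67, 2: 1.00, 3: 0.33, 4: 1.00}
support = {1: 3, 2: 2, 3: 2, 4: 5}

# Macro average: plain mean over the classes
macro_avg = sum(precision.values()) / len(precision)

# Weighted average: each class counts in proportion to its support
total = sum(support.values())
weighted_avg = sum(precision[c] * support[c] for c in precision) / total

print("macro avg precision: %.2f" % macro_avg)
print("weighted avg precision: %.2f" % weighted_avg)
```

Because class 4 has the largest support and a precision of 1.00, the weighted average comes out higher than the macro average.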

Write a program to implement the ANOVA test for feature selection to train an SVM
classifier using a suitable dataset.
Date:26/08/2025
----------------------------------------------------------------------------------------------------------------

import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectPercentile,f_classif
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
import matplotlib.pyplot as plt

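X,y=load_iris(return_X_y=True)
# Append 36 non-informative (random) features to the 4 real ones
rng=np.random.RandomState(0)
X=np.hstack((X,2*rng.random((X.shape[0],36))))
# Pipeline: ANOVA feature selection, then scaling, then the SVM
clf=Pipeline([("anova",SelectPercentile(f_classif)),
              ("scaler",StandardScaler()),
              ("svc",SVC(gamma="auto"))
              ])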

score_means=[]
score_stds=[]
percentiles=[1,3,6,10,15,20,30,40,60,80,100]

for percentile in percentiles:
    # Keep only the given percentile of highest-scoring features
    clf.set_params(anova__percentile=percentile)
    this_scores=cross_val_score(clf,X,y)
    score_means.append(this_scores.mean())
    score_stds.append(this_scores.std())

plt.errorbar(percentiles,score_means,np.array(score_stds))
plt.title("Performance of the SVM-Anova varying the percentile of features selected")
plt.xticks(np.linspace(0,100,11,endpoint=True))
plt.xlabel("Percentile")
plt.axis("tight")
plt.show()

Output:
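The f_classif score used by SelectPercentile is a one-way ANOVA F-statistic computed per feature. As a rough illustration without built-in functions, the sketch below computes that statistic as the between-class mean square divided by the within-class mean square. The function name anova_f and the sample values are hypothetical and are not drawn from the Iris data.

```python
def anova_f(groups):
    """One-way ANOVA F-statistic for a list of groups of numbers."""
    all_values = [v for g in groups for v in g]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    ss_between = 0.0   # spread of the class means around the grand mean
    ss_within = 0.0    # spread of the values around their own class mean
    for g in groups:
        mean = sum(g) / len(g)
        ss_between += len(g) * (mean - grand_mean) ** 2
        ss_within += sum((v - mean) ** 2 for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical values of one feature, grouped by class
separating = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]    # class means differ
overlapping = [[1, 5, 9], [2, 5, 8], [3, 5, 7]]   # class means are identical

f_separating = anova_f(separating)
f_overlapping = anova_f(overlapping)
print("F for separating feature:", f_separating)
print("F for overlapping feature:", f_overlapping)
```

A feature whose class means are far apart relative to the spread inside each class gets a large F value, which is why the random columns appended in the program tend to be dropped at low percentiles.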

Write a program to classify the Iris dataset using the support vector classifier (SVC)
algorithm with data visualization, pre-processing and performance evaluation.
Date:26/08/2025
----------------------------------------------------------------------------------------------------------------
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score,confusion_matrix,classification_report
from sklearn import datasets

iris=datasets.load_iris()
X=iris.data
Y=iris.target

df=pd.DataFrame(X,columns=iris.feature_names)
df['target']=Y
# Replace the numeric class codes with the species names
df['target']=df['target'].map(dict(enumerate(iris.target_names)))
sns.pairplot(df,hue="target",palette="Set2")
plt.suptitle("Pairplot of Iris Dataset",y=1.02)
plt.show()

Output:

plt.figure(figsize=(8,6))
# Correlate the four numeric feature columns (the last column is the target)
sns.heatmap(df.iloc[:,:-1].corr(),annot=True,cmap='coolwarm')
plt.title("Correlation Heatmap of Iris Features")
plt.show()

Output:
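Each cell of the heatmap is a Pearson correlation coefficient between two feature columns. A minimal sketch of that calculation without built-in functions follows. The function name pearson is illustrative, and the five (petal length, petal width) pairs are the first five rows of the X_train listing in this program, so the result describes only those five samples.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Petal length and petal width of five samples (cm)
petal_length = [6.9, 1.7, 1.4, 1.4, 3.9]
petal_width = [2.3, 0.3, 0.2, 0.3, 1.4]

r = pearson(petal_length, petal_width)
print("correlation (five samples): %.3f" % r)
```

Values close to 1 or -1 indicate strongly related columns, which is what the darkest cells in a 'coolwarm' heatmap show.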

X=iris.data
Y=iris.target
X_train,X_test,Y_train,Y_test=train_test_split(X,Y,test_size=0.3,random_state=1)
print("X_train",X_train)
print("X_test",X_test)
print("Y_train",Y_train)
print("Y_test",Y_test)
Output:
X_train [[7.7 2.6 6.9 2.3]
[5.7 3.8 1.7 0.3]
[5. 3.6 1.4 0.2]
[4.8 3. 1.4 0.3]
[5.2 2.7 3.9 1.4]
[5.1 3.4 1.5 0.2]
[5.5 3.5 1.3 0.2]
[7.7 3.8 6.7 2.2]
[6.9 3.1 5.4 2.1]
[7.3 2.9 6.3 1.8]
[6.4 2.8 5.6 2.2]
[6.2 2.8 4.8 1.8]
[6. 3.4 4.5 1.6]
[7.7 2.8 6.7 2. ]
[5.7 3. 4.2 1.2]
[4.8 3.4 1.6 0.2]
[5.7 2.5 5. 2. ]
[6.3 2.7 4.9 1.8]
[4.8 3. 1.4 0.1]
[4.7 3.2 1.3 0.2]
[6.5 3. 5.8 2.2]
[4.6 3.4 1.4 0.3]
[6.1 3. 4.9 1.8]
[6.5 3.2 5.1 2. ]
[6.7 3.1 4.4 1.4]
[5.7 2.8 4.5 1.3]
[6.7 3.3 5.7 2.5]
[6. 3. 4.8 1.8]
[5.1 3.8 1.6 0.2]
[6. 2.2 4. 1. ]
[6.4 2.9 4.3 1.3]
[6.5 3. 5.5 1.8]
[5. 2.3 3.3 1. ]
[6.3 3.3 6. 2.5]
[5.5 2.5 4. 1.3]
[5.4 3.7 1.5 0.2]
[4.9 3.1 1.5 0.2]
[5.2 4.1 1.5 0.1]
[6.7 3.3 5.7 2.1]
[4.4 3. 1.3 0.2]
[6. 2.7 5.1 1.6]
[6.4 2.7 5.3 1.9]
[5.9 3. 5.1 1.8]

[5.2 3.5 1.5 0.2]
[5.1 3.3 1.7 0.5]
[5.8 2.7 4.1 1. ]
[4.9 3.1 1.5 0.1]
[7.4 2.8 6.1 1.9]
[6.2 2.9 4.3 1.3]
[7.6 3. 6.6 2.1]
[6.7 3. 5.2 2.3]
[6.3 2.3 4.4 1.3]
[6.2 3.4 5.4 2.3]
[7.2 3.6 6.1 2.5]
[5.6 2.9 3.6 1.3]
[5.7 4.4 1.5 0.4]
[5.8 2.7 3.9 1.2]
[4.5 2.3 1.3 0.3]
[5.5 2.4 3.8 1.1]
[6.9 3.1 4.9 1.5]
[5. 3.4 1.6 0.4]
[6.8 2.8 4.8 1.4]
[5. 3.5 1.6 0.6]
[4.8 3.4 1.9 0.2]
[6.3 3.4 5.6 2.4]
[5.6 2.8 4.9 2. ]
[6.8 3.2 5.9 2.3]
[5. 3.3 1.4 0.2]
[5.1 3.7 1.5 0.4]
[5.9 3.2 4.8 1.8]
[4.6 3.1 1.5 0.2]
[5.8 2.7 5.1 1.9]
[4.8 3.1 1.6 0.2]
[6.5 3. 5.2 2. ]
[4.9 2.5 4.5 1.7]
[4.6 3.2 1.4 0.2]
[6.4 3.2 5.3 2.3]
[4.3 3. 1.1 0.1]
[5.6 3. 4.1 1.3]
[4.4 2.9 1.4 0.2]
[5.5 2.4 3.7 1. ]
[5. 2. 3.5 1. ]
[5.1 3.5 1.4 0.2]
[4.9 3. 1.4 0.2]
[4.9 2.4 3.3 1. ]
[4.6 3.6 1. 0.2]
[5.9 3. 4.2 1.5]

[6.1 2.9 4.7 1.4]
[5. 3.4 1.5 0.2]
[6.7 3.1 4.7 1.5]
[5.7 2.9 4.2 1.3]
[6.2 2.2 4.5 1.5]
[7. 3.2 4.7 1.4]
[5.8 2.7 5.1 1.9]
[5.4 3.4 1.7 0.2]
[5. 3. 1.6 0.2]
[6.1 2.6 5.6 1.4]
[6.1 2.8 4. 1.3]
[7.2 3. 5.8 1.6]
[5.7 2.6 3.5 1. ]
[6.3 2.8 5.1 1.5]
[6.4 3.1 5.5 1.8]
[6.3 2.5 4.9 1.5]
[6.7 3.1 5.6 2.4]
[4.9 3.6 1.4 0.1]]
X_test [[5.8 4. 1.2 0.2]
[5.1 2.5 3. 1.1]
[6.6 3. 4.4 1.4]
[5.4 3.9 1.3 0.4]
[7.9 3.8 6.4 2. ]
[6.3 3.3 4.7 1.6]
[6.9 3.1 5.1 2.3]
[5.1 3.8 1.9 0.4]
[4.7 3.2 1.6 0.2]
[6.9 3.2 5.7 2.3]
[5.6 2.7 4.2 1.3]
[5.4 3.9 1.7 0.4]
[7.1 3. 5.9 2.1]
[6.4 3.2 4.5 1.5]
[6. 2.9 4.5 1.5]
[4.4 3.2 1.3 0.2]
[5.8 2.6 4. 1.2]
[5.6 3. 4.5 1.5]
[5.4 3.4 1.5 0.4]
[5. 3.2 1.2 0.2]
[5.5 2.6 4.4 1.2]
[5.4 3. 4.5 1.5]
[6.7 3. 5. 1.7]
[5. 3.5 1.3 0.3]
[7.2 3.2 6. 1.8]
[5.7 2.8 4.1 1.3]

[5.5 4.2 1.4 0.2]
[5.1 3.8 1.5 0.3]
[6.1 2.8 4.7 1.2]
[6.3 2.5 5. 1.9]
[6.1 3. 4.6 1.4]
[7.7 3. 6.1 2.3]
[5.6 2.5 3.9 1.1]
[6.4 2.8 5.6 2.1]
[5.8 2.8 5.1 2.4]
[5.3 3.7 1.5 0.2]
[5.5 2.3 4. 1.3]
[5.2 3.4 1.4 0.2]
[6.5 2.8 4.6 1.5]
[6.7 2.5 5.8 1.8]
[6.8 3. 5.5 2.1]
[5.1 3.5 1.4 0.3]
[6. 2.2 5. 1.5]
[6.3 2.9 5.6 1.8]
[6.6 2.9 4.6 1.3]]

Y_train [2 0 0 0 1 0 0 2 2 2 2 2 1 2 1 0 2 2 0 0 2 0 2 2 1 1 2 2 0 1 1 2 1 2 1 0 0
 0 2 0 1 2 2 0 0 1 0 2 1 2 2 1 2 2 1 0 1 0 1 1 0 1 0 0 2 2 2 0 0 1 0 2 0 2
2 0 2 0 1 0 1 1 0 0 1 0 1 1 0 1 1 1 1 2 0 0 2 1 2 1 2 2 1 2 0]
Y_test [0 1 1 0 2 1 2 0 0 2 1 0 2 1 1 0 1 1 0 0 1 1 1 0 2 1 0 0 1 2 1 2 1 2 2 0 1
0 1 2 2 0 2 2 1]
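The split above holds out 45 of the 150 samples (test_size=0.3) after shuffling them with a fixed seed. The sketch below shows the same idea without built-in splitting functions: shuffle the row indices reproducibly, then cut off the test share. The function name split_indices is hypothetical, and the shuffle does not reproduce the exact rows that train_test_split selects for random_state=1.

```python
import math
import random

def split_indices(n_samples, test_size, seed):
    """Return (train_indices, test_indices) after a seeded shuffle."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)      # same seed, same order
    n_test = math.ceil(test_size * n_samples) # size of the held-out part
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = split_indices(8, test_size=0.25, seed=1)
print("train rows:", train_idx)
print("test rows:", test_idx)
```

Fixing the seed is what makes the accuracy reported later reproducible from run to run.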

sc=StandardScaler()
# Fit the scaler on the training data only, then apply it to both sets
sc.fit(X_train)
X_train_std=sc.transform(X_train)
X_test_std=sc.transform(X_test)
print("X_train_std=\n",X_train_std)
print("X_test_std=\n",X_test_std)

Output:
X_train_std=
[[ 2.26050169e+00 -1.05089682e+00 1.77622921e+00 1.42370971e+00]
[-1.18973773e-01 1.82764665e+00 -1.14491883e+00 -1.14263397e+00]
[-9.51790185e-01 1.34788940e+00 -1.31344660e+00 -1.27095115e+00]
[-1.18973773e+00 -9.13823325e-02 -1.31344660e+00 -1.14263397e+00]
[-7.13842639e-01 -8.11018201e-01 9.09514958e-02 2.68855052e-01]
[-8.32816412e-01 8.68132159e-01 -1.25727068e+00 -1.27095115e+00]
[-3.56921319e-01 1.10801078e+00 -1.36962252e+00 -1.27095115e+00]

[ 2.26050169e+00 1.82764665e+00 1.66387736e+00 1.29539252e+00]
[ 1.30871150e+00 1.48496290e-01 9.33590353e-01 1.16707534e+00]
[ 1.78460660e+00 -3.31260955e-01 1.43917367e+00 7.82123787e-01]
[ 7.13842639e-01 -5.71139578e-01 1.04594220e+00 1.29539252e+00]
[ 4.75895093e-01 -5.71139578e-01 5.96534810e-01 7.82123787e-01]
[ 2.37947546e-01 8.68132159e-01 4.28007039e-01 5.25489419e-01]
[ 2.26050169e+00 -5.71139578e-01 1.66387736e+00 1.03875815e+00]
[-1.18973773e-01 -9.13823325e-02 2.59479267e-01 1.22206842e-02]
[-1.18973773e+00 8.68132159e-01 -1.20109475e+00 -1.27095115e+00]
[-1.18973773e-01 -1.29077545e+00 7.08886658e-01 1.03875815e+00]
[ 5.94868866e-01 -8.11018201e-01 6.52710734e-01 7.82123787e-01]
[-1.18973773e+00 -9.13823325e-02 -1.31344660e+00 -1.39926834e+00]
[-1.30871150e+00 3.88374913e-01 -1.36962252e+00 -1.27095115e+00]
[ 8.32816412e-01 -9.13823325e-02 1.15829405e+00 1.29539252e+00]
[-1.42768528e+00 8.68132159e-01 -1.31344660e+00 -1.14263397e+00]
[ 3.56921319e-01 -9.13823325e-02 6.52710734e-01 7.82123787e-01]
[ 8.32816412e-01 3.88374913e-01 7.65062582e-01 1.03875815e+00]
[ 1.07076396e+00 1.48496290e-01 3.71831115e-01 2.68855052e-01]
[-1.18973773e-01 -5.71139578e-01 4.28007039e-01 1.40537868e-01]
[ 1.07076396e+00 6.28253536e-01 1.10211813e+00 1.68034407e+00]
[ 2.37947546e-01 -9.13823325e-02 5.96534810e-01 7.82123787e-01]
[-8.32816412e-01 1.82764665e+00 -1.20109475e+00 -1.27095115e+00]
[ 2.37947546e-01 -2.01041131e+00 1.47127420e-01 -2.44413683e-01]
[ 7.13842639e-01 -3.31260955e-01 3.15655191e-01 1.40537868e-01]
[ 8.32816412e-01 -9.13823325e-02 9.89766277e-01 7.82123787e-01]
[-9.51790185e-01 -1.77053269e+00 -2.46104047e-01 -2.44413683e-01]
[ 5.94868866e-01 6.28253536e-01 1.27064590e+00 1.68034407e+00]
[-3.56921319e-01 -1.29077545e+00 1.47127420e-01 1.40537868e-01]
[-4.75895093e-01 1.58776803e+00 -1.25727068e+00 -1.27095115e+00]
[-1.07076396e+00 1.48496290e-01 -1.25727068e+00 -1.27095115e+00]
[-7.13842639e-01 2.54728252e+00 -1.25727068e+00 -1.39926834e+00]
[ 1.07076396e+00 6.28253536e-01 1.10211813e+00 1.16707534e+00]
[-1.66563282e+00 -9.13823325e-02 -1.36962252e+00 -1.27095115e+00]
[ 2.37947546e-01 -8.11018201e-01 7.65062582e-01 5.25489419e-01]
[ 7.13842639e-01 -8.11018201e-01 8.77414430e-01 9.10440971e-01]
[ 1.18973773e-01 -9.13823325e-02 7.65062582e-01 7.82123787e-01]
[-7.13842639e-01 1.10801078e+00 -1.25727068e+00 -1.27095115e+00]
[-8.32816412e-01 6.28253536e-01 -1.14491883e+00 -8.85999602e-01]
[-1.05669938e-15 -8.11018201e-01 2.03303343e-01 -2.44413683e-01]
[-1.07076396e+00 1.48496290e-01 -1.25727068e+00 -1.39926834e+00]
[ 1.90358037e+00 -5.71139578e-01 1.32682182e+00 9.10440971e-01]
[ 4.75895093e-01 -3.31260955e-01 3.15655191e-01 1.40537868e-01]
[ 2.14152792e+00 -9.13823325e-02 1.60770144e+00 1.16707534e+00]
[ 1.07076396e+00 -9.13823325e-02 8.21238506e-01 1.42370971e+00]

[ 5.94868866e-01 -1.77053269e+00 3.71831115e-01 1.40537868e-01]
[ 4.75895093e-01 8.68132159e-01 9.33590353e-01 1.42370971e+00]
[ 1.66563282e+00 1.34788940e+00 1.32682182e+00 1.68034407e+00]
[-2.37947546e-01 -3.31260955e-01 -7.75762758e-02 1.40537868e-01]
[-1.18973773e-01 3.26691839e+00 -1.25727068e+00 -1.01431679e+00]
[-1.05669938e-15 -8.11018201e-01 9.09514958e-02 1.22206842e-02]
[-1.54665905e+00 -1.77053269e+00 -1.36962252e+00 -1.14263397e+00]
[-3.56921319e-01 -1.53065407e+00 3.47755719e-02 -1.16096500e-01]
[ 1.30871150e+00 1.48496290e-01 6.52710734e-01 3.97172236e-01]
[-9.51790185e-01 8.68132159e-01 -1.20109475e+00 -1.01431679e+00]
[ 1.18973773e+00 -5.71139578e-01 5.96534810e-01 2.68855052e-01]
[-9.51790185e-01 1.10801078e+00 -1.20109475e+00 -7.57682419e-01]
[-1.18973773e+00 8.68132159e-01 -1.03256698e+00 -1.27095115e+00]
[ 5.94868866e-01 8.68132159e-01 1.04594220e+00 1.55202689e+00]
[-2.37947546e-01 -5.71139578e-01 6.52710734e-01 1.03875815e+00]
[ 1.18973773e+00 3.88374913e-01 1.21446997e+00 1.42370971e+00]
[-9.51790185e-01 6.28253536e-01 -1.31344660e+00 -1.27095115e+00]
[-8.32816412e-01 1.58776803e+00 -1.25727068e+00 -1.01431679e+00]
[ 1.18973773e-01 3.88374913e-01 5.96534810e-01 7.82123787e-01]
[-1.42768528e+00 1.48496290e-01 -1.25727068e+00 -1.27095115e+00]
[-1.05669938e-15 -8.11018201e-01 7.65062582e-01 9.10440971e-01]
[-1.18973773e+00 1.48496290e-01 -1.20109475e+00 -1.27095115e+00]
[ 8.32816412e-01 -9.13823325e-02 8.21238506e-01 1.03875815e+00]
[-1.07076396e+00 -1.29077545e+00 4.28007039e-01 6.53806603e-01]
[-1.42768528e+00 3.88374913e-01 -1.31344660e+00 -1.27095115e+00]
[ 7.13842639e-01 3.88374913e-01 8.77414430e-01 1.42370971e+00]
[-1.78460660e+00 -9.13823325e-02 -1.48197437e+00 -1.39926834e+00]
[-2.37947546e-01 -9.13823325e-02 2.03303343e-01 1.40537868e-01]
[-1.66563282e+00 -3.31260955e-01 -1.31344660e+00 -1.27095115e+00]
[-3.56921319e-01 -1.53065407e+00 -2.14003519e-02 -2.44413683e-01]
[-9.51790185e-01 -2.49016856e+00 -1.33752200e-01 -2.44413683e-01]
[-8.32816412e-01 1.10801078e+00 -1.31344660e+00 -1.27095115e+00]
[-1.07076396e+00 -9.13823325e-02 -1.31344660e+00 -1.27095115e+00]
[-1.07076396e+00 -1.53065407e+00 -2.46104047e-01 -2.44413683e-01]
[-1.42768528e+00 1.34788940e+00 -1.53815030e+00 -1.27095115e+00]
[ 1.18973773e-01 -9.13823325e-02 2.59479267e-01 3.97172236e-01]
[ 3.56921319e-01 -3.31260955e-01 5.40358887e-01 2.68855052e-01]
[-9.51790185e-01 8.68132159e-01 -1.25727068e+00 -1.27095115e+00]
[ 1.07076396e+00 1.48496290e-01 5.40358887e-01 3.97172236e-01]
[-1.18973773e-01 -3.31260955e-01 2.59479267e-01 1.40537868e-01]
[ 4.75895093e-01 -2.01041131e+00 4.28007039e-01 3.97172236e-01]
[ 1.42768528e+00 3.88374913e-01 5.40358887e-01 2.68855052e-01]
[-1.05669938e-15 -8.11018201e-01 7.65062582e-01 9.10440971e-01]
[-4.75895093e-01 8.68132159e-01 -1.14491883e+00 -1.27095115e+00]

[-9.51790185e-01 -9.13823325e-02 -1.20109475e+00 -1.27095115e+00]
[ 3.56921319e-01 -1.05089682e+00 1.04594220e+00 2.68855052e-01]
[ 3.56921319e-01 -5.71139578e-01 1.47127420e-01 1.40537868e-01]
[ 1.66563282e+00 -9.13823325e-02 1.15829405e+00 5.25489419e-01]
[-1.18973773e-01 -1.05089682e+00 -1.33752200e-01 -2.44413683e-01]
[ 5.94868866e-01 -5.71139578e-01 7.65062582e-01 3.97172236e-01]
[ 7.13842639e-01 1.48496290e-01 9.89766277e-01 7.82123787e-01]
[ 5.94868866e-01 -1.29077545e+00 6.52710734e-01 3.97172236e-01]
[ 1.07076396e+00 1.48496290e-01 1.04594220e+00 1.55202689e+00]
[-1.07076396e+00 1.34788940e+00 -1.31344660e+00 -1.39926834e+00]]
X_test_std=
[[-1.05669938e-15 2.30740390e+00 -1.42579845e+00 -1.27095115e+00]
[-8.32816412e-01 -1.29077545e+00 -4.14631819e-01 -1.16096500e-01]
[ 9.51790185e-01 -9.13823325e-02 3.71831115e-01 2.68855052e-01]
[-4.75895093e-01 2.06752527e+00 -1.36962252e+00 -1.01431679e+00]
[ 2.49844924e+00 1.82764665e+00 1.49534959e+00 1.03875815e+00]
[ 5.94868866e-01 6.28253536e-01 5.40358887e-01 5.25489419e-01]
[ 1.30871150e+00 1.48496290e-01 7.65062582e-01 1.42370971e+00]
[-8.32816412e-01 1.82764665e+00 -1.03256698e+00 -1.01431679e+00]
[-1.30871150e+00 3.88374913e-01 -1.20109475e+00 -1.27095115e+00]
[ 1.30871150e+00 3.88374913e-01 1.10211813e+00 1.42370971e+00]
[-2.37947546e-01 -8.11018201e-01 2.59479267e-01 1.40537868e-01]
[-4.75895093e-01 2.06752527e+00 -1.14491883e+00 -1.01431679e+00]
[ 1.54665905e+00 -9.13823325e-02 1.21446997e+00 1.16707534e+00]
[ 7.13842639e-01 3.88374913e-01 4.28007039e-01 3.97172236e-01]
[ 2.37947546e-01 -3.31260955e-01 4.28007039e-01 3.97172236e-01]
[-1.66563282e+00 3.88374913e-01 -1.36962252e+00 -1.27095115e+00]
[-1.05669938e-15 -1.05089682e+00 1.47127420e-01 1.22206842e-02]
[-2.37947546e-01 -9.13823325e-02 4.28007039e-01 3.97172236e-01]
[-4.75895093e-01 8.68132159e-01 -1.25727068e+00 -1.01431679e+00]
[-9.51790185e-01 3.88374913e-01 -1.42579845e+00 -1.27095115e+00]
[-3.56921319e-01 -1.05089682e+00 3.71831115e-01 1.22206842e-02]
[-4.75895093e-01 -9.13823325e-02 4.28007039e-01 3.97172236e-01]
[ 1.07076396e+00 -9.13823325e-02 7.08886658e-01 6.53806603e-01]
[-9.51790185e-01 1.10801078e+00 -1.36962252e+00 -1.14263397e+00]
[ 1.66563282e+00 3.88374913e-01 1.27064590e+00 7.82123787e-01]
[-1.18973773e-01 -5.71139578e-01 2.03303343e-01 1.40537868e-01]
[-3.56921319e-01 2.78716114e+00 -1.31344660e+00 -1.27095115e+00]
[-8.32816412e-01 1.82764665e+00 -1.25727068e+00 -1.14263397e+00]
[ 3.56921319e-01 -5.71139578e-01 5.40358887e-01 1.22206842e-02]
[ 5.94868866e-01 -1.29077545e+00 7.08886658e-01 9.10440971e-01]
[ 3.56921319e-01 -9.13823325e-02 4.84182963e-01 2.68855052e-01]
[ 2.26050169e+00 -9.13823325e-02 1.32682182e+00 1.42370971e+00]
[-2.37947546e-01 -1.29077545e+00 9.09514958e-02 -1.16096500e-01]

[ 7.13842639e-01 -5.71139578e-01 1.04594220e+00 1.16707534e+00]
[-1.05669938e-15 -5.71139578e-01 7.65062582e-01 1.55202689e+00]
[-5.94868866e-01 1.58776803e+00 -1.25727068e+00 -1.27095115e+00]
[-3.56921319e-01 -1.77053269e+00 1.47127420e-01 1.40537868e-01]
[-7.13842639e-01 8.68132159e-01 -1.31344660e+00 -1.27095115e+00]
[ 8.32816412e-01 -5.71139578e-01 4.84182963e-01 3.97172236e-01]
[ 1.07076396e+00 -1.29077545e+00 1.15829405e+00 7.82123787e-01]
[ 1.18973773e+00 -9.13823325e-02 9.89766277e-01 1.16707534e+00]
[-8.32816412e-01 1.10801078e+00 -1.31344660e+00 -1.14263397e+00]
[ 2.37947546e-01 -2.01041131e+00 7.08886658e-01 3.97172236e-01]
[ 5.94868866e-01 -3.31260955e-01 1.04594220e+00 7.82123787e-01]
[ 9.51790185e-01 -3.31260955e-01 4.84182963e-01 1.40537868e-01]]
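The standardized values listed above come from subtracting each column's training mean and dividing by its training standard deviation. A minimal sketch of that calculation without built-in functions is given below, assuming the population standard deviation (division by n), which is what StandardScaler uses. The helper names fit_scaler and transform are illustrative. Only three training rows are used, so the numbers differ from the full listing.

```python
import math

def fit_scaler(rows):
    """Return per-column means and population standard deviations."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    stds = [math.sqrt(sum((v - m) ** 2 for v in col) / n)
            for col, m in zip(columns, means)]
    return means, stds

def transform(rows, means, stds):
    """Apply z = (value - mean) / std column by column."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

# First two columns of three training rows and one test row from the listings
train = [[7.7, 2.6], [5.7, 3.8], [5.0, 3.6]]
test = [[5.8, 4.0]]

means, stds = fit_scaler(train)           # statistics come from the training rows only
train_std = transform(train, means, stds)
test_std = transform(test, means, stds)   # test rows reuse the training statistics
print(train_std)
print(test_std)
```

Reusing the training mean and standard deviation for the test rows keeps information about the test set out of the pre-processing step.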

svm=SVC(kernel='linear',random_state=1,C=0.1)
svm.fit(X_train_std,Y_train)
Y_pred=svm.predict(X_test_std)
acc=accuracy_score(Y_test,Y_pred)
print('Accuracy:%3f'% acc)

Output:
Accuracy:0.955556

print("\nClassification Report:")
print(classification_report(Y_test,Y_pred,target_names=iris.target_names))

Output:
Classification Report:
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        14
  versicolor       0.94      0.94      0.94        18
   virginica       0.92      0.92      0.92        13

    accuracy                           0.96        45
   macro avg       0.96      0.96      0.96        45
weighted avg       0.96      0.96      0.96        45
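The figures in the report can be traced back to a confusion matrix. The sketch below computes precision, recall, F1 and accuracy without built-in functions. The matrix is an assumption: it is inferred from the rounded report and the supports above (one versicolor predicted as virginica and one virginica predicted as versicolor), not copied from program output. The function name per_class_metrics is illustrative.

```python
labels = ["setosa", "versicolor", "virginica"]

# Rows are true classes, columns are predicted classes (inferred, see above)
cm = [[14, 0, 0],
      [0, 17, 1],
      [0, 1, 12]]

def per_class_metrics(cm, k):
    """Precision, recall and F1 for class index k."""
    tp = cm[k][k]
    predicted_k = sum(row[k] for row in cm)   # column total
    actual_k = sum(cm[k])                     # row total (support)
    precision = tp / predicted_k
    recall = tp / actual_k
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

accuracy = sum(cm[i][i] for i in range(len(cm))) / sum(map(sum, cm))

for k, name in enumerate(labels):
    p, r, f = per_class_metrics(cm, k)
    print("%-10s precision=%.2f recall=%.2f f1=%.2f" % (name, p, r, f))
print("accuracy=%.6f" % accuracy)
```

Precision divides the correct predictions by the column total, while recall divides them by the row total, so the two differ whenever a class is confused asymmetrically.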

cm=confusion_matrix(Y_test,Y_pred)
plt.figure(figsize=(6,5))
sns.heatmap(cm,annot=True,fmt='d',cmap='Blues',
xticklabels=iris.target_names, yticklabels=iris.target_names)
plt.title("Confusion Matrix")
plt.xlabel("Predicted Label")
plt.ylabel("True Label")
plt.show()

Output:

Write a program to implement K-means clustering. Use a data visualization
technique to illustrate the clustering.
Date:28/08/2025
----------------------------------------------------------------------------------------------------------------
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd
dataset=pd.read_csv("Mall_Customers.csv")
dataset.head(5)

Output:
   CustomerID  Gender  Age  Annual Income (k$)  Spending Score (1-100)
0           1    Male   19                  15                      39
1           2    Male   21                  15                      81
2           3  Female   20                  16                       6
3           4  Female   23                  16                      77
4           5  Female   31                  17                      40

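# Use annual income and spending score as the two clustering features
x=dataset.iloc[:,[3,4]].values
from sklearn.cluster import KMeans
wcss_list=[]
# Fit K-means for k = 1..10 and record the within-cluster sum of squares
for i in range(1,11):
    kmeans=KMeans(n_clusters=i,init='k-means++',random_state=42)
    kmeans.fit(x)
    wcss_list.append(kmeans.inertia_)
mtp.plot(range(1,11),wcss_list)
mtp.title('The Elbow Method cluster(k)')
mtp.xlabel('Number of clusters(k)')
mtp.ylabel('wcss_list')
mtp.show()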
Output:

# Five clusters, one per scatter call below
kmeans=KMeans(n_clusters=5,init='k-means++',random_state=42)
y_predict=kmeans.fit_predict(x)
mtp.scatter(x[y_predict == 0, 0], x[y_predict == 0, 1], s=100, c='blue', label='Cluster 1')
mtp.scatter(x[y_predict == 1, 0], x[y_predict == 1, 1], s=100, c='green', label='Cluster 2')
mtp.scatter(x[y_predict == 2, 0], x[y_predict == 2, 1], s=100, c='red', label='Cluster 3')
mtp.scatter(x[y_predict == 3, 0], x[y_predict == 3, 1], s=100, c='cyan', label='Cluster 4')
mtp.scatter(x[y_predict == 4, 0], x[y_predict == 4, 1], s=100, c='magenta', label='Cluster 5')
mtp.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s=300, c='yellow',
label='Centroid')
mtp.title('Clusters of customers')
mtp.xlabel('Annual Income (k$)')
mtp.ylabel('Spending Score (1-100)')
mtp.legend()
mtp.show()
Output:
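What KMeans does internally, and what inertia_ measures, can be sketched without built-in functions. The code below is a minimal version of Lloyd's algorithm that starts from explicitly given centroids rather than k-means++ initialization, then reports the within-cluster sum of squares (WCSS) used in the elbow plot. The function names and the six (income, score) points are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm; returns (labels, centroids, wcss)."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every point
        labels = [min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break                                    # converged
        centroids = new_centroids
    wcss = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, wcss

points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels, centroids, wcss = kmeans(points, centroids=[points[0], points[3]])
print("labels:", labels)
print("centroids:", centroids)
print("WCSS:", wcss)
```

Running this for several values of k and plotting the WCSS against k gives the same kind of curve as the elbow plot in the program.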

Write a program to implement the hierarchical clustering algorithm. Use a data
visualization technique to illustrate the clustering.
Date:28/08/2025
----------------------------------------------------------------------------------------------------------------
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
df=pd.read_csv("Mall_Customers.csv")
df.head()

Output:
   CustomerID  Gender  Age  Annual Income (k$)  Spending Score (1-100)
0           1    Male   19                  15                      39
1           2    Male   21                  15                      81
2           3  Female   20                  16                       6
3           4  Female   23                  16                      77
4           5  Female   31                  17                      40

df.isnull().sum()
Output:
CustomerID 0
Gender 0
Age 0
Annual Income (k$) 0
Spending Score (1-100) 0
dtype: int64

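# Use annual income and spending score as the two clustering features
x=df.iloc[:,[3,4]].values
import scipy.cluster.hierarchy as sch
dendrogram=sch.dendrogram(sch.linkage(x,method='ward'))
plt.title("Dendrogram")
plt.xlabel("Customer")
plt.ylabel("Euclidean distance")
plt.show()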

Output:

from sklearn.cluster import AgglomerativeClustering
hc = AgglomerativeClustering(n_clusters=5, linkage='ward')
y_hc = hc.fit_predict(x)
plt.scatter(x[y_hc == 0, 0], x[y_hc == 0, 1], s=100, c="red", label="cluster 1")
plt.scatter(x[y_hc == 1, 0], x[y_hc == 1, 1], s=100, c="blue", label="cluster 2")
plt.scatter(x[y_hc == 2, 0], x[y_hc == 2, 1], s=100, c="green", label="cluster 3")
plt.scatter(x[y_hc == 3, 0], x[y_hc == 3, 1], s=100, c="cyan", label="cluster 4")
plt.scatter(x[y_hc == 4, 0], x[y_hc == 4, 1], s=100, c="orange", label="cluster 5")
plt.title("Clusters of customers")
plt.xlabel("Annual Income (k$)")
plt.ylabel("Spending Score (1-100)")
plt.legend()
plt.show()

Output:
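Agglomerative clustering with Ward linkage repeatedly merges the two clusters whose union increases the within-cluster sum of squares the least. The sketch below is a naive version of that loop without built-in functions, using the standard Ward merge cost n_a*n_b/(n_a+n_b) times the squared distance between the two centroids. The function names and the four points are hypothetical, and the costs are not claimed to equal the heights drawn in the dendrogram.

```python
def centroid(cluster):
    """Mean point of a list of points."""
    return tuple(sum(c) / len(cluster) for c in zip(*cluster))

def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    sq = sum((p - q) ** 2 for p, q in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq

def ward_agglomerate(points, n_clusters):
    """Merge the cheapest pair of clusters until n_clusters remain."""
    clusters = [[p] for p in points]          # start with one cluster per point
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_cost(clusters[i], clusters[j])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

points = [(0, 0), (0, 1), (10, 0), (10, 1)]
clusters = ward_agglomerate(points, n_clusters=2)
print(clusters)
```

This brute-force loop is only practical for small inputs. Library implementations reach the same kind of result with far more efficient update rules.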

Write a program to implement grid-based clustering using a suitable dataset.
Visualize the data using a scatter plot.
Date:
---------------------------------------------------------------------------------------------------------------
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df=pd.read_csv("Mall_Customers.csv")
df.head(5)
Output:

data = df[['Annual Income (k$)','Spending Score (1-100)']]

X = data.values

def grid_based_clustering(X, grid_size):
    # Evenly spaced cell edges along each axis
    x_edges = np.linspace(np.min(X[:, 0]), np.max(X[:, 0]), grid_size[0] + 1)
    y_edges = np.linspace(np.min(X[:, 1]), np.max(X[:, 1]), grid_size[1] + 1)
    # Count the points that fall in each grid cell
    grid_cells, _, _ = np.histogram2d(X[:, 0], X[:, 1], bins=[x_edges, y_edges])
    return grid_cells, x_edges, y_edges

# Define grid size

grid_size = (9, 9)
grid_cells, x_edges, y_edges = grid_based_clustering(X, grid_size)

plt.figure(figsize=(15,6))
# Heatmap of the cell counts, with the customers overlaid as a scatter plot
plt.imshow(grid_cells.T,origin='lower',cmap='coolwarm',extent=[x_edges[0],x_edges[-1],y_edges[0],y_edges[-1]])
plt.colorbar(label='Number of points')
plt.scatter(X[:,0],X[:,1],c='y',s=50,label='Customers')
plt.xlabel('Annual Income (k$)')
plt.ylabel("Spending Score (1-100)")
plt.title('Grid-Based Clustering Heatmap of Mall Customers')
plt.legend()
plt.show()

Output:
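The program above counts the points in each grid cell but does not assign cluster labels. One possible next step, sketched here without built-in functions and not part of the original program, is to keep only the dense cells and join neighbouring dense cells into one cluster. The function name grid_cluster, the parameters cell_size and min_points, and the sample points are all hypothetical.

```python
from collections import defaultdict, deque

def grid_cluster(points, cell_size, min_points):
    """Label points by connected groups of dense grid cells; -1 marks noise."""
    # Map every point to the grid cell that contains it
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cells[(int(x // cell_size), int(y // cell_size))].append(idx)
    # A cell is dense when it holds at least min_points points
    dense = {cell for cell, members in cells.items() if len(members) >= min_points}

    labels = [-1] * len(points)
    seen = set()
    cluster_id = 0
    for start in sorted(dense):
        if start in seen:
            continue
        # Breadth-first search over the 8 neighbouring cells
        queue = deque([start])
        seen.add(start)
        while queue:
            cx, cy = queue.popleft()
            for idx in cells[(cx, cy)]:
                labels[idx] = cluster_id
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    neighbour = (cx + dx, cy + dy)
                    if neighbour in dense and neighbour not in seen:
                        seen.add(neighbour)
                        queue.append(neighbour)
        cluster_id += 1
    return labels

points = [(1, 1), (2, 2), (3, 1), (12, 3), (41, 42), (42, 41), (43, 43), (90, 10)]
labels = grid_cluster(points, cell_size=10, min_points=2)
print(labels)
```

The cell size plays the role of grid_size in the program: larger cells merge more points into the same cluster, while smaller cells leave more points labelled as noise.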


