FDS Record

This document is a laboratory record for the Data Science and Analytics course at Anna University, Coimbatore, for the academic year 2024-2025. It includes various experiments related to data analysis, visualization, statistical tests, and model building using Python libraries such as Pandas, NumPy, and Matplotlib. Each experiment outlines the aim, algorithm, program code, and expected results.

Uploaded by

dushyanth762

ANNA UNIVERSITY

REGIONAL CAMPUS, COIMBATORE

LABORATORY RECORD
2024-2025

NAME :

REGISTER NUMBER :

BRANCH : B.Tech – ARTIFICIAL INTELLIGENCE AND DATA SCIENCE

SUBJECT CODE : AD3411

SUBJECT TITLE : DATA SCIENCE AND ANALYTICS LABORATORY

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

ANNA UNIVERSITY REGIONAL CAMPUS


COIMBATORE 641 046
ANNA UNIVERSITY

REGIONAL CAMPUS, COIMBATORE

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

BONAFIDE CERTIFICATE

Certified that this is the Bonafide record of Practical done in AD3411 DATA SCIENCE
AND ANALYTICS LABORATORY by………………………………………………….
Roll No………………………… Second Year/Fourth Semester during 2024-2025.

STAFF IN-CHARGE HEAD OF THE DEPARTMENT

University Register Number :


Submitted for the University Practical Examination Held on

INTERNAL EXAMINER EXTERNAL EXAMINER


INDEX
EX NO.    DATE    TITLE    PAGE NO.    MARKS    SIGN
Exp No : 01
WORKING WITH PANDAS DATAFRAME
Date :

AIM:

ALGORITHM:
Step 1: Start the Program
Step 2: Import the pandas module
Step 3: Create a dataframe using the dictionary
Step 4: Print the output
Step 5: Stop the process

PROGRAM:
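import pandas as pd

# Build a DataFrame from a dictionary of equal-length columns
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'Los Angeles', 'Chicago']}
df = pd.DataFrame(data)
print(df.head())
# Keep only the rows where Age is greater than 30
filtered_df = df[df['Age'] > 30]
print(filtered_df)
# Add a boolean column that flags people older than 30
df['Senior'] = df['Age'] > 30
print(df)
# Mean age per city
grouped_df = df.groupby('City')['Age'].mean()
print(grouped_df)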
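The filtering and grouping steps of the program above can also be written with `.loc` and `.agg`. The following is only a minimal sketch that reuses the same sample data; the variable names `mask`, `seniors` and `summary` are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

# Boolean mask: True where Age > 30 (30 itself is not greater than 30)
mask = df['Age'] > 30
# .loc selects the matching rows and a chosen subset of columns
seniors = df.loc[mask, ['Name', 'Age']]
print(seniors)

# Several aggregates per group in a single call
summary = df.groupby('City')['Age'].agg(['mean', 'count'])
print(summary)
```

Because the comparison is strict, Bob (age 30) is excluded and only Charlie remains in `seniors`.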
OUTPUT:
RESULT:
Exp No : 02
BASIC PLOTS USING MATPLOTLIB
Date :

AIM:

ALGORITHM:
Step 1: Start the Program
Step 2: Import the Matplotlib module
Step 3: Create a basic plot using Matplotlib
Step 4: Print the output
Step 5: Stop the process

PROGRAM:
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]
# Dashed green line with circular markers at each data point
plt.plot(x, y, color='green', linestyle='--', marker='o', markersize=10)
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Customized Line Plot')
plt.show()
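The program above draws a line plot only. Other basic plot types follow the same pattern; the sketch below, which is not part of the original record, places a bar plot, a scatter plot and a histogram side by side. The `Agg` backend and the in-memory buffer are assumptions made so that the script runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window is opened
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]

# One row of three axes, one per plot type
fig, axes = plt.subplots(1, 3, figsize=(12, 4))
axes[0].bar(x, y, color='green')
axes[0].set_title('Bar Plot')
axes[1].scatter(x, y, color='blue')
axes[1].set_title('Scatter Plot')
axes[2].hist(y, bins=5, color='orange')
axes[2].set_title('Histogram')
fig.tight_layout()

# Render to an in-memory PNG instead of calling plt.show()
buffer = io.BytesIO()
fig.savefig(buffer, format='png')
```

In an interactive session, `plt.show()` can replace the last two lines.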

OUTPUT:
RESULT:
Exp No : 03 A
FREQUENCY DISTRIBUTION
Date :

AIM:

ALGORITHM:
Step 1: Start the Program
Step 2: Import the python library modules
Step 3: Write the code to compute the frequency distribution
Step 4: Print the result
Step 5: Stop the process

PROGRAM:
import pandas as pd

data = [1, 2, 2, 3, 4, 4, 4, 5, 6, 7, 7, 7, 7]
series = pd.Series(data)
# value_counts() lists each distinct value with its count, most frequent first
frequency = series.value_counts()
print(frequency)
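A frequency distribution is often reported together with relative and cumulative frequencies. The sketch below extends the program above under that assumption; the column names `Frequency`, `Relative` and `Cumulative` are illustrative.

```python
import pandas as pd

data = [1, 2, 2, 3, 4, 4, 4, 5, 6, 7, 7, 7, 7]
series = pd.Series(data)

# Counts ordered by value rather than by frequency
counts = series.value_counts().sort_index()
table = pd.DataFrame({
    'Frequency': counts,
    'Relative': counts / counts.sum(),   # share of all observations
    'Cumulative': counts.cumsum(),       # running total of the counts
})
print(table)
```

The last cumulative value equals the number of observations, which is a quick check that no value was dropped.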

RESULT:
Exp No : 03 B
AVERAGES
Date :

AIM:

ALGORITHM:
Step 1: Start the Program
Step 2: Import the python library modules
Step 3: Write the code to compute the averages
Step 4: Print the result
Step 5: Stop the process

PROGRAM:
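import numpy as np
import pandas as pd

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
# Mean computed with both libraries
mean_pandas = pd.Series(data).mean()
print(f"Mean (Pandas): {mean_pandas}")
mean_numpy = np.mean(data)
print(f"Mean (NumPy): {mean_numpy}")
# Median: the middle value of the sorted data
median = pd.Series(data).median()
print(f"Median: {median}")
# mode() returns a Series because several values can tie for most frequent
mode = pd.Series(data).mode()
print(f"Mode: {mode}")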
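In the data used above every value occurs exactly once, so the mode is not a single number. The following sketch illustrates how `mode()` behaves in that case and with repeated values; the second data set is taken from the frequency distribution experiment.

```python
import pandas as pd

unique_data = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
repeated_data = pd.Series([1, 2, 2, 3, 4, 4, 4, 5, 6, 7, 7, 7, 7])

# Every value occurs once, so all ten values tie for the mode
modes_unique = unique_data.mode().tolist()
# 7 occurs four times, more often than any other value
modes_repeated = repeated_data.mode().tolist()
print(modes_unique)
print(modes_repeated)
```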
OUTPUT:
RESULT:
Exp No : 03 C
VARIABILITY
Date :

AIM:

ALGORITHM:
Step 1: Start the Program
Step 2: Import the python library modules
Step 3: Write the code to compute the measures of variability
Step 4: Print the result
Step 5: Stop the process

PROGRAM:
import numpy as np
import pandas as pd

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
range_value = max(data) - min(data)
print(f"Range: {range_value}")
# Pandas computes the sample variance and standard deviation (ddof=1)
variance_pandas = pd.Series(data).var()
print(f"Variance (Pandas): {variance_pandas}")
std_dev_pandas = pd.Series(data).std()
print(f"Standard Deviation (Pandas): {std_dev_pandas}")
# NumPy computes the population variance and standard deviation (ddof=0)
variance_numpy = np.var(data)
print(f"Variance (NumPy): {variance_numpy}")
std_dev_numpy = np.std(data)
print(f"Standard Deviation (NumPy): {std_dev_numpy}")
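The Pandas and NumPy results of the program above differ because the two libraries use different default divisors: Pandas divides by n - 1 (`ddof=1`) and NumPy by n (`ddof=0`). A short sketch showing how the `ddof` argument reconciles them:

```python
import numpy as np
import pandas as pd

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

population_var = np.var(data)          # ddof=0: divide by n
sample_var = np.var(data, ddof=1)      # ddof=1: divide by n - 1
pandas_var = pd.Series(data).var()     # Pandas default is ddof=1

print(population_var, sample_var, pandas_var)
```

With `ddof=1`, NumPy reproduces the Pandas value.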
OUTPUT:
RESULT:
Exp No : 04 A
NORMAL CURVES
Date :

AIM:

ALGORITHM:
Step 1: Start the Program
Step 2: Import packages numpy and matplotlib
Step 3: Create the distribution
Step 4: Visualizing the distribution
Step 5: Stop the process

PROGRAM:
import numpy as np
import matplotlib.pyplot as plt

mu = 0
sigma = 1
# Draw 1000 samples from a normal distribution with mean mu and std sigma
data = np.random.normal(mu, sigma, 1000)
plt.hist(data, bins=30, density=True, alpha=0.6, color='g')
# Evaluate the normal density over the visible x-range
xmin, xmax = plt.xlim()
x = np.linspace(xmin, xmax, 100)
p = 1/(sigma * np.sqrt(2 * np.pi)) * np.exp(-0.5 * ((x - mu) / sigma)**2)
plt.plot(x, p, 'k', linewidth=2)
plt.xlabel('Data values')
plt.ylabel('Probability density')
plt.title('Normal Distribution Curve')
plt.show()
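The density formula written out in the program above can be cross-checked against `scipy.stats.norm.pdf`. This is only a sketch; the grid of nine points from -4 to 4 is an arbitrary choice for illustration.

```python
import numpy as np
from scipy.stats import norm

mu, sigma = 0, 1
x = np.linspace(-4, 4, 9)  # -4, -3, ..., 4

# Density from the explicit formula
manual = 1 / (sigma * np.sqrt(2 * np.pi)) * np.exp(-0.5 * ((x - mu) / sigma) ** 2)
# Density from SciPy
library = norm.pdf(x, loc=mu, scale=sigma)

print(np.allclose(manual, library))
```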
OUTPUT:
RESULT:
Exp No : 04 B
CORRELATION AND SCATTER PLOTS
Date :

AIM:

ALGORITHM:
Step 1: Start the Program
Step 2: Import the numpy and matplotlib modules
Step 3: Create variable x using a random function and y as a linear function of x with random noise
Step 4: Plot the scatter plot
Step 5: Print the result
Step 6: Stop the process

PROGRAM:
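import matplotlib.pyplot as plt
import numpy as np

# x: 100 uniform random values; y: linear in x with small Gaussian noise
x = np.random.rand(100)
y = 2 * x + np.random.normal(0, 0.1, 100)
plt.scatter(x, y, alpha=0.7)
plt.xlabel('X')
plt.ylabel('Y')
plt.title('Scatter Plot: X vs Y')
plt.show()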
OUTPUT:
RESULT:
Exp No : 04 C
CORRELATION COEFFICIENT
Date :

AIM:

ALGORITHM:
Step 1: Start the Program
Step 2: Import math package
Step 3: Define correlation coefficient function
Step 4: Calculate correlation using formula
Step 5: Print the result
Step 6: Stop the process
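The steps above describe a formula-based function built on the `math` package. One way to sketch it is shown below, using the Pearson formula r = (nΣxy - ΣxΣy) / sqrt((nΣx² - (Σx)²)(nΣy² - (Σy)²)). The function name `correlation_coefficient` and the sample lists are assumptions made for illustration.

```python
import math

def correlation_coefficient(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

x = [1, 2, 3, 4, 5]       # hypothetical sample data
y = [2, 4, 6, 8, 10]      # exactly 2 * x, so the correlation is perfect
r = correlation_coefficient(x, y)
print(r)
```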

PROGRAM:
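import numpy as np
import pandas as pd

x = np.random.rand(100)
y = 2 * x + np.random.normal(0, 0.1, 100)
# corrcoef returns a 2x2 matrix; element [0, 1] is the correlation of x and y
correlation_numpy = np.corrcoef(x, y)[0, 1]
print(f"Correlation Coefficient (NumPy): {correlation_numpy}")
df = pd.DataFrame({'x': x, 'y': y})
# corr() returns the pairwise correlation matrix of the columns
correlation_pandas = df.corr().loc['x', 'y']
print(f"Correlation Coefficient (Pandas): {correlation_pandas}")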
OUTPUT:
RESULT:
Exp No : 05
SIMPLE LINEAR REGRESSION
Date :

AIM:

ALGORITHM:
Step 1: Start the Program
Step 2: Import numpy and matplotlib package
Step 3: Define coefficient function
Step 4: Calculate cross-deviation and deviation about x
Step 5: Calculate regression coefficients
Step 6: Plot the Linear regression and define main function
Step 7: Print the result
Step 8: Stop the process
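Steps 3 to 5 above describe computing the regression coefficients from the cross-deviation and the deviation about x. A minimal sketch of that calculation follows; the function name `estimate_coefficients` and the sample data are assumptions made for illustration.

```python
import numpy as np

def estimate_coefficients(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = np.size(x)
    mean_x, mean_y = np.mean(x), np.mean(y)
    # Cross-deviation of x and y, and deviation about x
    ss_xy = np.sum(y * x) - n * mean_y * mean_x
    ss_xx = np.sum(x * x) - n * mean_x * mean_x
    slope = ss_xy / ss_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope

x = np.array([0, 1, 2, 3, 4], dtype=float)  # hypothetical sample data
y = 2.5 * x + 1.0                            # points lie exactly on a line
b0, b1 = estimate_coefficients(x, y)
print(f"Intercept: {b0}, Slope: {b1}")
```

Since the sample points lie exactly on y = 2.5x + 1, the function recovers that intercept and slope.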

PROGRAM:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Synthetic data: Y depends linearly on X with Gaussian noise
np.random.seed(0)
X = np.random.rand(100) * 10
Y = 2.5 * X + np.random.normal(0, 2, 100)
plt.scatter(X, Y, color='blue', alpha=0.7)
plt.xlabel('X')
plt.ylabel('Y')
plt.title('Scatter Plot: X vs Y')
plt.show()
# scikit-learn expects a 2-D feature array of shape (n_samples, n_features)
X = X.reshape(-1, 1)
model = LinearRegression()
model.fit(X, Y)
slope = model.coef_[0]
intercept = model.intercept_
print(f"Slope (beta_1): {slope}")
print(f"Intercept (beta_0): {intercept}")
Y_pred = model.predict(X)
plt.scatter(X, Y, color='blue', alpha=0.7, label='Data')
plt.plot(X, Y_pred, color='red', label='Fitted Line')
plt.xlabel('X')
plt.ylabel('Y')
plt.title('Simple Linear Regression: Fitted Line')
plt.legend()
plt.show()
# score() returns the coefficient of determination (R-squared)
r_squared = model.score(X, Y)
print(f"R-squared: {r_squared}")
# Predict Y for a new value of X
X_new = np.array([[15]])
Y_new = model.predict(X_new)
print(f"Predicted Y for X = 15: {Y_new[0]}")
OUTPUT:

RESULT:
Exp No : 06
Z-TEST
Date :

AIM:

ALGORITHM:
Step 1: Start the Program
Step 2: Import math package
Step 3: Define Z-test function
Step 4: Calculate Z-test using formula
Step 5: Print the result
Step 6: Stop the process
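Steps 2 to 4 above describe a Z-test function written with the `math` package. A minimal sketch under that reading is given below; the function name `two_sample_z_test` is an assumption, and the inputs are the summary statistics used in this experiment's program. The two-tailed p-value uses the identity 2(1 - Φ(|z|)) = erfc(|z| / √2).

```python
import math

def two_sample_z_test(mean_1, mean_2, std_1, std_2, size_1, size_2):
    """Return the z statistic and two-tailed p-value for two sample means."""
    standard_error = math.sqrt(std_1 ** 2 / size_1 + std_2 ** 2 / size_2)
    z = (mean_1 - mean_2) / standard_error
    # Two-tailed p-value for a standard normal variable
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

z_score, p_value = two_sample_z_test(50, 45, 10, 12, 40, 35)
print(f"Z-Score: {z_score}")
print(f"P-value: {p_value}")
```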

PROGRAM:
import numpy as np
import scipy.stats as stats

# Summary statistics of the two samples
mean_1 = 50
mean_2 = 45
std_1 = 10
std_2 = 12
size_1 = 40
size_2 = 35
# Z statistic for the difference between the two sample means
z_score_two_sample = (mean_1 - mean_2) / np.sqrt((std_1**2 / size_1) + (std_2**2 / size_2))
# Two-tailed p-value from the standard normal CDF
p_value_two_sample = 2 * (1 - stats.norm.cdf(abs(z_score_two_sample)))
print(f"Z-Score: {z_score_two_sample}")
print(f"P-value: {p_value_two_sample}")
OUTPUT:
RESULT:
Exp No : 07
T-TEST
Date :

AIM:

ALGORITHM:
Step 1: Start the Program
Step 2: Import math package
Step 3: Define T-test function
Step 4: Calculate T-test using formula
Step 5: Print the result
Step 6: Stop the process
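Steps 2 to 4 above describe a T-test function computed from the formula t = (x̄ - μ) / (s / √n). A minimal sketch using only the standard library follows; the function name `one_sample_t` and the three-value sample are assumptions made for illustration.

```python
import math
import statistics

def one_sample_t(sample, population_mean):
    """Return the one-sample t statistic and its degrees of freedom."""
    n = len(sample)
    sample_mean = statistics.mean(sample)
    sample_std = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    t_stat = (sample_mean - population_mean) / (sample_std / math.sqrt(n))
    return t_stat, n - 1

t_stat, dof = one_sample_t([2, 4, 6], 2)  # hypothetical small sample
print(f"T-statistic: {t_stat}, degrees of freedom: {dof}")
```

The p-value is then read from the t distribution with n - 1 degrees of freedom, for example with `scipy.stats.t.sf`.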

PROGRAM:
import scipy.stats as stats
import numpy as np

sample_data = np.array([52, 55, 48, 49, 53, 54, 51, 50, 55, 58, 56, 57, 52, 51,
                        54, 53, 59, 61, 50, 52, 54, 53, 49, 47, 52, 51, 50, 48, 56, 55])
population_mean = 50
# One-sample t-test: does the sample mean differ from the population mean?
t_stat, p_value = stats.ttest_1samp(sample_data, population_mean)
print(f"T-statistic: {t_stat}")
print(f"P-value: {p_value}")
OUTPUT:
RESULT:
Exp No : 08
ANOVA
Date :

AIM:

ALGORITHM:
Step 1: Start the Program
Step 2: Import package
Step 3: Prepare the Data
Step 4: Perform ANOVA
Step 5: Calculate the F-statistic and P-value
Step 6: Print the result
Step 7: Stop the process
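Step 5 above computes the F-statistic. As a sketch of what that involves, F is the ratio of the between-group mean square to the within-group mean square. The function name `one_way_anova_f` and the three small groups are assumptions made for illustration.

```python
import numpy as np

def one_way_anova_f(*groups):
    """Return the one-way ANOVA F statistic for the given groups."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    k, n = len(groups), all_values.size
    # Variation of the group means around the grand mean
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within

f_value = one_way_anova_f([1, 2, 3], [2, 3, 4], [3, 4, 5])  # hypothetical groups
print(f"F-statistic: {f_value}")
```

For these groups the between-group mean square is 3 and the within-group mean square is 1.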

PROGRAM:
import numpy as np
import scipy.stats as stats

group_1 = np.array([23, 45, 67, 32, 45, 34, 43, 45, 56, 42])
group_2 = np.array([45, 32, 23, 43, 46, 32, 21, 22, 43, 43])
group_3 = np.array([65, 78, 56, 67, 82, 73, 74, 65, 68, 74])
# One-way ANOVA across the three groups
f_stat, p_value = stats.f_oneway(group_1, group_2, group_3)
print(f"F-statistic: {f_stat}")
print(f"P-value: {p_value}")
# Compare the p-value with the 5% significance level
if p_value < 0.05:
    print("There is a significant difference between the group means.")
else:
    print("There is no significant difference between the group means.")
OUTPUT:
RESULT:
Exp No : 09
BUILDING AND VALIDATING LINEAR MODELS
Date :

AIM:

ALGORITHM:
Step 1: Start the Program
Step 2: Import package
Step 3: Prepare the Data
Step 4: Build the Model
Step 5: Evaluate the Model
Step 6: Model Diagnostics
Step 7: Print the result
Step 8: Stop the process
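Step 5 above evaluates the model, typically with the mean squared error and R-squared. A minimal sketch of both formulas follows: MSE = mean((y - ŷ)²) and R² = 1 - SS_res / SS_tot. The function name `mse_and_r2` and the four target values are assumptions made for illustration.

```python
import numpy as np

def mse_and_r2(y_true, y_pred):
    """Return (mean squared error, R-squared) for the given predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mse = np.mean(residuals ** 2)
    ss_res = np.sum(residuals ** 2)                  # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation
    return mse, 1 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0]                        # hypothetical test targets
mse_perfect, r2_perfect = mse_and_r2(y_true, y_true)   # perfect predictions
mse_mean, r2_mean = mse_and_r2(y_true, [2.5] * 4)      # always predict the mean
print(mse_perfect, r2_perfect, mse_mean, r2_mean)
```

Perfect predictions give an R-squared of 1, while always predicting the mean gives 0, which is why R-squared is read as the share of variation the model explains.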

PROGRAM:
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Synthetic data: y depends linearly on X with Gaussian noise
np.random.seed(0)
X = np.random.rand(100, 1) * 10
y = 2.5 * X.squeeze() + np.random.randn(100) * 2
# Hold out 20% of the data for validation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# statsmodels does not add an intercept term automatically
X_train_sm = sm.add_constant(X_train)
X_test_sm = sm.add_constant(X_test)
model = sm.OLS(y_train, X_train_sm).fit()
y_pred = model.predict(X_test_sm)
print(model.summary())
# Validate the model on the held-out data
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f'Mean Squared Error: {mse}')
print(f'R-squared: {r2}')
OUTPUT:

RESULT:
Exp No : 10
BUILDING AND VALIDATING LOGISTIC MODELS
Date :

AIM:

ALGORITHM:
Step 1: Start the Program
Step 2: Import python libraries
Step 3: Generate synthetic data
Step 4: Split the data
Step 5: Build the logistic regression model
Step 6: Make predictions and Evaluate the model
Step 7: Print evaluation metrics and Print the result
Step 8: Stop the process
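The evaluation metrics in Steps 6 and 7 above all derive from four counts: true positives, true negatives, false positives and false negatives. The sketch below computes them by hand; the label arrays are hypothetical and chosen only for illustration.

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])  # hypothetical true labels
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])  # hypothetical predicted labels

tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # predicted 1, actually 1
tn = int(np.sum((y_true == 0) & (y_pred == 0)))  # predicted 0, actually 0
fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # predicted 1, actually 0
fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # predicted 0, actually 1

accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(accuracy, precision, recall)
```

For binary labels 0 and 1, scikit-learn's `confusion_matrix` arranges these counts as `[[tn, fp], [fn, tp]]`, with rows for the true class and columns for the predicted class.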

PROGRAM:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Synthetic data: label is 1 when the two features sum to more than 1
np.random.seed(0)
X = np.random.rand(100, 2)
y = (X[:, 0] + X[:, 1] > 1).astype(int)
# Hold out 20% of the data for validation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LogisticRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
# Evaluate the predictions on the held-out data
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)
print(f'Accuracy: {accuracy}')
print('Confusion Matrix:')
print(conf_matrix)
print('Classification Report:')
print(class_report)
# Circles show the true labels; crosses show the predicted labels
plt.figure(figsize=(10, 6))
plt.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap='coolwarm', edgecolors='k',
            s=100, label='True Labels')
plt.scatter(X_test[:, 0], X_test[:, 1], c=y_pred, marker='x', cmap='coolwarm',
            s=100, label='Predicted Labels')
plt.title('Logistic Regression Predictions')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.legend()
plt.show()
OUTPUT:

RESULT:
Exp No : 11
TIME SERIES ANALYSIS
Date :

AIM:

ALGORITHM:
Step 1: Start the Program
Step 2: Import python libraries
Step 3: Generate a time series data
Step 4: Create a DataFrame
Step 5: Print the result
Step 6: Stop the process
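Once the time series is stored in a DataFrame or Series indexed by date, a common first analysis step is smoothing with a moving average. The sketch below is not part of the original record; the five values and the 3-day window are assumptions made for illustration.

```python
import pandas as pd

dates = pd.date_range(start='2020-01-01', periods=5)
series = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=dates)  # hypothetical values

# 3-day moving average; the first two entries are NaN because
# the window does not yet contain three observations
rolling_mean = series.rolling(window=3).mean()
print(rolling_mean)
```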

PROGRAM:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# 100 consecutive daily dates starting on 1 January 2020
date_range = pd.date_range(start='1/1/2020', periods=100)
# Cumulative sum of random steps gives a random-walk series
data = np.random.randn(100).cumsum()
time_series_data = pd.DataFrame(data, index=date_range, columns=['Value'])
plt.figure(figsize=(12, 6))
plt.plot(time_series_data.index, time_series_data['Value'], label='Random Data', color='blue')
plt.title('Time Series Analysis')
plt.xlabel('Date')
plt.ylabel('Value')
plt.legend()
plt.grid()
plt.show()
OUTPUT:

RESULT:
