ANNA UNIVERSITY
REGIONAL CAMPUS, COIMBATORE
LABORATORY RECORD
2024-2025
NAME :
REGISTER NUMBER :
BRANCH : B.Tech – ARTIFICIAL INTELLIGENCE AND DATA SCIENCE
SUBJECT CODE : AD3411
SUBJECT TITLE : DATA SCIENCE AND ANALYTICS LABORATORY
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
ANNA UNIVERSITY REGIONAL CAMPUS
COIMBATORE 641 046
ANNA UNIVERSITY
REGIONAL CAMPUS, COIMBATORE
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
BONAFIDE CERTIFICATE
Certified that this is the bonafide record of practical work done in AD3411 DATA SCIENCE
AND ANALYTICS LABORATORY by………………………………………………….
Roll No………………………… Second Year / Fourth Semester during the academic year 2024-2025.
STAFF IN-CHARGE HEAD OF THE DEPARTMENT
University Register Number :
Submitted for the University Practical Examination Held on
INTERNAL EXAMINER EXTERNAL EXAMINER
INDEX
EX. NO. | DATE | TITLE | PAGE NO. | MARKS | SIGN
Exp No : 01
WORKING WITH PANDAS DATAFRAME
Date :
AIM:
ALGORITHM:
Step 1: Start the Program
Step 2: Import the pandas module
Step 3: Create a DataFrame from a dictionary of lists
Step 4: Filter rows, add a derived column and group the data by city
Step 5: Print the output
Step 6: Stop the process
PROGRAM:
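import pandas as pd
# Build a DataFrame from a dictionary of equal-length lists
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'Los Angeles', 'Chicago']}
df = pd.DataFrame(data)
print(df.head())  # first rows (up to 5)
# Boolean indexing: keep only the rows where Age > 30
filtered_df = df[df['Age'] > 30]
print(filtered_df)
# Add a derived boolean column
df['Senior'] = df['Age'] > 30
print(df)
# Group by City and compute the mean Age of each group
grouped_df = df.groupby('City')['Age'].mean()
print(grouped_df)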
OUTPUT:
RESULT:
Exp No : 02
BASIC PLOTS USING MATPLOTLIB
Date :
AIM:
ALGORITHM:
Step 1: Start the Program
Step 2: Import the Matplotlib pyplot module
Step 3: Create a basic line plot with a custom colour, line style and markers
Step 4: Display the plot
Step 5: Stop the process
PROGRAM:
import matplotlib.pyplot as plt
# Sample data points
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]
# Dashed green line with circular markers of size 10
plt.plot(x, y, color='green', linestyle='--', marker='o', markersize=10)
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Customized Line Plot')
plt.show()
OUTPUT:
RESULT:
Exp No : 03 A
FREQUENCY DISTRIBUTION
Date :
AIM:
ALGORITHM:
Step 1: Start the Program
Step 2: Import the Python library module (pandas)
Step 3: Compute the frequency of each distinct value using value_counts()
Step 4: Print the result
Step 5: Stop the process
PROGRAM:
import pandas as pd
data = [1, 2, 2, 3, 4, 4, 4, 5, 6, 7, 7, 7, 7]
series = pd.Series(data)
# value_counts() counts each distinct value, sorted by count in descending order
frequency = series.value_counts()
print(frequency)
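Because value_counts() orders the result by count, a frequency table is often easier to read in value order, with relative and cumulative frequencies alongside. A minimal sketch using the same data; value_counts(normalize=True), sort_index() and cumsum() are standard pandas methods, while the variable names and the table layout are choices made here for illustration.

```python
import pandas as pd

data = [1, 2, 2, 3, 4, 4, 4, 5, 6, 7, 7, 7, 7]
series = pd.Series(data)

# Absolute frequency, re-sorted by value instead of by count
freq = series.value_counts().sort_index()
# Relative frequency: each count divided by the number of observations
rel_freq = series.value_counts(normalize=True).sort_index()
# Cumulative frequency: running total of the absolute counts
cum_freq = freq.cumsum()

table = pd.DataFrame({'Frequency': freq,
                      'Relative': rel_freq.round(3),
                      'Cumulative': cum_freq})
print(table)
```

The last cumulative value equals the total number of observations (13), and the relative frequencies sum to 1.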
OUTPUT:
RESULT:
Exp No : 03 B
AVERAGES
Date :
AIM:
ALGORITHM:
Step 1: Start the Program
Step 2: Import the Python library modules (NumPy and pandas)
Step 3: Compute the mean, median and mode of the data
Step 4: Print the result
Step 5: Stop the process
PROGRAM:
import numpy as np
import pandas as pd
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
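# Mean: sum of the values divided by their count
mean_pandas = pd.Series(data).mean()
print(f"Mean (Pandas): {mean_pandas}")
mean_numpy = np.mean(data)
print(f"Mean (NumPy): {mean_numpy}")
# Median: middle of the sorted data (average of the two middle values for an even count)
median = pd.Series(data).median()
print(f"Median: {median}")
# Mode: most frequent value(s); every value occurs once here, so all ten are returned
mode = pd.Series(data).mode()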
print(f"Mode: {mode}")
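The three averages can also be computed directly from their definitions, which shows why the median of an even-sized list is the mean of the two middle values and why every value is reported as a mode when there are no repeats. A minimal sketch in plain Python, assuming Python 3.8 or later for statistics.multimode.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Mean: sum of the values divided by how many there are
mean = sum(data) / len(data)

# Median: middle of the sorted list; with an even count, average the two middle values
s = sorted(data)
n = len(s)
mid = n // 2
median = s[mid] if n % 2 == 1 else (s[mid - 1] + s[mid]) / 2

# Mode: value(s) with the highest count; multimode returns every tied value
modes = statistics.multimode(data)

print(f"Mean: {mean}")      # 5.5
print(f"Median: {median}")  # 5.5
print(f"Modes: {modes}")    # all ten values, since each occurs exactly once
```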
OUTPUT:
RESULT:
Exp No : 03 C
VARIABILITY
Date :
AIM:
ALGORITHM:
Step 1: Start the Program
Step 2: Import the Python library modules (NumPy and pandas)
Step 3: Compute the range, variance and standard deviation of the data
Step 4: Print the result
Step 5: Stop the process
PROGRAM:
import numpy as np
import pandas as pd
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
range_value = max(data) - min(data)
print(f"Range: {range_value}")
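# pandas uses the sample formula by default (ddof=1, divide by n - 1)
variance_pandas = pd.Series(data).var()
print(f"Variance (Pandas): {variance_pandas}")
std_dev_pandas = pd.Series(data).std()
print(f"Standard Deviation (Pandas): {std_dev_pandas}")
# NumPy uses the population formula by default (ddof=0, divide by n)
variance_numpy = np.var(data)
print(f"Variance (NumPy): {variance_numpy}")
std_dev_numpy = np.std(data)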
print(f"Standard Deviation (NumPy): {std_dev_numpy}")
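The pandas and NumPy variances printed above differ because the two libraries use different default values of the ddof ("delta degrees of freedom") parameter: pandas divides the sum of squared deviations by n - 1, NumPy divides by n. A minimal sketch that computes both by hand for the same data and checks them against np.var with each ddof setting; pd.Series(data).var(ddof=0) would likewise reproduce the NumPy value.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
n = data.size

# Sum of squared deviations from the mean (the mean is 5.5, the sum is 82.5)
ss = ((data - data.mean()) ** 2).sum()

population_var = ss / n      # matches np.var(data), i.e. ddof=0
sample_var = ss / (n - 1)    # matches pd.Series(data).var(), i.e. ddof=1

print(f"{population_var:.4f} {np.var(data):.4f}")       # 8.2500 8.2500
print(f"{sample_var:.4f} {np.var(data, ddof=1):.4f}")  # 9.1667 9.1667
# Standard deviations are the square roots of the corresponding variances
print(f"{np.sqrt(population_var):.4f} {np.sqrt(sample_var):.4f}")
```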
OUTPUT:
RESULT:
Exp No : 04 A
NORMAL CURVES
Date :
AIM:
ALGORITHM:
Step 1: Start the Program
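Step 2: Import the NumPy and Matplotlib packages
Step 3: Generate 1000 samples from a normal distribution with mean 0 and standard deviation 1
Step 4: Plot a histogram of the samples and overlay the normal probability density curve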
Step 5: Stop the process
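The curve drawn over the histogram in this experiment is the normal probability density f(x) = 1 / (σ√(2π)) · exp(-(x - μ)² / (2σ²)). A minimal check of that formula in plain Python, assuming Python 3.8 or later for statistics.NormalDist; the helper name normal_pdf is illustrative. It compares the formula with NormalDist.pdf and confirms numerically (midpoint rule on [-8, 8]) that the area under the curve is about 1, which is why the histogram is drawn with density=True.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) at x, written out from the formula."""
    return 1 / (sigma * math.sqrt(2 * math.pi)) * math.exp(-0.5 * ((x - mu) / sigma) ** 2)


reference = NormalDist(mu=0, sigma=1)
points = [-2.0, -1.0, 0.0, 1.0, 2.0]
for x in points:
    print(f"x={x:+.1f}  formula={normal_pdf(x):.6f}  NormalDist={reference.pdf(x):.6f}")

# Midpoint-rule estimate of the total area under the curve on [-8, 8]
n_steps = 16000
step = 16 / n_steps
area = sum(normal_pdf(-8 + (i + 0.5) * step) * step for i in range(n_steps))
print(f"Area under curve: {area:.6f}")
```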
PROGRAM:
import numpy as np
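import matplotlib.pyplot as plt
mu = 0      # mean
sigma = 1   # standard deviation
# Draw 1000 samples from N(mu, sigma^2)
data = np.random.normal(mu, sigma, 1000)
# Histogram scaled to a density so that it is comparable with the PDF
plt.hist(data, bins=30, density=True, alpha=0.6, color='g')
# Evaluate the normal PDF across the current x-axis range
xmin, xmax = plt.xlim()
x = np.linspace(xmin, xmax, 100)
p = 1/(sigma * np.sqrt(2 * np.pi)) * np.exp(-0.5 * ((x - mu) / sigma)**2)
plt.plot(x, p, 'k', linewidth=2)
plt.xlabel('Data values')
plt.ylabel('Probability density')
plt.title('Normal Distribution Curve')
plt.show()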
OUTPUT:
RESULT:
Exp No : 04 B
CORRELATION AND SCATTER PLOTS
Date :
AIM:
ALGORITHM:
Step 1: Start the Program
Step 2: Import the NumPy and Matplotlib packages
Step 3: Generate x using a random function and set y = 2x plus small random noise
Step 4: Plot the scatter plot of x against y
Step 5: Display the result
Step 6: Stop the process
PROGRAM:
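import matplotlib.pyplot as plt
import numpy as np
# 100 uniform random x values in [0, 1)
x = np.random.rand(100)
# y depends linearly on x plus small Gaussian noise, so the points lie close to a line
y = 2 * x + np.random.normal(0, 0.1, 100)
plt.scatter(x, y, alpha=0.7)
plt.xlabel('X')
plt.ylabel('Y')
plt.title('Scatter Plot: X vs Y')
plt.show()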
OUTPUT:
RESULT:
Exp No : 04 C
CORRELATION COEFFICIENT
Date :
AIM:
ALGORITHM:
Step 1: Start the Program
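Step 2: Import the NumPy and pandas packages
Step 3: Generate x using a random function and set y = 2x plus small random noise
Step 4: Calculate the Pearson correlation coefficient using numpy.corrcoef and DataFrame.corr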
Step 5: Print the result
Step 6: Stop the process
PROGRAM:
import numpy as np
import pandas as pd
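x = np.random.rand(100)
y = 2 * x + np.random.normal(0, 0.1, 100)
# corrcoef returns the 2x2 correlation matrix; element [0, 1] is r(x, y)
correlation_numpy = np.corrcoef(x, y)[0, 1]
print(f"Correlation Coefficient (NumPy): {correlation_numpy}")
df = pd.DataFrame({'x': x, 'y': y})
# DataFrame.corr() computes Pearson correlation by default
correlation_pandas = df.corr().loc['x', 'y']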
print(f"Correlation Coefficient (Pandas): {correlation_pandas}")
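Both library calls compute the Pearson coefficient r = Σ(x - x̄)(y - ȳ) / √(Σ(x - x̄)² · Σ(y - ȳ)²). A minimal sketch that writes this formula out as a function and compares it with np.corrcoef; the helper name pearson_r and the seeded generator np.random.default_rng(0) are choices made here for illustration and reproducibility, not part of the original program.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation: co-variation of x and y divided by the product of their spreads."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()   # deviations of x from its mean
    dy = y - y.mean()   # deviations of y from its mean
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())


rng = np.random.default_rng(0)
x = rng.random(100)
y = 2 * x + rng.normal(0, 0.1, 100)

print(f"Formula:     {pearson_r(x, y):.6f}")
print(f"np.corrcoef: {np.corrcoef(x, y)[0, 1]:.6f}")
```

A value close to +1 indicates a strong positive linear relationship, as expected for y = 2x plus small noise.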
OUTPUT:
RESULT:
Exp No : 05
SIMPLE LINEAR REGRESSION
Date :
AIM:
ALGORITHM:
Step 1: Start the Program
Step 2: Import NumPy, Matplotlib and scikit-learn
Step 3: Generate sample data with Y = 2.5X plus random noise and plot it
Step 4: Reshape X into a two-dimensional feature matrix
Step 5: Fit a LinearRegression model to obtain the regression coefficients (slope and intercept)
Step 6: Plot the fitted line, compute R-squared and predict Y for a new X value
Step 7: Print the result
Step 8: Stop the process
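The slope and intercept returned by LinearRegression are the ordinary least-squares estimates, which can also be computed by hand from the cross-deviation SS_xy = Σxy - n·x̄·ȳ and the deviation about x SS_xx = Σx² - n·x̄², giving b1 = SS_xy / SS_xx and b0 = ȳ - b1·x̄. A minimal sketch, assuming the same seed and data generation as the program so that the coefficients should agree with model.coef_ and model.intercept_ up to floating-point rounding; the function name estimate_coefficients is illustrative.

```python
import numpy as np


def estimate_coefficients(x, y):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    mean_x, mean_y = x.mean(), y.mean()
    # Cross-deviation and deviation about x
    ss_xy = np.sum(x * y) - n * mean_x * mean_y
    ss_xx = np.sum(x * x) - n * mean_x * mean_x
    b1 = ss_xy / ss_xx
    b0 = mean_y - b1 * mean_x
    return b0, b1


np.random.seed(0)
X = np.random.rand(100) * 10
Y = 2.5 * X + np.random.normal(0, 2, 100)
b0, b1 = estimate_coefficients(X, Y)
print(f"Intercept (beta_0): {b0:.4f}")
print(f"Slope (beta_1): {b1:.4f}")
```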
PROGRAM:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
np.random.seed(0)
# Synthetic data: Y = 2.5 X + Gaussian noise
X = np.random.rand(100) * 10
Y = 2.5 * X + np.random.normal(0, 2, 100)
plt.scatter(X, Y, color='blue', alpha=0.7)
plt.xlabel('X')
plt.ylabel('Y')
plt.title('Scatter Plot: X vs Y')
plt.show()
# scikit-learn expects a 2-D feature matrix of shape (n_samples, n_features)
X = X.reshape(-1, 1)
model = LinearRegression()
model.fit(X, Y)
slope = model.coef_[0]
intercept = model.intercept_
print(f"Slope (beta_1): {slope}")
print(f"Intercept (beta_0): {intercept}")
Y_pred = model.predict(X)
plt.scatter(X, Y, color='blue', alpha=0.7, label='Data')
plt.plot(X, Y_pred, color='red', label='Fitted Line')
plt.xlabel('X')
plt.ylabel('Y')
plt.title('Simple Linear Regression: Fitted Line')
plt.legend()
plt.show()
# score() returns the coefficient of determination R^2 on the given data
r_squared = model.score(X, Y)
print(f"R-squared: {r_squared}")
X_new = np.array([[15]])
Y_new = model.predict(X_new)
print(f"Predicted Y for X = 15: {Y_new[0]}")
OUTPUT:
RESULT:
Exp No : 06
Z-TEST
Date :
AIM:
ALGORITHM:
Step 1: Start the Program
Step 2: Import NumPy and scipy.stats
Step 3: Define the means, standard deviations and sizes of the two samples
Step 4: Calculate the two-sample Z statistic and its two-sided p-value
Step 5: Print the result
Step 6: Stop the process
PROGRAM:
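import numpy as np
import scipy.stats as stats
# Summary statistics of the two samples
mean_1 = 50
mean_2 = 45
std_1 = 10
std_2 = 12
size_1 = 40
size_2 = 35
# Two-sample Z statistic: difference in means divided by its standard error
z_score_two_sample = (mean_1 - mean_2) / np.sqrt((std_1**2 / size_1) + (std_2**2 / size_2))
# Two-sided p-value from the standard normal CDF
p_value_two_sample = 2 * (1 - stats.norm.cdf(abs(z_score_two_sample)))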
print(f"Z-Score: {z_score_two_sample}")
print(f"P-value: {p_value_two_sample}")
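The same test can be wrapped in a reusable function, with the standard normal CDF written through the error function, Φ(z) = (1 + erf(z/√2)) / 2, so that only the Python standard library is needed. A minimal sketch; the function name two_sample_z_test and the 5% significance level are illustrative choices. With the summary statistics used above, z is about 1.94 and the two-sided p-value is slightly above 0.05.

```python
import math


def two_sample_z_test(mean_1, mean_2, std_1, std_2, size_1, size_2):
    """Two-sample Z-test from summary statistics; returns (z, two-sided p-value)."""
    standard_error = math.sqrt(std_1**2 / size_1 + std_2**2 / size_2)
    z = (mean_1 - mean_2) / standard_error
    # Standard normal CDF via the error function
    phi = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - phi)
    return z, p_value


z, p = two_sample_z_test(50, 45, 10, 12, 40, 35)
print(f"Z-Score: {z:.4f}")
print(f"P-value: {p:.4f}")
print("Reject H0 at the 5% level" if p < 0.05 else "Fail to reject H0 at the 5% level")
```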
OUTPUT:
RESULT:
Exp No : 07
T-TEST
Date :
AIM:
ALGORITHM:
Step 1: Start the Program
Step 2: Import NumPy and scipy.stats
Step 3: Define the sample data and the hypothesised population mean
Step 4: Calculate the one-sample T statistic and p-value using ttest_1samp
Step 5: Print the result
Step 6: Stop the process
PROGRAM:
import scipy.stats as stats
import numpy as np
sample_data = np.array([52, 55, 48, 49, 53, 54, 51, 50, 55, 58, 56, 57, 52, 51,
                        54, 53, 59, 61, 50, 52, 54, 53, 49, 47, 52, 51, 50, 48, 56, 55])
# Hypothesised population mean under H0
population_mean = 50
# Two-sided one-sample t-test of H0: the population mean equals 50
t_stat, p_value = stats.ttest_1samp(sample_data, population_mean)
print(f"T-statistic: {t_stat}")
print(f"P-value: {p_value}")
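The statistic that ttest_1samp reports is t = (x̄ - μ0) / (s / √n), where s is the sample standard deviation (divisor n - 1) and the test has n - 1 degrees of freedom. A minimal sketch computing it by hand with the standard library for the same sample; the p-value still needs the t distribution's CDF, which the program obtains from scipy.stats.

```python
import math
import statistics

sample_data = [52, 55, 48, 49, 53, 54, 51, 50, 55, 58, 56, 57, 52, 51,
               54, 53, 59, 61, 50, 52, 54, 53, 49, 47, 52, 51, 50, 48, 56, 55]
population_mean = 50

n = len(sample_data)
sample_mean = statistics.mean(sample_data)
# statistics.stdev uses the sample formula (divide by n - 1), as the t-test requires
sample_sd = statistics.stdev(sample_data)
standard_error = sample_sd / math.sqrt(n)
t_stat = (sample_mean - population_mean) / standard_error
degrees_of_freedom = n - 1

print(f"Sample mean: {sample_mean:.4f}")
print(f"T-statistic: {t_stat:.4f} with {degrees_of_freedom} degrees of freedom")
```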
OUTPUT:
RESULT:
Exp No : 08
ANOVA
Date :
AIM:
ALGORITHM:
Step 1: Start the Program
Step 2: Import NumPy and scipy.stats
Step 3: Prepare the Data
Step 4: Perform ANOVA
Step 5: Calculate the F-statistic and P-value
Step 6: Print the result
Step 7: Stop the process
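One-way ANOVA compares the variation between group means with the variation within groups: F = [SS_between / (k - 1)] / [SS_within / (N - k)] for k groups and N observations in total. A minimal sketch that computes F by hand for the same three groups used in the program, so the result should match stats.f_oneway; the variable names are illustrative.

```python
import numpy as np

groups = [
    np.array([23, 45, 67, 32, 45, 34, 43, 45, 56, 42]),
    np.array([45, 32, 23, 43, 46, 32, 21, 22, 43, 43]),
    np.array([65, 78, 56, 67, 82, 73, 74, 65, 68, 74]),
]
k = len(groups)                      # number of groups
N = sum(g.size for g in groups)      # total number of observations
grand_mean = np.concatenate(groups).mean()

# Between-group sum of squares: distance of each group mean from the grand mean
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: spread of observations around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

ms_between = ss_between / (k - 1)
ms_within = ss_within / (N - k)
f_stat = ms_between / ms_within
print(f"F-statistic: {f_stat:.4f} (df = {k - 1}, {N - k})")
```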
PROGRAM:
import numpy as np
import scipy.stats as stats
# Observations for the three groups being compared
group_1 = np.array([23, 45, 67, 32, 45, 34, 43, 45, 56, 42])
group_2 = np.array([45, 32, 23, 43, 46, 32, 21, 22, 43, 43])
group_3 = np.array([65, 78, 56, 67, 82, 73, 74, 65, 68, 74])
# One-way ANOVA: H0 states that all three group means are equal
f_stat, p_value = stats.f_oneway(group_1, group_2, group_3)
print(f"F-statistic: {f_stat}")
print(f"P-value: {p_value}")
# Compare the p-value with a 5% significance level
if p_value < 0.05:
    print("There is a significant difference between the group means.")
else:
    print("There is no significant difference between the group means.")
OUTPUT:
RESULT:
Exp No : 09
BUILDING AND VALIDATING LINEAR MODELS
Date :
AIM:
ALGORITHM:
Step 1: Start the Program
Step 2: Import NumPy, pandas, statsmodels and scikit-learn
Step 3: Prepare the Data
Step 4: Build the Model
Step 5: Evaluate the Model
Step 6: Model Diagnostics
Step 7: Print the result
Step 8: Stop the process
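The two validation metrics used in this experiment have short definitions: MSE is the mean of the squared residuals, and R² = 1 - SS_res / SS_tot is the share of the variance in the observed values explained by the predictions. A minimal sketch that writes both out and applies them to a small hand-made example (the values are illustrative, not experiment data); for the same inputs they should agree with sklearn's mean_squared_error and r2_score.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean of the squared residuals."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean((y_true - y_pred) ** 2)


def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot: share of the variance in y_true explained by y_pred."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1 - ss_res / ss_tot


y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 7.5, 9.0]
print(mse(y_true, y_pred))        # 0.125
print(r_squared(y_true, y_pred))  # 0.975
```

A perfect prediction gives MSE 0 and R² 1; an R² near 0 means the model does no better than predicting the mean.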
PROGRAM:
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
np.random.seed(0)
# One feature of shape (100, 1); target y = 2.5 X + noise (X flattened to 1-D)
X = np.random.rand(100, 1) * 10
y = 2.5 * X.ravel() + np.random.randn(100) * 2
# Hold out 20% of the data for validation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# statsmodels OLS does not add an intercept term automatically
X_train_sm = sm.add_constant(X_train)
X_test_sm = sm.add_constant(X_test)
model = sm.OLS(y_train, X_train_sm).fit()
y_pred = model.predict(X_test_sm)
# Coefficients, standard errors, R-squared and diagnostic tests
print(model.summary())
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f'Mean Squared Error: {mse}')
print(f'R-squared: {r2}')
OUTPUT:
RESULT:
Exp No : 10
BUILDING AND VALIDATING LOGISTIC MODELS
Date :
AIM:
ALGORITHM:
Step 1: Start the Program
Step 2: Import the Python libraries
Step 3: Generate synthetic data
Step 4: Split the data into training and test sets
Step 5: Build the logistic regression model
Step 6: Make predictions and evaluate the model
Step 7: Print the evaluation metrics and plot the predictions
Step 8: Stop the process
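The accuracy, confusion matrix and classification report printed in this experiment all follow from four counts: true negatives, false positives, false negatives and true positives. A minimal sketch in plain Python using a small made-up list of labels (hypothetical, not experiment data) and a hypothetical helper confusion_counts; the [[tn, fp], [fn, tp]] layout matches sklearn's confusion_matrix for labels 0 and 1, with rows as true labels and columns as predictions.

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels 0/1."""
    pairs = list(zip(y_true, y_pred))
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    return tn, fp, fn, tp


y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]
tn, fp, fn, tp = confusion_counts(y_true, y_pred)

accuracy = (tp + tn) / (tn + fp + fn + tp)   # share of all predictions that are correct
precision = tp / (tp + fp)                   # of the predicted 1s, share that are truly 1
recall = tp / (tp + fn)                      # of the true 1s, share that were found
f1 = 2 * precision * recall / (precision + recall)

print([[tn, fp], [fn, tp]])   # [[3, 1], [1, 3]]
print(accuracy, precision, recall, f1)
```

The classification report lists precision, recall and F1 for each class; the values above are those for class 1.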
PROGRAM:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
np.random.seed(0)
# Two uniform features; the class is 1 when their sum exceeds 1
X = np.random.rand(100, 2)
y = (X[:, 0] + X[:, 1] > 1).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LogisticRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)
print(f'Accuracy: {accuracy}')
print('Confusion Matrix:')
print(conf_matrix)
print('Classification Report:')
print(class_report)
plt.figure(figsize=(10, 6))
# Circles coloured by the true label, crosses coloured by the predicted label
plt.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap='coolwarm', edgecolors='k',
            s=100, label='True Labels')
plt.scatter(X_test[:, 0], X_test[:, 1], c=y_pred, marker='x', cmap='coolwarm',
            s=100, label='Predicted Labels')
plt.title('Logistic Regression Predictions')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.legend()
plt.show()
OUTPUT:
RESULT:
Exp No : 11
TIME SERIES ANALYSIS
Date :
AIM:
ALGORITHM:
Step 1: Start the Program
Step 2: Import the Python libraries
Step 3: Generate time series data as a random walk over 100 consecutive dates
Step 4: Create a DataFrame indexed by date
Step 5: Plot the time series and display the result
Step 6: Stop the process
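The series in this experiment is a random walk: the cumulative sum of independent random steps, indexed by 100 consecutive dates. A minimal sketch of two standard pandas operations on such a series: first differences with Series.diff(), which undo the cumulative sum and recover the steps, and a rolling mean with Series.rolling(...).mean(), which smooths short-term noise. The seeded generator and the 7-day window are illustrative choices, not taken from the original program.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)              # seeded for reproducibility
steps = rng.standard_normal(100)             # independent daily increments
dates = pd.date_range(start='1/1/2020', periods=100)
series = pd.Series(steps.cumsum(), index=dates, name='Value')

# First differences undo the cumulative sum (the first value has no predecessor, so it is NaN)
recovered = series.diff()
# 7-day rolling mean; the first 6 positions lack a full window and are NaN
smoothed = series.rolling(window=7).mean()

print(series.head())
print(recovered.head())
print(smoothed.dropna().head())
```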
PROGRAM:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# 100 consecutive daily dates starting 1 January 2020
date_range = pd.date_range(start='1/1/2020', periods=100)
# Random walk: cumulative sum of standard normal steps
data = np.random.randn(100).cumsum()
time_series_data = pd.DataFrame(data, index=date_range, columns=['Value'])
plt.figure(figsize=(12, 6))
plt.plot(time_series_data.index, time_series_data['Value'], label='Random Data', color='blue')
plt.title('Time Series Analysis')
plt.xlabel('Date')
plt.ylabel('Value')
plt.legend()
plt.grid(True)  # light grid lines for readability
plt.show()
OUTPUT:
RESULT: