Data Analytics using Python NEP V Semester BCA
1. Calculate simple probability using Python.
# Calculate probability
# In a deck of 52 cards: aces = 4, face cards (jack, queen, king) = 3 * 4 = 12.
def event_probability(event_outcomes, sample_space):
    """Return the probability of an event as a percentage, rounded to 1 decimal place."""
    probability = (event_outcomes / sample_space) * 100
    return round(probability, 1)
# Sample Space
cards = 52
# Determine the probability of drawing a heart
hearts = 13
heart_probability = event_probability(hearts, cards)
# Determine the probability of drawing a face card
face_cards = 12
face_card_probability = event_probability(face_cards, cards)
# Determine the probability of drawing the queen of hearts
queen_of_hearts = 1
queen_of_hearts_probability = event_probability(queen_of_hearts, cards)
#number of each card type
print("~~~~Number of each card~~~~")
print()
print("In the deck of" , cards , "cards, number of hearts present is:", hearts)
print("In the deck of" , cards , "cards, number of face-cards present is:", face_cards)
print("In the deck of" , cards , "cards, number of queen_of_hearts present is:", queen_of_hearts)
# Print each probability
print()
print("~~~~Probability of drawing cards~~~~")
print()
print('Chance of drawing heart is :' + str(heart_probability) + '%')
print('Chance of drawing face-card is :'+ str(face_card_probability) + '%')
print('Chance of drawing queen-of-hearts is :'+ str(queen_of_hearts_probability) + '%')
Output:
G:\gfgctd\5thBCA(NEP)\DataAnalytics\DAdataset\2.practDA1\DA-Python-Rashmi\Simple_probability.py
~~~~Number of each card~~~~
In the deck of 52 cards, number of hearts present is: 13
In the deck of 52 cards, number of face-cards present is: 12
In the deck of 52 cards, number of queen_of_hearts present is: 1
~~~~Probability of drawing cards~~~~
Chance of drawing heart is :25.0%
Chance of drawing face-card is :23.1%
Chance of drawing queen-of-hearts is :1.9%
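The percentages above are rounded to one decimal place, which hides the exact ratios. As a minimal sketch, the same three probabilities can be kept as exact fractions with Python's standard fractions module. The helper name exact_probability is an assumption introduced here for illustration; it is not part of the program above.

```python
from fractions import Fraction

def exact_probability(event_outcomes, sample_space):
    """Hypothetical helper: return the probability as an exact fraction."""
    return Fraction(event_outcomes, sample_space)

cards = 52
heart_fraction = exact_probability(13, cards)
face_card_fraction = exact_probability(12, cards)
queen_of_hearts_fraction = exact_probability(1, cards)

print("Heart:", heart_fraction)
print("Face card:", face_card_fraction)
print("Queen of hearts:", queen_of_hearts_fraction)

# Converting a fraction back to a rounded percentage gives the same
# kind of value that event_probability() reports.
face_card_percent = round(float(face_card_fraction) * 100, 1)
print("Face card (%):", face_card_percent)
```

Fraction reduces each ratio automatically, so 13/52 is shown in lowest terms, and nothing is lost to rounding until the final conversion to a percentage.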
Dr. RASHMI M, Dept. of Computer Science, GFGCTD.
2. Write a Python program for a real-life application of a probability distribution. Consider the exam scores of a class with mean = 70 and standard deviation = 10.
import numpy as np
import matplotlib.pyplot as plt
# Generate exam scores with mean=70 and standard deviation=10
exam_scores = np.random.normal(70, 10, size=20)
print(exam_scores)
# Calculate the percentage of students scoring above 80
students_above_80 = np.sum(exam_scores > 80) / len(exam_scores) * 100
# Calculate the percentage of students scoring above 70
students_above_70 = np.sum(exam_scores > 70) / len(exam_scores) * 100
# Print the percentage
print("Percentage of students scoring above 80:", students_above_80, "%")
print("Percentage of students scoring above 70:", students_above_70, "%")
# Create a bar chart
plt.bar(range(len(exam_scores)), exam_scores)  # x-axis is index, y-axis is score
# Customize the chart
plt.xlabel("Student")
plt.ylabel("Exam Score")
plt.title("Distribution of Exam Scores")
# Show the chart
plt.show()
Output:
G:\gfgctd\5thBCA(NEP)\DataAnalytics\DAdataset\2.practDA1\DA-Python-Rashmi\probability_distribution.py
[68.86172681 73.59135747 60.11343059 74.89978041 66.02314038 48.28600854
70.87214828 70.98672986 81.98453173 72.42196246 55.78512137 61.13897005
93.96274937 80.12409655 73.62671301 70.35841052 80.06695564 78.38647344
63.38810889 73.5667196 ]
Percentage of students scoring above 80: 20.0 %
Percentage of students scoring above 70: 65.0 %
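Because the 20 scores are drawn at random, the percentages change on every run and can sit well away from what the distribution itself predicts. The sketch below, which assumes only the standard math module, computes the theoretical share of a normal distribution with mean 70 and standard deviation 10 that lies above a threshold. The function name normal_share_above is hypothetical and is used only for this illustration.

```python
import math

def normal_share_above(threshold, mean, std):
    """Hypothetical helper: P(X > threshold) for X ~ Normal(mean, std)."""
    z = (threshold - mean) / (std * math.sqrt(2))
    # 0.5 * erfc(z) is the upper-tail probability of the normal distribution
    return 0.5 * math.erfc(z)

theory_above_80 = normal_share_above(80, 70, 10) * 100
theory_above_70 = normal_share_above(70, 70, 10) * 100

print("Theoretical percentage above 80:", round(theory_above_80, 2), "%")
print("Theoretical percentage above 70:", round(theory_above_70, 2), "%")
```

With only 20 samples, observed values such as 20.0 % and 65.0 % are ordinary sampling variation around these theoretical shares; a larger size in the sampling call would normally bring the two closer together.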
3. Write a Python program to calculate the correlation coefficient and draw a scatter plot.
import matplotlib.pyplot as plt
import math as mt
# Define the dataset (you can replace this with your own data)
data_x = [18,17,26,19,27,31,14,29,32,26]  # Experience in months
data_y = [16000,11000,23000,23000,23000,32000,15000,33000,32000,32000]  # Salary
#dataset to get correlation=1
#data_x = [1,2,3,4,5,6,7,8,9,10]
#data_y = [1,2,3,4,5,6,7,8,9,10]
# Calculate the means of x and y
mean_x = sum(data_x) / len(data_x)
mean_y = sum(data_y) / len(data_y)
# Calculate the numerator and denominators
numerator = sum((data_x[i] - mean_x) * (data_y[i] - mean_y) for i in range(len(data_x)))
denominator_x = sum((data_x[i] - mean_x)**2 for i in range(len(data_x)))
denominator_y = sum((data_y[i] - mean_y)**2 for i in range(len(data_y)))
# Calculate the correlation coefficient 'r'
# r = (len(data_x) * numerator) / (((denominator_x ** 0.5) * mt.sqrt(len(data_x))) * ((denominator_y ** 0.5) * mt.sqrt(len(data_x))))
# or simply you can use
r = numerator / ( (denominator_x ** 0.5) * (denominator_y ** 0.5) )
# Print the correlation coefficient
print("Correlation Coefficient (r):", r)
if r > 0:
    print("The obtained correlation is positive correlation")
elif r < 0:
    print("The obtained correlation is negative correlation")
else:
    print("The obtained correlation is zero correlation")
# Create a scatter plot of the data
plt.scatter(data_x, data_y, label="Experience Salary Dataset")
plt.xlabel("Experience (in Months)")
plt.ylabel("Salary per month")
# Add a reference line joining (min x, min y) to (max x, max y); it is a visual guide, not a fitted regression line
x_line = [min(data_x), max(data_x)]
y_line = [min(data_y), max(data_y)]
plt.plot(x_line, y_line, color='red', label="Correlation Coefficient Line")
# Display the legend
plt.legend()
# Show the plot
plt.title("Scatter Plot with Correlation Coefficient Line")
plt.grid(True)
plt.show()
Output:
G:\gfgctd\5thBCA(NEP)\DataAnalytics\DAdataset\2.practDA1\DA-Python-Rashmi\[Link]
Correlation Coefficient (r): 0.8836309503669456
The obtained correlation is positive correlation
4. Write a Python program to calculate linear regression.
import numpy as np
import matplotlib.pyplot as plt
X = np.array([18,17,26,19,27,31,14,29,32,26])  # Experience in months
Y = np.array([16000,11000,23000,23000,23000,32000,15000,33000,32000,32000])  # Salary
mean_x = np.mean(X)
mean_y = np.mean(Y)
variance_x = np.var(X)
covariance = (np.sum((X - mean_x) * (Y - mean_y)))/(len(X))
a = covariance / variance_x
b = mean_y - a * mean_x
Y_pred = a * X + b
# a is the slope and b is the intercept, so the fitted line is Y = aX + b
print(f"Regression Line: Y = {a:.2f}X + {b:.2f}")
print("Y- values are = " , Y_pred )
print("For corresponding X- values=" , X)
# Plotting the Data
plt.scatter(X, Y, label="Original Data")
plt.plot(X, Y_pred, color="red", label=f"Regression Line: Y = {a:.2f}X + {b:.2f}")
plt.xlabel("Experience")
plt.ylabel("Salary")
plt.legend()
plt.grid(True)
plt.show()
# Getting the Solution for new data set
new_X = 7.5
new_Y = a * new_X + b
print()
print(f"Predicted Y-value = {a:.2f}X + {b:.2f} for X= {new_X} ")
print(f"Predicted Y-value = {new_Y:.2f} ")
Output:
G:/gfgctd/5thBCA(NEP)/DataAnalytics/DAdataset/2.practDA1/DA-Python-Rashmi/Linear_Regression.py
Regression Line: Y = 1123.60X + -2853.93
Y- values are = [17370.78651685 16247.19101124 26359.5505618 18494.38202247
27483.14606742 31977.52808989 12876.40449438 29730.33707865
33101.12359551 26359.5505618 ]
For corresponding X- values= [18 17 26 19 27 31 14 29 32 26]
Predicted Y-value = 1123.60X + -2853.93 for X= 7.5
Predicted Y-value = 5573.03
5. Write a Python program to perform logistic regression. Note: use the dataset from the “[Link]” file to show logistic regression.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
# Load the dataset from the CSV file
# Replace 'your_dataset.csv' with the actual file path or URL of your CSV file
file_path = '[Link]'
data = pd.read_csv(file_path)
# Display the first few rows of the dataset
print("Dataset:")
print(data.head())
# Assume the last column is the target variable (diabetes or not)
X = data.iloc[:, :-1]  # Features (all columns except the last one)
y = data.iloc[:, -1]  # Target variable (last column)
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create a logistic regression model
model = LogisticRegression()
# Train the model on the training set
model.fit(X_train, y_train)
# Make predictions on the test set
y_pred = model.predict(X_test)
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)
# Display the results
print("\nModel Evaluation:")
print(f"Accuracy: {accuracy:.2f}")
print("\nConfusion Matrix:")
print(conf_matrix)
print("\nClassification Report:")
print(class_report)
Output:
G:/gfgctd/5thBCA(NEP)/DataAnalytics/DAdataset/2.practDA1/DA-Python-Rashmi/[Link]
Dataset:
   Pregnancies  Glucose  BloodPressure  ...  DiabetesPedigreeFunction  Age  Outcome
0            6      148             72  ...                     0.627   50        1
1            1       85             66  ...                     0.351   31        0
2            8      183             64  ...                     0.672   32        1
3            1       89             66  ...                     0.167   21        0
4            0      137             40  ...                     2.288   33        1

[5 rows x 9 columns]
Model Evaluation:
Accuracy: 0.75
Confusion Matrix:
[[78 21]
 [18 37]]
Classification Report:
              precision    recall  f1-score   support

           0       0.81      0.79      0.80        99
           1       0.64      0.67      0.65        55

    accuracy                           0.75       154
   macro avg       0.73      0.73      0.73       154
weighted avg       0.75      0.75      0.75       154
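The figures in the classification report follow directly from the confusion matrix. The sketch below recomputes accuracy and the class 1 precision, recall and F1 score by hand, assuming scikit-learn's convention that rows are actual classes and columns are predicted classes. The variable names are introduced here for illustration.

```python
# Confusion matrix copied from the output above
# (rows: actual class 0, 1; columns: predicted class 0, 1)
conf_matrix = [[78, 21],
               [18, 37]]

tn, fp = conf_matrix[0]  # actual 0: true negatives, false positives
fn, tp = conf_matrix[1]  # actual 1: false negatives, true positives
total = tn + fp + fn + tp

accuracy = (tn + tp) / total
precision_1 = tp / (tp + fp)  # of all predicted 1, how many were really 1
recall_1 = tp / (tp + fn)     # of all actual 1, how many were found
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

print(f"Accuracy: {accuracy:.2f}")
print(f"Class 1 precision: {precision_1:.2f}")
print(f"Class 1 recall: {recall_1:.2f}")
print(f"Class 1 f1-score: {f1_1:.2f}")
```

The row sums (99 and 55) are the support values in the report, and their total (154) is the size of the 20% test split.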