
Experience-Salary Correlation Analysis

The document contains Python programs for various data analytics applications, including calculating probabilities with a deck of cards, generating exam score distributions, calculating correlation coefficients, performing linear regression, and executing logistic regression using a diabetes dataset. Each section includes code snippets, outputs, and explanations of the results. The document is intended for BCA students studying data analytics.


Data Analytics using Python NEP V Semester BCA

1. Calculate simple probability using Python.

# Calculate probability
# In a deck of 52 cards: aces = 4, face cards (jack, queen, king) = 3 * 4 = 12.
def event_probability(event_outcomes, sample_space):
    # Return the probability of the event as a percentage, rounded to one decimal place
    probability = (event_outcomes / sample_space) * 100
    return round(probability, 1)

# Sample Space
cards = 52
# Determine the probability of drawing a heart
hearts = 13
heart_probability = event_probability(hearts, cards)

# Determine the probability of drawing a face card


face_cards = 12
face_card_probability = event_probability(face_cards, cards)

# Determine the probability of drawing the queen of hearts


queen_of_hearts = 1
queen_of_hearts_probability = event_probability(queen_of_hearts, cards)

#number of each card type


print("~~~~Number of each card~~~~")
print()
print("In the deck of" , cards , "cards, number of hearts present is:", hearts)
print("In the deck of" , cards , "cards, number of face-cards present is:", face_cards)
print("In the deck of" , cards , "cards, number of queen_of_hearts present is:", queen_of_hearts)

# Print each probability


print()
print("~~~~Probability of drawing cards~~~~")
print()
print('Chance of drawing heart is :' + str(heart_probability) + '%')
print('Chance of drawing face-card is :'+ str(face_card_probability) + '%')
print('Chance of drawing queen-of-hearts is :'+ str(queen_of_hearts_probability) + '%')

Output:

G:\gfgctd\5thBCA(NEP)\DataAnalytics\DAdataset\2.practDA1\DA-Python-Rashmi\Simple_probability.py
~~~~Number of each card~~~~

In the deck of 52 cards, number of hearts present is: 13
In the deck of 52 cards, number of face-cards present is: 12
In the deck of 52 cards, number of queen_of_hearts present is: 1

~~~~Probability of drawing cards~~~~

Chance of drawing heart is :25.0%
Chance of drawing face-card is :23.1%
Chance of drawing queen-of-hearts is :1.9%
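
The counts used above (13 hearts, 12 face cards, 1 queen of hearts) are typed in by hand. As a cross-check, the sketch below builds the 52-card deck explicitly and counts the favourable outcomes. The rank and suit labels and the helper name probability_percent are illustrative assumptions, not part of the original program.

```python
from itertools import product

# Hypothetical labels for ranks and suits, chosen only for this illustration
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]

# Every (rank, suit) pair is one card, giving 13 * 4 = 52 cards
deck = list(product(RANKS, SUITS))

def probability_percent(is_favourable):
    # Count the cards that satisfy the event and express the share as a percentage
    favourable = sum(1 for card in deck if is_favourable(card))
    return round(favourable / len(deck) * 100, 1)

heart_pct = probability_percent(lambda card: card[1] == "hearts")
face_pct = probability_percent(lambda card: card[0] in ("J", "Q", "K"))
queen_of_hearts_pct = probability_percent(lambda card: card == ("Q", "hearts"))

print("Heart:", heart_pct, "%")
print("Face card:", face_pct, "%")
print("Queen of hearts:", queen_of_hearts_pct, "%")
```

Counting from an explicit deck gives the same percentages as the program above, and it makes it easy to try other events by changing only the condition passed to the helper.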

Dr. RASHMI M, Dept. of Computer Science, GFGCTD.



2. Write a Python program for a real-life application of a probability distribution. Consider the exam
scores of a class with mean=70 and standard deviation=10.

import numpy as np
import matplotlib.pyplot as plt

# Generate 20 exam scores from a normal distribution with mean=70 and standard deviation=10
exam_scores = np.random.normal(70, 10, size=20)
print(exam_scores)

# Calculate the percentage of students scoring above 80
students_above_80 = np.sum(exam_scores > 80) / len(exam_scores) * 100

# Calculate the percentage of students scoring above 70
students_above_70 = np.sum(exam_scores > 70) / len(exam_scores) * 100

# Print the percentages
print("Percentage of students scoring above 80:", students_above_80, "%")
print("Percentage of students scoring above 70:", students_above_70, "%")

# Create a bar chart
plt.bar(range(len(exam_scores)), exam_scores)  # x-axis is index, y-axis is score

# Customize the chart
plt.xlabel("Student")
plt.ylabel("Exam Score")
plt.title("Distribution of Exam Scores")

# Show the chart
plt.show()

Output:

G:\gfgctd\5thBCA(NEP)\DataAnalytics\DAdataset\2.practDA1\DA-Python-Rashmi\probability_distribution.py
[68.86172681 73.59135747 60.11343059 74.89978041 66.02314038 48.28600854
70.87214828 70.98672986 81.98453173 72.42196246 55.78512137 61.13897005
93.96274937 80.12409655 73.62671301 70.35841052 80.06695564 78.38647344
63.38810889 73.5667196 ]
Percentage of students scoring above 80: 20.0 %
Percentage of students scoring above 70: 65.0 %
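
The scores are random, so every run prints different percentages, and a sample of only 20 scores can sit some way from the theoretical values. The sketch below computes the percentages expected from the normal distribution itself, using only the standard library. The function name normal_tail_percent is an assumption made for this illustration.

```python
import math

def normal_tail_percent(threshold, mean, std_dev):
    # Convert the threshold to a z-score
    z = (threshold - mean) / std_dev
    # For a standard normal variable Z, P(Z > z) = 0.5 * erfc(z / sqrt(2))
    return 0.5 * math.erfc(z / math.sqrt(2)) * 100

above_80 = normal_tail_percent(80, 70, 10)
above_70 = normal_tail_percent(70, 70, 10)

print(f"Expected percentage above 80: {above_80:.2f} %")
print(f"Expected percentage above 70: {above_70:.2f} %")
```

A score of 80 is one standard deviation above the mean, so about 15.87% of students are expected above it, and exactly half are expected above the mean of 70. The sample run above gave 20.0% and 65.0%, which is a plausible deviation for 20 scores.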


3. Write a Python program to calculate the correlation coefficient and draw a scatter plot.

import matplotlib.pyplot as plt
import math as mt

# Define the dataset (you can replace this with your own data)
data_x = [18,17,26,19,27,31,14,29,32,26]  # Experience in months
data_y = [16000,11000,23000,23000,23000,32000,15000,33000,32000,32000]  # Salary

# Dataset that gives correlation = 1
#data_x = [1,2,3,4,5,6,7,8,9,10]
#data_y = [1,2,3,4,5,6,7,8,9,10]

# Calculate the means of x and y
mean_x = sum(data_x) / len(data_x)
mean_y = sum(data_y) / len(data_y)

# Calculate the numerator and denominators
numerator = sum((data_x[i] - mean_x) * (data_y[i] - mean_y) for i in range(len(data_x)))
denominator_x = sum((data_x[i] - mean_x)**2 for i in range(len(data_x)))
denominator_y = sum((data_y[i] - mean_y)**2 for i in range(len(data_y)))

# Calculate the correlation coefficient 'r'
#r = (len(data_x)*numerator) / ( ((denominator_x ** 0.5)*mt.sqrt(len(data_x))) * ((denominator_y ** 0.5) * mt.sqrt(len(data_x)) ))
# or simply you can use (the factors of len(data_x) cancel out)
r = numerator / ( (denominator_x ** 0.5) * (denominator_y ** 0.5) )

# Print the correlation coefficient
print("Correlation Coefficient (r):", r)
if r > 0:
    print("The obtained correlation is positive correlation")
elif r < 0:
    print("The obtained correlation is negative correlation")
else:
    print("The obtained correlation is zero correlation")

# Create a scatter plot of the data
plt.scatter(data_x, data_y, label="Experience Salary Dataset")
plt.xlabel("Experience (in Months)")
plt.ylabel("Salary per month")

# Add a reference line joining (min x, min y) to (max x, max y).
# Note: this is a visual guide only, not a fitted regression line.
x_line = [min(data_x), max(data_x)]
y_line = [min(data_y), max(data_y)]
plt.plot(x_line, y_line, color='red', label="Correlation Coefficient Line")

# Display the legend
plt.legend()

# Show the plot
plt.title("Scatter Plot with Correlation Coefficient Line")
plt.grid(True)
plt.show()

Output:

G:\gfgctd\5thBCA(NEP)\DataAnalytics\DAdataset\2.practDA1\DA-Python-Rashmi\[Link]
Correlation Coefficient (r): 0.8836309503669456
The obtained correlation is positive correlation
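
The same calculation can be packaged as a reusable function, which also makes it easy to confirm that the alternative dataset mentioned in the program gives a correlation of 1. This is a minimal sketch using only the standard library. The function name pearson_r and its input checks are assumptions added for illustration.

```python
import math

def pearson_r(xs, ys):
    # Both sequences must have the same length and at least two points
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (numerator) and sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(ss_x * ss_y)

experience = [18, 17, 26, 19, 27, 31, 14, 29, 32, 26]
salary = [16000, 11000, 23000, 23000, 23000, 32000, 15000, 33000, 32000, 32000]

r_salary = pearson_r(experience, salary)
r_perfect = pearson_r(list(range(1, 11)), list(range(1, 11)))

print("Experience vs salary:", r_salary)
print("Identical sequences:", r_perfect)
```

Using zip avoids indexing by position, and the explicit check for a constant variable prevents a division by zero that the original program would hit on such data.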


4. Write a Python program to perform linear regression.

import numpy as np
import matplotlib.pyplot as plt

X = np.array([18,17,26,19,27,31,14,29,32,26])  # Experience in months
Y = np.array([16000,11000,23000,23000,23000,32000,15000,33000,32000,32000])  # Salary

mean_x = np.mean(X)
mean_y = np.mean(Y)
variance_x = np.var(X)
covariance = (np.sum((X - mean_x) * (Y - mean_y)))/(len(X))
a = covariance / variance_x  # slope of the regression line
b = mean_y - a * mean_x  # intercept of the regression line
Y_pred = a * X + b
# a is the slope and b is the intercept, so the line is Y = aX + b
print(f"Regression Line: Y = {a:.2f}X + {b:.2f}")
print("Y- values are = " , Y_pred )
print("For corresponding X- values=" , X)

# Plotting the Data
plt.scatter(X, Y, label="Original Data")
plt.plot(X, Y_pred, color="red", label=f"Regression Line: Y = {a:.2f}X + {b:.2f}")
plt.xlabel("Experience")
plt.ylabel("Salary")
plt.legend()
plt.grid(True)
plt.show()

# Predict the Y-value for a new X-value using the fitted line
new_X = 7.5
new_Y = a * new_X + b
print()
print(f"Predicted Y-value = {a:.2f}X + {b:.2f} for X= {new_X} ")
print(f"Predicted Y-value = {new_Y:.2f} ")

Output:

G:/gfgctd/5thBCA(NEP)/DataAnalytics/DAdataset/2.practDA1/DA-Python-Rashmi/Linear_Regression.py
Regression Line: Y = 1123.60X + -2853.93
Y- values are = [17370.78651685 16247.19101124 26359.5505618 18494.38202247
27483.14606742 31977.52808989 12876.40449438 29730.33707865
33101.12359551 26359.5505618 ]
For corresponding X- values= [18 17 26 19 27 31 14 29 32 26]

Predicted Y-value = 1123.60X + -2853.93 for X= 7.5
Predicted Y-value = 5573.03
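
The program reports the fitted line but not how well it fits. One common measure is the coefficient of determination, R squared, which is the share of the variation in salary explained by the line. The sketch below recomputes the slope and intercept with plain Python and then derives R squared. The variable names are assumptions for this illustration, and the original program does not compute this measure.

```python
xs = [18, 17, 26, 19, 27, 31, 14, 29, 32, 26]  # Experience in months
ys = [16000, 11000, 23000, 23000, 23000, 32000, 15000, 33000, 32000, 32000]  # Salary

mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

# Least-squares slope and intercept (same formulas as the program above)
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

# R squared = 1 - (residual sum of squares / total sum of squares)
predictions = [slope * x + intercept for x in xs]
ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))
ss_tot = sum((y - mean_y) ** 2 for y in ys)
r_squared = 1 - ss_res / ss_tot

print(f"Slope: {slope:.2f}, Intercept: {intercept:.2f}")
print(f"R squared: {r_squared:.4f}")
```

For simple linear regression, R squared equals the square of the correlation coefficient, so the value here ties back to r = 0.8836 from program 3 for the same data.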


5. Write a Python program to perform logistic regression. Note: use the dataset from the
“[Link]” file to show logistic regression.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Load the dataset from the CSV file
# Replace the value below with the actual file path or URL of your CSV file
file_path = '[Link]'
data = pd.read_csv(file_path)

# Display the first few rows of the dataset
print("Dataset:")
print(data.head())

# Assume the last column is the target variable (diabetes or not)
X = data.iloc[:, :-1]  # Features (all columns except the last one)
y = data.iloc[:, -1]  # Target variable (last column)

# Split the dataset into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a logistic regression model
model = LogisticRegression()

# Train the model on the training set
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)

# Display the results
print("\nModel Evaluation:")
print(f"Accuracy: {accuracy:.2f}")
print("\nConfusion Matrix:")
print(conf_matrix)
print("\nClassification Report:")
print(class_report)

Output:

G:/gfgctd/5thBCA(NEP)/DataAnalytics/DAdataset/2.practDA1/DA-Python-Rashmi/[Link]
Dataset:
Pregnancies Glucose BloodPressure ... DiabetesPedigreeFunction Age Outcome
0 6 148 72 ... 0.627 50 1
1 1 85 66 ... 0.351 31 0
2 8 183 64 ... 0.672 32 1
3 1 89 66 ... 0.167 21 0
4 0 137 40 ... 2.288 33 1

[5 rows x 9 columns]

Model Evaluation:
Accuracy: 0.75

Confusion Matrix:
[[78 21]
[18 37]]


Classification Report:
              precision    recall  f1-score   support

           0       0.81      0.79      0.80        99
           1       0.64      0.67      0.65        55

    accuracy                           0.75       154
   macro avg       0.73      0.73      0.73       154
weighted avg       0.75      0.75      0.75       154
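
The figures in the classification report follow directly from the confusion matrix. The sketch below recomputes them by hand from the matrix printed above, assuming scikit-learn's convention that rows are true labels and columns are predicted labels. The variable names are chosen for this illustration.

```python
# Confusion matrix copied from the output above: rows = true label, columns = predicted label
conf_matrix = [[78, 21],
               [18, 37]]

# For labels 0 and 1 this layout reads as [[TN, FP], [FN, TP]]
(tn, fp), (fn, tp) = conf_matrix
total = tn + fp + fn + tp

accuracy = (tp + tn) / total

# Class 1 (diabetic) metrics
precision_1 = tp / (tp + fp)
recall_1 = tp / (tp + fn)
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

# Class 0 (non-diabetic) metrics treat label 0 as the class of interest
precision_0 = tn / (tn + fn)
recall_0 = tn / (tn + fp)
f1_0 = 2 * precision_0 * recall_0 / (precision_0 + recall_0)

print(f"Accuracy: {accuracy:.2f}")
print(f"Class 0: precision={precision_0:.2f} recall={recall_0:.2f} f1={f1_0:.2f}")
print(f"Class 1: precision={precision_1:.2f} recall={recall_1:.2f} f1={f1_1:.2f}")
```

Of the 55 patients who actually have diabetes, the model finds 37, which gives the recall of 0.67 for class 1. The 18 missed cases are the false negatives in the bottom-left cell of the matrix.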
