Experience-Salary Correlation Analysis

The document contains Python programs for various data analytics applications, including calculating probabilities with a deck of cards, generating exam score distributions, calculating correlation coefficients, performing linear regression, and executing logistic regression using a diabetes dataset. Each section includes code snippets, outputs, and explanations of the results. The document is intended for BCA students studying data analytics.

Uploaded by

jajssjjiiwi

Data Analytics using Python NEP V Semester BCA

1. Calculate simple probability using Python.

# Calculate probability
# In a deck of 52 cards: aces = 4, face cards (jack, queen, king) = 3 * 4 = 12.
def event_probability(event_outcomes, sample_space):
    # Return the probability as a percentage, rounded to one decimal place
    probability = (event_outcomes / sample_space) * 100
    return round(probability, 1)

# Sample Space
cards = 52
# Determine the probability of drawing a heart
hearts = 13
heart_probability = event_probability(hearts, cards)

# Determine the probability of drawing a face card

face_cards = 12
face_card_probability = event_probability(face_cards, cards)

# Determine the probability of drawing the queen of hearts

queen_of_hearts = 1
queen_of_hearts_probability = event_probability(queen_of_hearts, cards)

#number of each card type

print("~~~~Number of each card~~~~")
print()
print("In the deck of" , cards , "cards, number of hearts present is:", hearts)
print("In the deck of" , cards , "cards, number of face-cards present is:", face_cards)
print("In the deck of" , cards , "cards, number of queen_of_hearts present is:", queen_of_hearts)

# Print each probability

print()
print("~~~~Probability of drawing cards~~~~")
print()
print('Chance of drawing heart is :' + str(heart_probability) + '%')
print('Chance of drawing face-card is :'+ str(face_card_probability) + '%')
print('Chance of drawing queen-of-hearts is :'+ str(queen_of_hearts_probability) + '%')

Output:

G:\gfgctd\5thBCA(NEP)\DataAnalytics\DAdataset\2.practDA1\DA-Python-Rashmi\Simple_probability.py
~~~~Number of each card~~~~

In the deck of 52 cards, number of hearts present is: 13
In the deck of 52 cards, number of face-cards present is: 12
In the deck of 52 cards, number of queen_of_hearts present is: 1

~~~~Probability of drawing cards~~~~

Chance of drawing heart is :25.0%
Chance of drawing face-card is :23.1%
Chance of drawing queen-of-hearts is :1.9%
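
The percentages above are rounded to one decimal place, which hides the exact ratios. As a minimal sketch, assuming the same card counts as the program above, the standard-library `fractions.Fraction` type keeps each probability exact. The helper name `exact_probability` is hypothetical and is not part of the original program.

```python
from fractions import Fraction


def exact_probability(event_outcomes, sample_space):
    """Return the probability of an event as an exact fraction (hypothetical helper)."""
    if sample_space <= 0:
        raise ValueError("sample_space must be positive")
    return Fraction(event_outcomes, sample_space)


cards = 52
heart = exact_probability(13, cards)           # 13 hearts in the deck
face_card = exact_probability(12, cards)       # 3 face cards in each of 4 suits
queen_of_hearts = exact_probability(1, cards)  # a single card

# Print the reduced fraction next to the rounded percentage
for name, p in [("heart", heart), ("face-card", face_card), ("queen-of-hearts", queen_of_hearts)]:
    print(f"Chance of drawing {name}: {p} = {round(float(p) * 100, 1)}%")
```

Working with fractions makes it easy to see that the heart probability reduces to 1/4 and the face-card probability to 3/13 before any rounding takes place.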

Dr. RASHMI M, Dept. of Computer Science, GFGCTD.


2. Write a Python program for a real-life application of a probability distribution. Consider exam scores for a class with mean = 70 and standard deviation = 10.

import numpy as np
import matplotlib.pyplot as plt

# Generate exam scores with mean=70 and standard deviation=10
# (no random seed is set, so the scores differ on every run)
exam_scores = np.random.normal(70, 10, size=20)
print(exam_scores)

# Calculate the percentage of students scoring above 80
students_above_80 = np.sum(exam_scores > 80) / len(exam_scores) * 100

# Calculate the percentage of students scoring above 70
students_above_70 = np.sum(exam_scores > 70) / len(exam_scores) * 100

# Print the percentage
print("Percentage of students scoring above 80:", students_above_80, "%")
print("Percentage of students scoring above 70:", students_above_70, "%")

# Create a bar chart
plt.bar(range(len(exam_scores)), exam_scores)  # x-axis is index, y-axis is score

# Customize the chart
plt.xlabel("Student")
plt.ylabel("Exam Score")
plt.title("Distribution of Exam Scores")

# Show the chart
plt.show()

Output:

G:\gfgctd\5thBCA(NEP)\DataAnalytics\DAdataset\2.practDA1\DA-Python-Rashmi\probability_distribution.py
[68.86172681 73.59135747 60.11343059 74.89978041 66.02314038 48.28600854
70.87214828 70.98672986 81.98453173 72.42196246 55.78512137 61.13897005
93.96274937 80.12409655 73.62671301 70.35841052 80.06695564 78.38647344
63.38810889 73.5667196 ]
Percentage of students scoring above 80: 20.0 %
Percentage of students scoring above 70: 65.0 %
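
The 20.0 % and 65.0 % figures above come from one random sample of 20 scores, so they change from run to run. As a rough cross-check, the sketch below computes the share of scores a normal distribution with mean 70 and standard deviation 10 would place above each threshold, using only `math.erf` from the standard library. The function name `normal_tail_percent` is an illustrative assumption, not part of the original program.

```python
import math


def normal_tail_percent(threshold, mean, std_dev):
    """Percentage of a normal(mean, std_dev) population lying above threshold."""
    z = (threshold - mean) / std_dev
    # Standard normal CDF expressed through the error function
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (1.0 - cdf) * 100


expected_above_80 = normal_tail_percent(80, 70, 10)
expected_above_70 = normal_tail_percent(70, 70, 10)

print(f"Expected percentage above 80: {expected_above_80:.2f} %")
print(f"Expected percentage above 70: {expected_above_70:.2f} %")
```

With only 20 students, sample percentages can sit several points away from these theoretical values; a larger `size` in the original program would bring them closer.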


3. Write a Python program to calculate the correlation coefficient and draw a scatter plot.

import matplotlib.pyplot as plt
import math as mt

# Define the dataset (you can replace this with your own data)
data_x = [18,17,26,19,27,31,14,29,32,26]  # Experience in months
data_y = [16000,11000,23000,23000,23000,32000,15000,33000,32000,32000]  # Salary

#dataset to get correlation=1

#data_x = [1,2,3,4,5,6,7,8,9,10]
#data_y = [1,2,3,4,5,6,7,8,9,10]

# Calculate the means of x and y

mean_x = sum(data_x) / len(data_x)
mean_y = sum(data_y) / len(data_y)

# Calculate the numerator and denominators

numerator = sum((data_x[i] - mean_x) * (data_y[i] - mean_y) for i in range(len(data_x)))
denominator_x = sum((data_x[i] - mean_x)**2 for i in range(len(data_x)))
denominator_y = sum((data_y[i] - mean_y)**2 for i in range(len(data_y)))

# Calculate the correlation coefficient 'r'

#r = (len(data_x)*numerator) / ( ((denominator_x ** 0.5)*mt.sqrt(len(data_x))) * ((denominator_y ** 0.5) * mt.sqrt(len(data_x)) ))

#or simply you can use

r = numerator / ( (denominator_x ** 0.5) * (denominator_y ** 0.5) )

# Print the correlation coefficient

print("Correlation Coefficient (r):", r)
if r > 0:
    print("The obtained correlation is positive correlation")
elif r < 0:
    print("The obtained correlation is negative correlation")
else:
    print("The obtained correlation is zero correlation")

# Create a scatter plot of the data
plt.scatter(data_x, data_y, label="Experience Salary Dataset")
plt.xlabel("Experience (in Months)")
plt.ylabel("Salary per month")

# Add a reference line from (min x, min y) to (max x, max y).
# This is a visual guide to the upward trend, not a fitted regression line.
x_line = [min(data_x), max(data_x)]
y_line = [min(data_y), max(data_y)]
plt.plot(x_line, y_line, color='red', label="Correlation Coefficient Line")

# Display the legend
plt.legend()

# Show the plot
plt.title("Scatter Plot with Correlation Coefficient Line")
plt.grid(True)
plt.show()

Output:

G:\gfgctd\5thBCA(NEP)\DataAnalytics\DAdataset\2.practDA1\DA-Python-Rashmi\[Link]
Correlation Coefficient (r): 0.8836309503669456
The obtained correlation is positive correlation
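
The calculation above can also be packaged as a reusable function. The sketch below is one possible arrangement, not the author's method: the name `pearson_r` and the input checks are assumptions added for illustration. It uses the same formula as the program, r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²).

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    if len(xs) < 2:
        raise ValueError("at least two data points are required")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        # A constant variable has no spread, so r is undefined
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


experience = [18, 17, 26, 19, 27, 31, 14, 29, 32, 26]  # months
salary = [16000, 11000, 23000, 23000, 23000, 32000, 15000, 33000, 32000, 32000]

r_value = pearson_r(experience, salary)
print("Correlation Coefficient (r):", r_value)
```

Checking for a zero denominator matters because the original expression would raise `ZeroDivisionError` if every x value (or every y value) were identical.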


4. Write a Python program to calculate linear regression.

import numpy as np
import matplotlib.pyplot as plt

X = np.array([18,17,26,19,27,31,14,29,32,26])  # Experience in months
Y = np.array([16000,11000,23000,23000,23000,32000,15000,33000,32000,32000])  # Salary

mean_x = np.mean(X)
mean_y = np.mean(Y)
variance_x = np.var(X)
covariance = (np.sum((X - mean_x) * (Y - mean_y)))/(len(X))
a = covariance / variance_x  # slope
b = mean_y - a * mean_x      # intercept
Y_pred = a * X + b
print(f"Regression Line: Y = {a:.2f}X + {b:.2f}")
print("Y- values are = " , Y_pred )
print("For corresponding X- values=" , X)

# Plotting the Data
plt.scatter(X, Y, label="Original Data")
plt.plot(X, Y_pred, color="red", label=f"Regression Line: Y = {a:.2f}X + {b:.2f}")
plt.xlabel("Experience")
plt.ylabel("Salary")
plt.legend()
plt.grid(True)
plt.show()

# Getting the Solution for new data set
# (X = 7.5 lies below the smallest observed X of 14, so this is an extrapolation)
new_X = 7.5
new_Y = a * new_X + b
print()
print(f"Predicted Y-value = {a:.2f}X + {b:.2f} for X= {new_X} ")
print(f"Predicted Y-value = {new_Y:.2f} ")

Output:

G:/gfgctd/5thBCA(NEP)/DataAnalytics/DAdataset/2.practDA1/DA-Python-Rashmi/Linear_Regression.py
Regression Line: Y = 1123.60X + -2853.93
Y- values are = [17370.78651685 16247.19101124 26359.5505618 18494.38202247
27483.14606742 31977.52808989 12876.40449438 29730.33707865
33101.12359551 26359.5505618 ]
For corresponding X- values= [18 17 26 19 27 31 14 29 32 26]

Predicted Y-value = 1123.60X + -2853.93 for X= 7.5
Predicted Y-value = 5573.03
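
The program above reports the fitted line but not how well it fits. As a hedged sketch, the coefficient of determination R² = 1 − SS_res / SS_tot can be computed in plain Python from the same data. The helper names `fit_line` and `r_squared` are hypothetical and do not appear in the original program.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


experience = [18, 17, 26, 19, 27, 31, 14, 29, 32, 26]  # months
salary = [16000, 11000, 23000, 23000, 23000, 32000, 15000, 33000, 32000, 32000]

slope, intercept = fit_line(experience, salary)
r2 = r_squared(experience, salary, slope, intercept)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, R^2 = {r2:.3f}")
```

For a simple linear regression with one predictor, R² equals the square of the correlation coefficient from program 3, which gives a quick consistency check between the two programs.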


5. Write a Python program to perform logistic regression. Note: use the dataset from the “[Link]” file to show logistic regression.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Load the dataset from the CSV file

# Replace 'your_dataset.csv' with the actual file path or URL of your CSV file
file_path = '[Link]'
data = pd.read_csv(file_path)

# Display the first few rows of the dataset
print("Dataset:")
print(data.head())

# Assume the last column is the target variable (diabetes or not)
X = data.iloc[:, :-1]  # Features (all columns except the last one)
y = data.iloc[:, -1]   # Target variable (last column)

# Split the dataset into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


# Create a logistic regression model

model = LogisticRegression()

# Train the model on the training set
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Evaluate the model

accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)

# Display the results

print("\nModel Evaluation:")
print(f"Accuracy: {accuracy:.2f}")
print("\nConfusion Matrix:")
print(conf_matrix)
print("\nClassification Report:")
print(class_report)

Output:

G:/gfgctd/5thBCA(NEP)/DataAnalytics/DAdataset/2.practDA1/DA-Python-Rashmi/[Link]
Dataset:
   Pregnancies  Glucose  BloodPressure  ...  DiabetesPedigreeFunction  Age  Outcome
0            6      148             72  ...                     0.627   50        1
1            1       85             66  ...                     0.351   31        0
2            8      183             64  ...                     0.672   32        1
3            1       89             66  ...                     0.167   21        0
4            0      137             40  ...                     2.288   33        1

[5 rows x 9 columns]

Model Evaluation:
Accuracy: 0.75

Confusion Matrix:
[[78 21]
 [18 37]]

Classification Report:
              precision    recall  f1-score   support

           0       0.81      0.79      0.80        99
           1       0.64      0.67      0.65        55

    accuracy                           0.75       154
   macro avg       0.73      0.73      0.73       154
weighted avg       0.75      0.75      0.75       154
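
The figures in the classification report can be reproduced by hand from the confusion matrix. The sketch below assumes scikit-learn's layout, in which rows are actual classes and columns are predicted classes, and it hard-codes the matrix printed above rather than retraining the model. The variable and function names are illustrative only.

```python
# Confusion matrix copied from the output above:
# rows = actual class (0, 1), columns = predicted class (0, 1)
conf_matrix = [[78, 21],
               [18, 37]]

tn, fp = conf_matrix[0]  # actual 0: predicted 0 / predicted 1
fn, tp = conf_matrix[1]  # actual 1: predicted 0 / predicted 1
total = tn + fp + fn + tp


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


accuracy = (tn + tp) / total

# Class 1 (diabetes) treated as the positive class
precision_1 = tp / (tp + fp)
recall_1 = tp / (tp + fn)

# Class 0 (no diabetes) treated as the positive class
precision_0 = tn / (tn + fn)
recall_0 = tn / (tn + fp)

print(f"accuracy: {accuracy:.2f}")
print(f"class 0: precision={precision_0:.2f} recall={recall_0:.2f} f1={f1(precision_0, recall_0):.2f}")
print(f"class 1: precision={precision_1:.2f} recall={recall_1:.2f} f1={f1(precision_1, recall_1):.2f}")
```

Reading the matrix this way shows that 18 of the 55 actual diabetes cases in the test set were missed, which is why recall for class 1 is lower than for class 0.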
