Data Analytics Experiment-7
Dataset-Original
Source Code:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Load the Excel dataset
df_healthcare = pd.read_excel("healthcare_data.xlsx")
# Get numeric summary for Age
numeric_stats = df_healthcare[['Age']].describe()
# Truncate count, min, and max to whole numbers (the column itself stays float)
for row in ['count', 'min', 'max']:
    numeric_stats.loc[row, 'Age'] = int(numeric_stats.loc[row, 'Age'])
# Format the output manually
print("Summary Statistics (only numeric columns like Age):")
for row in numeric_stats.index:
    value = numeric_stats.loc[row, 'Age']
    if row in ['count', 'min', 'max']:
        print(f"{row:<6} {int(value)}")
    else:
        print(f"{row:<6} {value:.6f}")
# Gender Distribution
gender_dist = df_healthcare['Gender'].value_counts()
print("\nGender Distribution:")
print(gender_dist)
# Diagnosis Distribution
diagnosis_dist = df_healthcare['Diagnosis'].value_counts()
print("\nDiagnosis Distribution:")
print(diagnosis_dist)
# Treatment Distribution
treatment_dist = df_healthcare['Treatment'].value_counts()
print("\nTreatment Distribution:")
print(treatment_dist)
# Age Distribution Plot
plt.figure(figsize=(10, 6))
sns.histplot(df_healthcare['Age'], bins=20, kde=False, cumulative=True, color='orange')
plt.title('Cumulative Age Distribution')
plt.xlabel('Age')
plt.ylabel('Cumulative Frequency')
plt.show()
# Gender Pie Chart
plt.figure(figsize=(8, 8))
gender_dist.plot.pie(autopct='%1.1f%%', colors=['lightcoral', 'lightgreen'], startangle=90)
plt.title('Gender Distribution')
plt.ylabel('')  # hide the default y-axis label on the pie chart
plt.show()
# Diagnosis Bar Chart
plt.figure(figsize=(10, 6))
sns.countplot(data=df_healthcare, x="Diagnosis", palette="Set2")
plt.title('Diagnosis Distribution')
plt.xlabel('Diagnosis')
plt.ylabel('Count')
plt.xticks(rotation=45)
plt.show()
# Treatment Bar Chart
plt.figure(figsize=(10, 6))
sns.countplot(data=df_healthcare, x="Treatment", palette="Set1")
plt.title('Treatment Distribution')
plt.xlabel('Treatment Type')
plt.ylabel('Count')
plt.xticks(rotation=45)
plt.show()
# Age vs Diagnosis Boxplot
plt.figure(figsize=(10, 6))
sns.boxplot(x="Diagnosis", y="Age", data=df_healthcare, palette="Set3")
plt.title('Age Distribution by Diagnosis')
plt.xlabel('Diagnosis')
plt.ylabel('Age')
plt.show()
Output:
Summary Statistics (only numeric columns like Age):
count  100
mean   50.330000
std    17.533837
min    20
25%    34.750000
50%    49.000000
75%    65.250000
max    85

Gender Distribution:
Gender
Male      57
Female    43
Name: count, dtype: int64

Diagnosis Distribution:
Diagnosis
Flu             25
Diabetes        24
Hypertension    18
Cancer          17
Asthma          16
Name: count, dtype: int64

Treatment Distribution:
Treatment
Therapy         26
Surgery         24
Medication      22
Chemotherapy    14
Observation     14
Name: count, dtype: int64
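
Cross-check (sketch):
The figures above can be related to two of the plots with a little arithmetic. The sketch below is not part of the original experiment; it hard-codes the gender counts and the Age quartiles copied from the output rather than re-reading healthcare_data.xlsx. It computes the percentage shares that the pie chart labels should show, and the 1.5 x IQR fences that a boxplot over all patients would use under seaborn's default whisker setting. The variable names are illustrative.

```python
import pandas as pd

# Counts copied from the Gender Distribution output above (not re-read from the file).
gender_counts = pd.Series({"Male": 57, "Female": 43}, name="count")

# Percentage share per category: the quantity autopct='%1.1f%%' prints on the pie chart.
gender_share = (gender_counts / gender_counts.sum() * 100).round(1)
print(gender_share)

# Quartiles copied from the Age summary statistics above.
q1, q3 = 34.75, 65.25
iqr = q3 - q1

# Points outside these fences would be drawn as outliers in a boxplot
# of all ages pooled together (whisker factor 1.5 is assumed).
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
print(f"IQR = {iqr}, fences = ({lower_fence}, {upper_fence})")
```

Since the reported minimum (20) and maximum (85) both fall inside these fences, the pooled ages contain no outliers by this rule. The per-diagnosis boxes in the boxplot use each group's own quartiles, so they may still show individual outlier points.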