
Student Performance Data Analysis

Uploaded by Ahaan Raza
Copyright © All Rights Reserved

1. Import Data and Required Packages

Importing the Pandas, NumPy, Matplotlib, Seaborn and Warnings libraries.

In [1]: import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')

Import the CSV Data as Pandas DataFrame

In [2]: df = pd.read_csv("[Link]")

Show Top 5 Records

In [3]: df.head()

Out[3]:
   gender  race/ethnicity  parental level of education  lunch         test preparation course  math score  reading score  writing score
0  female  group B         bachelor's degree            standard      none                     72          72             74
1  female  group C         some college                 standard      completed                69          90             88
2  female  group B         master's degree              standard      none                     90          95             93
3  male    group A         associate's degree           free/reduced  none                     47          57             44
4  male    group C         some college                 standard      none                     76          78             75

Shape of the dataset

In [4]: df.shape

Out[4]: (1000, 8)

2. Dataset information

gender : sex of students (female/male)

race/ethnicity : ethnicity of students (group A, B, C, D, E)

parental level of education : parents' final education (bachelor's degree, some college, master's degree, associate's degree, high school, some high school)

lunch : type of lunch the student has before the test (standard or free/reduced)

test preparation course : whether the student completed a test preparation course before the test (completed or none)

math score

reading score

writing score

3. Data Checks to perform

Check Missing values

Check Duplicates

Check data type

Check the number of unique values of each column

Check statistics of data set

Check various categories present in the different categorical columns
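The six checks above can be bundled into a single helper, which makes them easy to rerun after loading or cleaning the data. This is a minimal sketch that assumes `df` is any pandas DataFrame. The function name `run_data_checks` and the toy rows are hypothetical and are not part of the original notebook.

```python
import pandas as pd


def run_data_checks(df: pd.DataFrame) -> dict:
    """Hypothetical helper that runs the basic data checks listed above."""
    categorical = df.select_dtypes(exclude="number").columns
    return {
        # Missing values per column
        "missing": df.isna().sum().to_dict(),
        # Number of fully duplicated rows
        "duplicates": int(df.duplicated().sum()),
        # Data type of each column, as a string
        "dtypes": df.dtypes.astype(str).to_dict(),
        # Number of distinct values per column
        "unique": df.nunique().to_dict(),
        # Summary statistics of the numeric columns
        "stats": df.describe(),
        # Categories present in each non-numeric column
        "categories": {col: sorted(df[col].dropna().unique()) for col in categorical},
    }


# Toy example (made-up rows, not the real dataset)
toy = pd.DataFrame({
    "gender": ["female", "male", "female"],
    "math score": [72, 47, 72],
})
checks = run_data_checks(toy)
print(checks["duplicates"], checks["unique"])
```

On the toy frame, the third row repeats the first, so one duplicate is reported.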

3.1 Check Missing values

In [5]: df.isna().sum()

Out[5]:
gender                         0
race/ethnicity                 0
parental level of education    0
lunch                          0
test preparation course        0
math score                     0
reading score                  0
writing score                  0
dtype: int64

There are no missing values in the data set.

3.2 Check Duplicates

In [6]: df.duplicated().sum()

Out[6]: 0

There are no duplicate values in the data set.

3.3 Check data types

In [7]: df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 8 columns):
 #   Column                       Non-Null Count  Dtype
---  ------                       --------------  -----
 0   gender                       1000 non-null   object
 1   race/ethnicity               1000 non-null   object
 2   parental level of education  1000 non-null   object
 3   lunch                        1000 non-null   object
 4   test preparation course      1000 non-null   object
 5   math score                   1000 non-null   int64
 6   reading score                1000 non-null   int64
 7   writing score                1000 non-null   int64
dtypes: int64(3), object(5)
memory usage: 62.6+ KB

3.4 Checking the number of unique values of each column

In [8]: df.nunique()

Out[8]:
gender                          2
race/ethnicity                  5
parental level of education     6
lunch                           2
test preparation course         2
math score                     81
reading score                  72
writing score                  77
dtype: int64

3.5 Check statistics of data set

In [9]: df.describe()

Out[9]:
       math score  reading score  writing score
count  1000.00000    1000.000000    1000.000000
mean     66.08900      69.169000      68.054000
std      15.16308      14.600192      15.195657
min       0.00000      17.000000      10.000000
25%      57.00000      59.000000      57.750000
50%      66.00000      70.000000      69.000000
75%      77.00000      79.000000      79.000000
max     100.00000     100.000000     100.000000

Insight

From the description of the numerical data above, all the means are very close to each other, between 66.09 and 69.17.

All the standard deviations are also close, between 14.60 and 15.20.

The minimum math score is 0, whereas the minimum writing score is much higher at 10 and the minimum reading score is higher still at 17.

3.6 Exploring Data


In [10]: df.head()

Out[10]:
   gender  race/ethnicity  parental level of education  lunch         test preparation course  math score  reading score  writing score
0  female  group B         bachelor's degree            standard      none                     72          72             74
1  female  group C         some college                 standard      completed                69          90             88
2  female  group B         master's degree              standard      none                     90          95             93
3  male    group A         associate's degree           free/reduced  none                     47          57             44
4  male    group C         some college                 standard      none                     76          78             75

In [11]: print("Categories in 'gender' variable: ", end=" ")
print(df['gender'].unique())
print("Categories in 'race/ethnicity' variable: ", end=" ")
print(df['race/ethnicity'].unique())
print("Categories in 'parental level of education' variable: ", end=" ")
print(df['parental level of education'].unique())
print("Categories in 'lunch' variable: ", end=" ")
print(df['lunch'].unique())
print("Categories in 'test preparation course' variable: ", end=" ")
print(df['test preparation course'].unique())

Categories in 'gender' variable: ['female' 'male']
Categories in 'race/ethnicity' variable: ['group B' 'group C' 'group A' 'group D' 'group E']
Categories in 'parental level of education' variable: ["bachelor's degree" 'some college' "master's degree" "associate's degree"
 'high school' 'some high school']
Categories in 'lunch' variable: ['standard' 'free/reduced']
Categories in 'test preparation course' variable: ['none' 'completed']

In [12]: # define numerical & categorical columns
numeric_features = [feature for feature in df.columns if df[feature].dtype != 'O']
categorical_features = [feature for feature in df.columns if df[feature].dtype == 'O']

# print columns
print('We have {} numerical features : {}'.format(len(numeric_features), numeric_features))
print('\nWe have {} categorical features : {}'.format(len(categorical_features), categorical_features))

We have 3 numerical features : ['math score', 'reading score', 'writing score']

We have 5 categorical features : ['gender', 'race/ethnicity', 'parental level of education', 'lunch', 'test preparation course']
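`DataFrame.select_dtypes` is a slightly more robust way to split the columns. The `dtype != 'O'` test above treats every non-object column as numeric, so a `category` or pandas string column could end up in `numeric_features`. Selecting by `"number"` avoids that problem. Here is a sketch on a small made-up frame that uses the notebook's column names:

```python
import pandas as pd

# Toy frame with the same kinds of columns as the notebook (values are made up)
df = pd.DataFrame({
    "gender": ["female", "male"],
    "lunch": ["standard", "free/reduced"],
    "math score": [72, 47],
    "reading score": [72, 57],
})

# select_dtypes picks columns by dtype family instead of comparing each dtype to 'O'
numeric_features = df.select_dtypes(include="number").columns.tolist()
categorical_features = df.select_dtypes(exclude="number").columns.tolist()

print(numeric_features)      # ['math score', 'reading score']
print(categorical_features)  # ['gender', 'lunch']
```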

3.8 Adding columns for "Total Score" and "Average"

In [13]: df['total score'] = df['math score'] + df['reading score'] + df['writing score']
df['average'] = df['total score']/3
df.head()

Out[13]:
   gender  race/ethnicity  parental level of education  lunch         test preparation course  math score  reading score  writing score  total score  average
0  female  group B         bachelor's degree            standard      none                     72          72             74             218          72.666667
1  female  group C         some college                 standard      completed                69          90             88             247          82.333333
2  female  group B         master's degree              standard      none                     90          95             93             278          92.666667
3  male    group A         associate's degree           free/reduced  none                     47          57             44             148          49.333333
4  male    group C         some college                 standard      none                     76          78             75             229          76.333333
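The same two columns can also be computed with row-wise reductions over a list of score columns. This avoids repeating the column names if more subjects are added later. The sketch below uses the first two rows of the output above, and `score_cols` is an illustrative name:

```python
import pandas as pd

score_cols = ["math score", "reading score", "writing score"]
# First two rows of the df.head() output above
df = pd.DataFrame({
    "math score": [72, 69],
    "reading score": [72, 90],
    "writing score": [74, 88],
})

# axis=1 reduces across the columns of each row
df["total score"] = df[score_cols].sum(axis=1)
df["average"] = df[score_cols].mean(axis=1)
print(df[["total score", "average"]].round(2))
```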

 

In [14]: reading_full = df[df['reading score'] == 100]['average'].count()
writing_full = df[df['writing score'] == 100]['average'].count()
math_full = df[df['math score'] == 100]['average'].count()

print(f'Number of students with full marks in Maths: {math_full}')
print(f'Number of students with full marks in Writing: {writing_full}')
print(f'Number of students with full marks in Reading: {reading_full}')

Number of students with full marks in Maths: 7
Number of students with full marks in Writing: 14
Number of students with full marks in Reading: 17

In [15]: reading_less_20 = df[df['reading score'] <= 20]['average'].count()
writing_less_20 = df[df['writing score'] <= 20]['average'].count()
math_less_20 = df[df['math score'] <= 20]['average'].count()

print(f'Number of students with 20 marks or less in Maths: {math_less_20}')
print(f'Number of students with 20 marks or less in Writing: {writing_less_20}')
print(f'Number of students with 20 marks or less in Reading: {reading_less_20}')

Number of students with 20 marks or less in Maths: 4
Number of students with 20 marks or less in Writing: 3
Number of students with 20 marks or less in Reading: 1
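Both counts above can be produced for all three subjects at once. Compare the score columns to a threshold, then sum the resulting booleans column by column. Here is a sketch on made-up scores (not the real dataset):

```python
import pandas as pd

score_cols = ["math score", "reading score", "writing score"]
# Toy scores (made up) to show the pattern
df = pd.DataFrame({
    "math score": [100, 15, 66, 0],
    "reading score": [100, 100, 17, 70],
    "writing score": [95, 20, 100, 10],
})

full_marks = (df[score_cols] == 100).sum()  # number of 100s per subject
at_most_20 = (df[score_cols] <= 20).sum()   # number of scores <= 20 per subject
print(full_marks.to_dict())
print(at_most_20.to_dict())
```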

Insights

From the values above, students performed worst in Maths.

The best performance is in Reading.

4. Exploring Data (Visualization)

4.1 Visualize the average score distribution to draw some conclusions

Histogram

Kernel Density Estimation (KDE)
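A KDE smooths a histogram by placing a kernel on every observation and averaging them: f(x) = (1/(n*h)) * sum_i K((x - x_i)/h), where h is the bandwidth. The sketch below is a minimal NumPy version with a Gaussian kernel and Silverman's rule-of-thumb bandwidth. The bandwidth rule is an assumption made here: seaborn's `kdeplot` uses its own implementation (Scott's rule by default), so its curves will differ slightly. `gaussian_kde_1d` is a hypothetical name.

```python
import numpy as np


def gaussian_kde_1d(samples, grid, bandwidth=None):
    """Minimal 1-D Gaussian kernel density estimate evaluated on `grid`."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = samples.size
    if bandwidth is None:
        # Silverman's rule of thumb (an assumption; seaborn defaults to Scott's rule)
        bandwidth = 1.06 * samples.std(ddof=1) * n ** (-1 / 5)
    # Standardised distance from every grid point to every sample
    u = (grid[:, None] - samples[None, :]) / bandwidth
    kernel = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)
    # Average the kernels and rescale so the curve integrates to 1
    return kernel.sum(axis=1) / (n * bandwidth)


# Toy 'average' scores (made up)
averages = np.array([49.3, 72.7, 76.3, 82.3, 92.7])
grid = np.linspace(-50, 200, 1001)
density = gaussian_kde_1d(averages, grid)
area = density.sum() * (grid[1] - grid[0])  # Riemann-sum approximation of the integral
print(round(area, 3))
```

The printed area should be close to 1, which confirms that the estimate is a proper density.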

4.1.1 Histogram & KDE

In [16]: fig, axs = plt.subplots(1, 2, figsize=(15, 7))
plt.subplot(121)
sns.histplot(data=df,x='average',bins=30,kde=True,color='g')
plt.subplot(122)
sns.histplot(data=df,x='average',kde=True,hue='gender')
plt.show()

In [17]: fig, axs = plt.subplots(1, 2, figsize=(15, 7))
plt.subplot(121)
sns.histplot(data=df,x='total score',bins=30,kde=True,color='g')
plt.subplot(122)
sns.histplot(data=df,x='total score',kde=True,hue='gender')
plt.show()

Female students tend to perform better than male students.

In [18]: fig, axs = plt.subplots(1, 3, figsize=(25, 6))
sns.histplot(data=df, x='average', kde=True, hue='lunch', ax=axs[0])
sns.histplot(data=df[df.gender == 'female'], x='average', kde=True, hue='lunch', ax=axs[1])
sns.histplot(data=df[df.gender == 'male'], x='average', kde=True, hue='lunch', ax=axs[2])
plt.show()
Insights

Students with a standard lunch tend to perform better in exams.

This holds for both male and female students.
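The histograms show the shape of the distributions. A `groupby` gives the group means behind the lunch insight, both overall and split by gender. Here is a sketch on made-up rows; running the same two lines on the real `df` gives the actual values:

```python
import pandas as pd

# Toy rows (made up) with the same columns as the notebook
df = pd.DataFrame({
    "gender": ["female", "female", "male", "male"],
    "lunch": ["standard", "free/reduced", "standard", "free/reduced"],
    "average": [80.0, 60.0, 70.0, 50.0],
})

# Mean average score per lunch type, overall and split by gender
by_lunch = df.groupby("lunch")["average"].mean()
by_gender_lunch = df.groupby(["gender", "lunch"])["average"].mean().unstack()
print(by_lunch)
print(by_gender_lunch)
```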

In [19]: fig, axs = plt.subplots(1, 3, figsize=(25, 6))
sns.histplot(data=df, x='average', kde=True, hue='parental level of education', ax=axs[0])
sns.histplot(data=df[df.gender == 'male'], x='average', kde=True, hue='parental level of education', ax=axs[1])
sns.histplot(data=df[df.gender == 'female'], x='average', kde=True, hue='parental level of education', ax=axs[2])
plt.show()

Insights

In general, parental education does not appear to help students perform well in the exam.

The 2nd plot shows that male students whose parents hold an associate's degree or a master's degree tend to perform well in the exam.

The 3rd plot shows no visible effect of parental education on female students.
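`pd.pivot_table` can tabulate the same comparison, and `margins=True` adds an overall "All" row and column. Here is a sketch on made-up rows, assuming the notebook's column names:

```python
import pandas as pd

# Toy rows (made up)
df = pd.DataFrame({
    "gender": ["male", "male", "female", "female"],
    "parental level of education": ["master's degree", "high school", "master's degree", "high school"],
    "average": [85.0, 65.0, 75.0, 74.0],
})

table = pd.pivot_table(
    df,
    values="average",
    index="parental level of education",
    columns="gender",
    aggfunc="mean",
    margins=True,  # adds an 'All' row and column with overall means
)
print(table)
```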

In [20]: fig, axs = plt.subplots(1, 3, figsize=(25, 6))
sns.histplot(data=df, x='average', kde=True, hue='race/ethnicity', ax=axs[0])
sns.histplot(data=df[df.gender == 'female'], x='average', kde=True, hue='race/ethnicity', ax=axs[1])
sns.histplot(data=df[df.gender == 'male'], x='average', kde=True, hue='race/ethnicity', ax=axs[2])
plt.show()
Insights

Students of group A and group B tend to perform poorly in the exam.

Students of group A and group B tend to perform poorly in the exam irrespective of whether they are male or female.

In [21]: plt.figure(figsize=(18, 8))
plt.subplot(1, 4, 1)
plt.title('MATH SCORES')
sns.violinplot(y='math score', data=df, color='red', linewidth=3)
plt.subplot(1, 4, 2)
plt.title('READING SCORES')
sns.violinplot(y='reading score', data=df, color='green', linewidth=3)
plt.subplot(1, 4, 3)
plt.title('WRITING SCORES')
sns.violinplot(y='writing score', data=df, color='blue', linewidth=3)
plt.show()

Insights

The three plots above show that most students score between 60 and 80 in Maths, whereas in reading and writing most of them score between 50 and 80.
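The score ranges read off the violin plots can be checked numerically with `Series.between`, which is inclusive at both ends by default. The sketch below uses the five rows shown by `df.head()` earlier. On the full `df`, the same expression gives the real shares.

```python
import pandas as pd

score_cols = ["math score", "reading score", "writing score"]
# The five rows shown by df.head() above
df = pd.DataFrame({
    "math score": [72, 69, 90, 47, 76],
    "reading score": [72, 90, 95, 57, 78],
    "writing score": [74, 88, 93, 44, 75],
})

# Fraction of students whose score lies in [60, 80], per subject
share_60_80 = df[score_cols].apply(lambda s: s.between(60, 80).mean())
print(share_60_80.round(2))
```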

4.3 Multivariate analysis using pie plots


In [22]: plt.rcParams['figure.figsize'] = (30, 12)

plt.subplot(1, 5, 1)
size = df['gender'].value_counts()
labels = 'Female', 'Male'
color = ['red','green']
plt.pie(size, colors = color, labels = labels, autopct = '%.2f%%')
plt.title('Gender', fontsize = 20)
plt.axis('off')

plt.subplot(1, 5, 2)
size = df['race/ethnicity'].value_counts()
labels = 'Group C', 'Group D','Group B','Group E','Group A'
color = ['red', 'green', 'blue', 'cyan','orange']
plt.pie(size, colors = color, labels = labels, autopct = '%.2f%%')
plt.title('Race/Ethnicity', fontsize = 20)
plt.axis('off')

plt.subplot(1, 5, 3)
size = df['lunch'].value_counts()
labels = 'Standard', 'Free'
color = ['red','green']
plt.pie(size, colors = color, labels = labels, autopct = '%.2f%%')
plt.title('Lunch', fontsize = 20)
plt.axis('off')

plt.subplot(1, 5, 4)
size = df['test preparation course'].value_counts()
labels = 'None', 'Completed'
color = ['red','green']
plt.pie(size, colors = color, labels = labels, autopct = '%.2f%%')
plt.title('Test Course', fontsize = 20)
plt.axis('off')

plt.subplot(1, 5, 5)
size = df['parental level of education'].value_counts()
labels = 'Some College', "Associate's Degree", 'High School', 'Some High School', "Bachelor's Degree", "Master's Degree"
color = ['red', 'green', 'blue', 'cyan','orange','grey']
plt.pie(size, colors = color, labels = labels, autopct = '%.2f%%')
plt.title('Parental Education', fontsize = 20)
plt.axis('off')

plt.tight_layout()
plt.show()
Insights

The numbers of male and female students are almost equal.

Group C has the largest number of students.

More students have a standard lunch than a free/reduced lunch.

More students have not enrolled in any test preparation course than have completed one.

The most common parental education level is "Some College", followed closely by "Associate's Degree".
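The percentages in the pie charts can be computed directly with `value_counts(normalize=True)`. In the cell above, the labels are typed by hand, which only works while they happen to match the `value_counts()` order. Taking the labels from the index of `value_counts()` keeps each label attached to its own count regardless of sort order. Here is a sketch on made-up values:

```python
import pandas as pd

# Toy column (made up)
df = pd.DataFrame({"lunch": ["standard", "standard", "free/reduced", "standard"]})

counts = df["lunch"].value_counts()
shares = df["lunch"].value_counts(normalize=True) * 100
# Labels come from the index, so they always line up with the counts
print(pd.DataFrame({"count": counts, "percent": shares.round(1)}))
```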

4.4 Feature Wise Visualization

4.4.1 GENDER COLUMN

What is the distribution of gender?

Does gender have any impact on students' performance?

UNIVARIATE ANALYSIS (What is the distribution of gender?)

In [23]: f, ax = plt.subplots(1, 2, figsize=(20, 10))
sns.countplot(x=df['gender'], data=df, palette='bright', ax=ax[0], saturation=0.95)
for container in ax[0].containers:
    ax[0].bar_label(container, color='black', size=20)

# Take the labels from value_counts() so they line up with the counts
ax[1].pie(x=df['gender'].value_counts(), labels=df['gender'].value_counts().index,
          explode=[0, 0.1], autopct='%1.1f%%')
plt.show()

Insights
Gender is balanced: there are 518 female students (51.8%) and 482 male students (48.2%).

4.4.2 RACE/ETHNICITY COLUMN

What is the group-wise distribution?

Does race/ethnicity have any impact on students' performance?

UNIVARIATE ANALYSIS (What is the group-wise distribution?)

In [24]: f, ax = plt.subplots(1, 2, figsize=(20, 10))
sns.countplot(x=df['race/ethnicity'], data=df, palette='bright', ax=ax[0], saturation=0.95)
for container in ax[0].containers:
    ax[0].bar_label(container, color='black', size=20)

ax[1].pie(x=df['race/ethnicity'].value_counts(), labels=df['race/ethnicity'].value_counts().index,
          autopct='%1.1f%%')
plt.show()

Insights

Most students belong to group C or group D.

The lowest number of students belong to group A.

BIVARIATE ANALYSIS (Does race/ethnicity have any impact on students' performance?)

In [25]: Group_data2 = df.groupby('race/ethnicity')
f, ax = plt.subplots(1, 3, figsize=(20, 8))

sns.barplot(x=Group_data2['math score'].mean().index, y=Group_data2['math score'].mean(), ax=ax[0])
ax[0].set_title('Math score', color='#005ce6', size=20)
for container in ax[0].containers:
    ax[0].bar_label(container, color='black', size=15)

sns.barplot(x=Group_data2['reading score'].mean().index, y=Group_data2['reading score'].mean(), ax=ax[1])
ax[1].set_title('Reading score', color='#005ce6', size=20)
for container in ax[1].containers:
    ax[1].bar_label(container, color='black', size=15)

sns.barplot(x=Group_data2['writing score'].mean().index, y=Group_data2['writing score'].mean(), ax=ax[2])
ax[2].set_title('Writing score', color='#005ce6', size=20)
for container in ax[2].containers:
    ax[2].bar_label(container, color='black', size=15)

Insights

Group E students have scored the highest marks.

Group A students have scored the lowest marks.

Students from a lower socioeconomic status have a lower average in all subjects.
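The three bar charts can be summarised in one table by grouping once and averaging all the score columns together. The sketch below uses the five `df.head()` rows, so the numbers are not the dataset-wide means:

```python
import pandas as pd

score_cols = ["math score", "reading score", "writing score"]
# The five rows from df.head() above
df = pd.DataFrame({
    "race/ethnicity": ["group B", "group C", "group B", "group A", "group C"],
    "math score": [72, 69, 90, 47, 76],
    "reading score": [72, 90, 95, 57, 78],
    "writing score": [74, 88, 93, 44, 75],
})

# One groupby, mean of every score column per group
group_means = df.groupby("race/ethnicity")[score_cols].mean().round(2)
print(group_means.sort_values("math score"))
```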

4.4.3 PARENTAL LEVEL OF EDUCATION COLUMN

What is the educational background of students' parents?

Does parental education have any impact on students' performance?

UNIVARIATE ANALYSIS (What is the educational background of students' parents?)

In [26]: plt.rcParams['figure.figsize'] = (15, 9)
plt.style.use('fivethirtyeight')
sns.countplot(x=df["parental level of education"], palette='Blues')
plt.title('Comparison of Parental Education', fontweight=30, fontsize=20)
plt.xlabel('Degree')
plt.ylabel('count')
plt.show()

Insights

The largest number of parents have some college education.

4.4.4 LUNCH COLUMN

Which type of lunch is most common among students?

What is the effect of lunch type on test results?

BIVARIATE ANALYSIS (Does lunch type have any impact on students' performance?)

In [27]: f, ax = plt.subplots(1, 2, figsize=(20, 8))
sns.countplot(x=df['parental level of education'], data=df, palette='bright', hue='test preparation course', ax=ax[0])
ax[0].set_title('Students vs test preparation course ', color='black', size=25)
for container in ax[0].containers:
    ax[0].bar_label(container, color='black', size=20)

sns.countplot(x=df['parental level of education'], data=df, palette='bright', hue='lunch', ax=ax[1])
for container in ax[1].containers:
    ax[1].bar_label(container, color='black', size=20)

Insights
Students who get a standard lunch tend to perform better than students who get a free/reduced lunch.

4.4.5 TEST PREPARATION COURSE COLUMN

Which type of lunch is most common among students?

Does the test preparation course have any impact on students' performance?

BIVARIATE ANALYSIS (Does the test preparation course have any impact on students' performance?)

In [28]: plt.figure(figsize=(12, 6))
plt.subplot(2, 2, 1)
sns.barplot(x=df['lunch'], y=df['math score'], hue=df['test preparation course'])
plt.subplot(2, 2, 2)
sns.barplot(x=df['lunch'], y=df['reading score'], hue=df['test preparation course'])
plt.subplot(2, 2, 3)
sns.barplot(x=df['lunch'], y=df['writing score'], hue=df['test preparation course'])

Out[28]: <Axes: xlabel='lunch', ylabel='writing score'>

Insights

Students who completed the test preparation course scored higher in all three subjects than those who did not take the course.
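By default, `sns.barplot` draws the mean of each group. The bar heights can therefore be reproduced with `groupby`, and the gap between "completed" and "none" can be read off directly. Here is a sketch on made-up rows:

```python
import pandas as pd

# Toy rows (made up)
df = pd.DataFrame({
    "lunch": ["standard", "standard", "free/reduced", "free/reduced", "standard"],
    "test preparation course": ["completed", "none", "completed", "none", "none"],
    "math score": [80, 70, 65, 55, 72],
})

# Mean score per (lunch, course) pair: the heights barplot would draw
means = df.groupby(["lunch", "test preparation course"])["math score"].mean().unstack()
gap = means["completed"] - means["none"]  # completed minus none, per lunch type
print(means)
print(gap)
```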

4.4.6 CHECKING OUTLIERS

In [29]: fig, axs = plt.subplots(1, 4, figsize=(16, 5))
sns.boxplot(x=df['math score'], color='skyblue', ax=axs[0])
sns.boxplot(x=df['reading score'], color='hotpink', ax=axs[1])
sns.boxplot(x=df['writing score'], color='yellow', ax=axs[2])
sns.boxplot(x=df['average'], color='lightgreen', ax=axs[3])
plt.show()
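Box plots flag outliers with the 1.5 × IQR rule. By default, matplotlib (which seaborn's `boxplot` builds on) extends the whiskers to the most extreme points within 1.5 × IQR of the quartiles and draws anything beyond them as a separate point. Below is a minimal sketch of that rule; `iqr_outliers` and the scores are hypothetical:

```python
import pandas as pd


def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR], the usual box-plot whisker rule."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return s[(s < lower) | (s > upper)]


# Made-up scores: Q1 = 60.5, Q3 = 71.5, so the fences are 44 and 88
scores = pd.Series([0, 55, 60, 62, 65, 66, 70, 72, 75, 100])
print(iqr_outliers(scores).tolist())
```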
4.4.7 MULTIVARIATE ANALYSIS USING PAIRPLOT

In [30]: sns.pairplot(df, hue='gender')
plt.show()

Insights

The plot above shows that all the scores increase linearly with one another.
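The visual impression of linear relationships can be quantified with the Pearson correlation coefficient, which `DataFrame.corr` computes by default. The sketch below uses the five `df.head()` rows. On the full `df`, `df[score_cols].corr()` gives the dataset-wide values.

```python
import pandas as pd

score_cols = ["math score", "reading score", "writing score"]
# The five rows from df.head() above
df = pd.DataFrame({
    "math score": [72, 69, 90, 47, 76],
    "reading score": [72, 90, 95, 57, 78],
    "writing score": [74, 88, 93, 44, 75],
})

corr = df[score_cols].corr()  # Pearson correlation by default
print(corr.round(2))
```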

5. Conclusions

Students' performance is related to lunch, race/ethnicity and parental level of education.

Female students lead in pass percentage and are also the top scorers.

Students' performance is not strongly related to the test preparation course.

Finishing the preparation course is beneficial.
