MACHINE LEARNING BASED STUDENT
PERFORMANCE PREDICTION SYSTEM
Project Report submitted for the award of Master of Computer Applications (MCA)
University of Mysore – 2025
ABSTRACT
This project presents a Machine Learning based system developed to predict student
academic performance using historical educational data. The system applies Decision
Tree and Random Forest classification algorithms to identify students who are
academically at risk. The study demonstrates that Random Forest achieves higher
predictive accuracy than Decision Tree (89% versus 84%). Both models are evaluated using
classification accuracy and confusion matrix analysis. The results support the effective use of
machine learning techniques in educational analytics.
CHAPTER I - INTRODUCTION
Educational institutions generate large volumes of structured data, including attendance,
internal marks, and final grades. Predictive analytics enables proactive academic
intervention by identifying students who may underperform. Machine learning models
transform raw academic data into meaningful predictions through systematic
preprocessing, model training, and evaluation.
CHAPTER II - LITERATURE REVIEW
Various research studies have explored the use of machine learning techniques such as
Decision Trees, Random Forests, and Support Vector Machines for predicting student
performance. Comparative analyses report that ensemble methods such as Random Forest
provide improved stability and generalization on academic datasets.
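As an illustration of what "stability" can mean in practice, the following sketch (assuming scikit-learn and NumPy are installed) compares the mean and standard deviation of 5-fold cross-validation accuracy for a single DecisionTreeClassifier and a RandomForestClassifier. The data come from make_classification, a synthetic generator, not from any student dataset, and compare_stability is a hypothetical helper; a smaller spread of accuracy across folds is one rough sign of a more stable model.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier


def compare_stability(n_samples=500, random_state=0):
    """Return {model_name: (mean, std)} of 5-fold CV accuracy (hypothetical helper)."""
    # Synthetic stand-in for an academic dataset (assumption: 5 numeric features, 2 classes).
    X, y = make_classification(
        n_samples=n_samples, n_features=5, n_informative=3,
        n_redundant=0, random_state=random_state,
    )
    models = {
        "decision_tree": DecisionTreeClassifier(random_state=random_state),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=random_state),
    }
    results = {}
    for name, model in models.items():
        # Accuracy on each of the 5 held-out folds
        scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
        results[name] = (float(np.mean(scores)), float(np.std(scores)))
    return results


if __name__ == "__main__":
    for name, (mean, std) in compare_stability().items():
        print(f"{name}: mean accuracy={mean:.3f}, std={std:.3f}")
```

The printed values depend on the synthetic data and the random seed, so they should not be read as evidence about real academic datasets.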
CHAPTER III - SYSTEM REQUIREMENTS SPECIFICATION
Functional Requirements:
- Dataset input
- Data preprocessing
- Model training
- Prediction generation
- Accuracy evaluation
Non-Functional Requirements include reliability, performance efficiency, and usability.
CHAPTER IV - SYSTEM DESIGN
System Architecture: User → Data Preprocessing → Model Training → Prediction → Output
Modules include Data Collection, Preprocessing, Model Building, and Evaluation.
CHAPTER V - IMPLEMENTATION
The dataset was obtained from the Kaggle Student Performance Dataset. Records with missing
values were removed during cleaning. Features such as attendance, internal scores, and study
time were selected for model training. The dataset was then split into an 80% training set
and a 20% test set.
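A minimal sketch of these preprocessing steps with pandas and scikit-learn is given here. The column names attendance, internal_score, study_time, and the binary label at_risk are hypothetical placeholders, since the schema of the Kaggle file is not given in this report, and the small demo frame is made-up data. Stratifying the split on the label is an added assumption, not something the report states.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical column names; the real Kaggle file's schema may differ.
FEATURES = ["attendance", "internal_score", "study_time"]
TARGET = "at_risk"


def prepare_data(df, test_size=0.2, random_state=42):
    """Drop rows with missing values, select features, and split 80/20."""
    clean = df.dropna(subset=FEATURES + [TARGET])
    X = clean[FEATURES]
    y = clean[TARGET]
    # stratify=y keeps the at-risk / not-at-risk ratio similar in both splits (assumption)
    return train_test_split(X, y, test_size=test_size, random_state=random_state, stratify=y)


# Tiny made-up frame standing in for the real dataset; one row has missing attendance.
demo = pd.DataFrame({
    "attendance": [92, 60, 85, None, 45, 78, 88, 55, 95, 70, 65],
    "internal_score": [18, 9, 15, 12, 7, 14, 17, 8, 19, 11, 10],
    "study_time": [3, 1, 2, 2, 1, 2, 3, 1, 4, 2, 1],
    "at_risk": [0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1],
})
X_train, X_test, y_train, y_test = prepare_data(demo)
print(len(X_train), len(X_test))  # 8 2 (10 complete rows remain after dropna)
```

In practice demo would be replaced by the frame loaded from the downloaded CSV (for example with pd.read_csv), keeping the same 80/20 ratio.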
CHAPTER VI - TESTING
Unit testing was conducted for the preprocessing and model modules. Integration testing
ensured correct data flow between modules. System testing validated the prediction outputs.
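Unit tests for the preprocessing module might look like the pytest-style functions here. drop_missing is a hypothetical stand-in for the project's cleaning step (removing rows with missing values), and the column names are assumptions; the tests check that incomplete rows are removed and that complete data passes through unchanged.

```python
import pandas as pd


def drop_missing(df, columns):
    """Hypothetical preprocessing step: remove rows with missing values in `columns`."""
    return df.dropna(subset=columns).reset_index(drop=True)


def test_drop_missing_removes_incomplete_rows():
    raw = pd.DataFrame({
        "attendance": [90.0, None, 75.0],
        "internal_score": [15, 12, None],
    })
    cleaned = drop_missing(raw, ["attendance", "internal_score"])
    # Only the first row is complete
    assert len(cleaned) == 1
    assert not cleaned.isna().any().any()
    assert cleaned.loc[0, "attendance"] == 90.0


def test_drop_missing_keeps_complete_data_unchanged():
    raw = pd.DataFrame({"attendance": [80.0, 70.0], "internal_score": [14, 11]})
    cleaned = drop_missing(raw, ["attendance", "internal_score"])
    pd.testing.assert_frame_equal(cleaned, raw)
```

Saved in a file named test_*.py, these functions are collected automatically by pytest; they can also be called directly.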
CHAPTER VII - RESULTS AND ANALYSIS
The Random Forest model achieved 89% accuracy, while the Decision Tree model achieved
84% accuracy.
The components of the confusion matrix are presented below.
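For reference, accuracy and the four confusion-matrix components of a binary at-risk classifier can be computed with scikit-learn's confusion_matrix and accuracy_score. The eight labels in this sketch are invented for illustration and are not the project's predictions; with a fitted model, y_pred would instead come from model.predict(X_test).

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up labels for illustration only (1 = at risk, 0 = not at risk).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# With labels=[0, 1] the matrix is [[TN, FP], [FN, TP]], so ravel() yields TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

# Accuracy = correct predictions / all predictions; matches accuracy_score.
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(f"TN={tn} FP={fp} FN={fn} TP={tp}")  # TN=3 FP=1 FN=1 TP=3
print(f"accuracy={accuracy:.2f}")          # accuracy=0.75
print(accuracy_score(y_true, y_pred))      # 0.75
```

Reporting FN and FP separately shows which kind of mistake each model makes, which a single accuracy figure does not.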
CHAPTER VIII - CONCLUSION AND FUTURE WORK
The project successfully demonstrates the application of machine learning in predicting
student academic performance. Future work may include integration with institutional
databases and deployment as a web-based application.
REFERENCES
1. Han, J., & Kamber, M. Data Mining: Concepts and Techniques.
2. Bishop, C. M. Pattern Recognition and Machine Learning.
3. Scikit-learn Documentation.
4. Kaggle Dataset Repository.