Rainfall Prediction
Machine Learning Project
Aniket Dubey
Project Overview: Rainfall Prediction
Importance: Accurate rainfall prediction is crucial for agricultural planning, water management,
and disaster mitigation, especially in rain-dependent regions like Vidarbha.
Purpose: Our project uses historical rainfall data to develop a machine learning model
that predicts the average rainfall for a given period.
Beneficiaries: Our model will help local communities, farmers, and policymakers make informed
decisions based on predicted rainfall patterns. The objective is not just to improve the accuracy
of predictions, but also to better understand the factors contributing to rainfall variability in this
region.
Tech Stack
Python: Python was chosen for its readability, simplicity, and vast selection of scientific and numerical libraries.
Pandas and NumPy: Used for efficient data manipulation, analysis, and computations.
Scikit-learn: Provides a range of algorithms for machine learning.
Matplotlib, Seaborn: These aid in visualizing data patterns and trends.
Machine Learning Models (XGBoost, Logistic Regression, Random Forest, SVM, Gradient Boosting):
Offer diverse ways to tackle the problem, enhancing our Ensemble Model.
Flask: Allows running our model on a web server.
HTML/CSS: Enable building a user-friendly interface.
Pickle: Used to store and reuse our trained model.
mlxtend: Facilitates tasks like stacking multiple regressors.
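A minimal sketch of the Pickle step, using only the standard library. The parameter dictionary, the predict helper, and the file name rainfall_model.pkl are hypothetical stand-ins for the project's trained model, which is not shown in these slides.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical stand-in for a trained model: fitted parameters of a linear model.
model = {"coefficients": [0.5, -1.2], "intercept": 3.0}


def predict(params, features):
    """Apply the stored linear parameters to one feature vector."""
    weighted = sum(c * x for c, x in zip(params["coefficients"], features))
    return params["intercept"] + weighted


with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "rainfall_model.pkl"  # assumed file name

    # Store the model once, after training.
    with open(model_path, "wb") as f:
        pickle.dump(model, f)

    # Reload it later, for example when the web application starts.
    with open(model_path, "rb") as f:
        restored = pickle.load(f)

print(predict(restored, [2.0, 1.0]))
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.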
Workflow
Results
PERFORMANCE METRIC FOR REGRESSION
Regression Algorithm     Accuracy (%)
Random Forest            80.3
SVR                      15.9
Gradient Boosting        81
XGBoost                  73.9
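The slides do not state how accuracy is computed for a regression task. One common convention, assumed here purely for illustration, is the coefficient of determination (R²) expressed as a percentage. The function name r2_percent and the sample values are hypothetical.

```python
def r2_percent(y_true, y_pred):
    """Coefficient of determination (R^2) as a percentage."""
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left after the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 100.0 * (1.0 - ss_res / ss_tot)


# Made-up rainfall values (mm), not taken from the project data.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]

score = r2_percent(observed, predicted)
print(f"R^2: {score:.1f}%")
```

Under this convention a score can be negative when a model predicts worse than the mean, so it is not a percentage of correct predictions.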
PERFORMANCE METRIC FOR HYBRID MODELS
Hybrid Model                                              Accuracy (%)
Linear reg + RF reg + SVR (meta model: XGB)               78.4
Linear reg + RF reg + SVR (meta model: Linear reg)        80.3
Linear reg + RF reg + Log reg (meta model: Linear reg)    80.6
Linear reg + RF reg + GB reg (meta model: Linear reg)     81.2
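A hybrid of this kind can be sketched as a stacked ensemble. The example below mirrors the last row (linear regression, random forest, and gradient boosting as base learners, with linear regression as the meta model), but it uses scikit-learn's StackingRegressor in place of mlxtend and synthetic data in place of the rainfall dataset. The hyperparameters are assumptions, so the score it prints is unrelated to the figures in the table.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the historical rainfall features and targets.
X, y = make_regression(n_samples=300, n_features=6, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Base learners; their out-of-fold predictions become the meta model's inputs.
base_learners = [
    ("linear", LinearRegression()),
    ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
    ("gb", GradientBoostingRegressor(random_state=0)),
]

stack = StackingRegressor(
    estimators=base_learners,
    final_estimator=LinearRegression(),  # meta model
    cv=5,  # folds used to generate the out-of-fold predictions
)
stack.fit(X_train, y_train)

predictions = stack.predict(X_test)
r2 = stack.score(X_test, y_test)  # R^2 on held-out data
print(f"Held-out R^2: {r2:.3f}")
```

Swapping a base learner or the meta model only requires changing the estimators list or final_estimator, which makes it practical to iterate over combinations like those in the table.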
Rainfall-Prediction-Website
Challenges
Data Preprocessing: Transforming the raw data into a format suitable for modeling.
Data Cleaning: Handling missing, inconsistent, or outlier data can be challenging and time-consuming.
Feature Selection/Engineering: Determining which attributes of the data are most relevant to the problem at hand.
Model Selection: Choosing the right machine learning models for our ensemble model from a wide range of
options.
Overfitting and Underfitting: Ensuring the model is complex enough to learn from the data but not so complex
that it loses generalizability.
Robustness: Ensuring model resilience to rainfall pattern variability.
Model Stacking: Determining the right combination of models for stacking was tricky and required several iterations.
Performance: Guaranteeing efficient model performance within acceptable time frames, given the complexity of
ensemble models.
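The overfitting and underfitting trade-off listed above can be checked by comparing training error with error on held-out data. The sketch below is an illustration on synthetic data using NumPy polynomial fits, not the project's models. The degrees, noise level, and train/validation split are all assumptions.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Synthetic noisy signal standing in for a rainfall series.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 40)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)

# Alternate points between a training set and a validation set.
x_train, y_train = x[::2], y[::2]
x_val, y_val = x[1::2], y[1::2]


def mse(model, xs, ys):
    """Mean squared error of a fitted polynomial on the given points."""
    return float(np.mean((model(xs) - ys) ** 2))


results = {}
for degree in (1, 3, 9):
    # Least-squares polynomial fit; higher degree means a more flexible model.
    model = Polynomial.fit(x_train, y_train, degree)
    results[degree] = (mse(model, x_train, y_train), mse(model, x_val, y_val))
    train_err, val_err = results[degree]
    print(f"degree {degree}: train MSE {train_err:.3f}, validation MSE {val_err:.3f}")
```

High error on both sets suggests underfitting. A training error well below the validation error suggests overfitting.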