OLA Driver Churn Prediction is an end-to-end Machine Learning project designed to predict driver attrition. By analyzing driver demographics, tenure, performance, and income data, this application identifies at-risk drivers and provides actionable insights to improve retention strategies.
The project applies Ensemble Learning techniques (Random Forest, Bagging, XGBoost, Gradient Boosting) to an imbalanced churn dataset and compares them to find the most accurate predictor. The results are presented in an interactive Streamlit dashboard.
- Streamlit Profile - https://share.streamlit.io/user/ratnesh-181998
- Project Demo - https://ola-driver-churn-prediction-machine-learning-mmntzrjjgxbadbxd4.streamlit.app/
- Interactive Dashboard: Built with Streamlit, featuring a modern UI with dark mode and gradients.
- Comprehensive EDA: Visualizations for churn distribution, demographic analysis, and correlation heatmaps.
- Advanced Preprocessing: KNN Imputation, Feature Engineering (Rating/Income trends), and One-Hot Encoding.
- Ensemble Modeling: Implementation and comparison of Random Forest, Bagging, XGBoost, and Gradient Boosting.
- Model Evaluation: ROC-AUC curves, Precision-Recall curves, Confusion Matrices, and Feature Importance plots.
- Business Insights: Actionable recommendations based on data-driven findings.
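The preprocessing step listed above (KNN imputation followed by one-hot encoding) can be sketched roughly as follows. This is a minimal illustration rather than the code in `ola_analysis.py`: the `preprocess` helper, the toy column names (`Age`, `Income`, `City`), and the choice of `n_neighbors` are assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer


def preprocess(df, numeric_cols, categorical_cols, n_neighbors=5):
    """Hypothetical helper: KNN-impute numeric columns, then one-hot encode categoricals."""
    out = df.copy()
    # Each missing value is filled with the mean of that feature over the
    # n_neighbors most similar rows (similarity measured on the observed features).
    imputer = KNNImputer(n_neighbors=n_neighbors)
    out[numeric_cols] = imputer.fit_transform(out[numeric_cols])
    # One indicator column per category, e.g. City_C1, City_C2, ...
    return pd.get_dummies(out, columns=categorical_cols, dtype=int)


# Toy data with assumed column names, for illustration only.
toy = pd.DataFrame({
    "Age": [28.0, np.nan, 35.0, 41.0],
    "Income": [57000.0, 62000.0, np.nan, 80000.0],
    "City": ["C1", "C2", "C1", "C3"],
})
clean = preprocess(toy, ["Age", "Income"], ["City"], n_neighbors=2)
print(clean)
```

Because KNN imputation relies on distances between rows, features on very different scales (such as age and income) can dominate each other, so scaling the numeric columns before imputing may be worth considering.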
The application is organized into intuitive tabs for a seamless user experience:
- Key Metrics: Displays total records, unique drivers, feature count, and time period.
- Raw Data: View the first 10 rows of the dataset.
- Data Types: Summary of column types and non-null counts.
- Distributions: Visualizations for Gender, Education, Age, Income, and City.
- Missing Values: Heatmap and bar chart to identify data gaps.
- Correlation: Heatmap showing relationships between numerical features.
- Problem Statement: Detailed explanation of the business challenge.
- Churn Analysis: Charts showing churn rates by Gender, Education, Rating, and Grade.
- Concepts Tested: Overview of Ensemble Learning and handling imbalanced data.
- Pipeline Steps: Visual guide to data cleaning, encoding, and scaling.
- Missing Data Map: Heatmap visualizing `LastWorkingDate` (the churn indicator).
- Feature Engineering: Analysis of derived features like `Quarterly_Rating_Increase` and `Income_Increase`.
- Categorical Analysis: Interactive bar charts comparing feature categories against churn.
- Importance: List of top features driving the model's predictions.
- Model Training: Code snippets and configuration for Random Forest, Bagging, XGBoost, and Gradient Boosting.
- Learning Curves: Visualization of training vs. validation performance.
- Precision-Recall: Curve demonstrating the trade-off for the best model.
- Performance Metrics: Comparative table of Accuracy, Precision, Recall, F1-Score, and ROC-AUC.
- ROC Curves: Comparison of ROC curves for all models.
- Confusion Matrices: Heatmaps showing true positives, false positives, etc.
- Key Findings: Bullet points summarizing critical discoveries (e.g., 2018-2019 cohort risk).
- Churn Trends: Visual analysis of churn over time/cohorts.
- Recommendations: Strategic business actions to reduce attrition.
- Interactive Q&A: Guided questions to explore the analysis findings.
- Quiz: Test your knowledge on the key drivers of churn.
- Full Walkthrough: Comprehensive, code-rich explanation of the entire project pipeline, from raw data to final model.
- Language: Python
- Frontend: Streamlit
- Libraries: Pandas, NumPy, Scikit-learn, XGBoost, Matplotlib, Seaborn, Category Encoders
- Tools: VS Code, Git
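Using Pandas from the stack above, the derived features mentioned in the dashboard tabs (`Quarterly_Rating_Increase`, `Income_Increase`) and a churn label based on `LastWorkingDate` could be built along these lines. This is a sketch under stated assumptions: the dataset is assumed to hold one row per driver per reporting period, the column names `Driver_ID`, `Reporting_Date`, `Quarterly_Rating`, and `Income` are hypothetical, and "increase" is taken to mean that the last recorded value exceeds the first.

```python
import pandas as pd


def build_driver_features(records):
    """Hypothetical helper: collapse per-period rows into one feature row per driver."""
    records = records.sort_values(["Driver_ID", "Reporting_Date"])
    grouped = records.groupby("Driver_ID")

    # Flag drivers whose last recorded value is higher than their first one.
    rating_up = grouped["Quarterly_Rating"].last() > grouped["Quarterly_Rating"].first()
    income_up = grouped["Income"].last() > grouped["Income"].first()

    # A driver counts as churned if any of their rows has a LastWorkingDate.
    churned = records["LastWorkingDate"].notna().groupby(records["Driver_ID"]).any()

    features = pd.DataFrame({
        "Quarterly_Rating_Increase": rating_up.astype(int),
        "Income_Increase": income_up.astype(int),
        "Churn": churned.astype(int),
    })
    return features.reset_index()


# Toy records for illustration only.
records = pd.DataFrame({
    "Driver_ID": [1, 1, 1, 2, 2],
    "Reporting_Date": pd.to_datetime(
        ["2019-01-01", "2019-02-01", "2019-03-01", "2019-01-01", "2019-02-01"]
    ),
    "Quarterly_Rating": [2, 2, 3, 1, 1],
    "Income": [50000, 50000, 50000, 40000, 45000],
    "LastWorkingDate": [None, None, None, None, "2019-02-20"],
})
features = build_driver_features(records)
print(features)
```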
OLA-Driver-Churn-Prediction/
├── app.py                   # Main Streamlit application
├── ola_analysis.py          # Core analysis and model training code
├── ola_driver_scaler.csv    # Dataset
├── requirements.txt         # Python dependencies
├── README.md                # Project documentation
├── LICENSE                  # MIT License
├── .gitignore               # Git ignore file
└── logs/                    # Application logs
We compared multiple models to find the best predictor for driver churn.
| Model | Accuracy | Precision | Recall | F1-Score | ROC-AUC |
|---|---|---|---|---|---|
| Gradient Boosting | 89.1% | 0.929 | 0.912 | 0.920 | 0.945 |
| Bagging (DT) | 88.0% | 0.939 | 0.876 | 0.906 | 0.935 |
| XGBoost | 87.0% | 0.884 | 0.923 | 0.900 | 0.930 |
| Random Forest | 86.8% | 0.928 | 0.866 | 0.890 | 0.920 |
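Winner: Gradient Boosting Classifier, with the highest Accuracy (89.1%), F1-Score (0.920), and ROC-AUC (0.945). Bagging reaches slightly higher Precision and XGBoost slightly higher Recall, but Gradient Boosting provides the best balance of the two.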
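A comparison table of this kind can be produced with a loop like the one below. It is a minimal sketch, not the project's training code: it runs on synthetic data from `make_classification` (the 30/70 class split is an arbitrary assumption), uses near-default hyperparameters, and adds XGBoost only if the package is installed, so its numbers will not match the table above.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    BaggingClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the driver dataset (assumption).
X, y = make_classification(
    n_samples=600, n_features=10, weights=[0.3, 0.7], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

models = {
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
    "Bagging (DT)": BaggingClassifier(
        DecisionTreeClassifier(), n_estimators=50, random_state=42
    ),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}
try:
    # Optional: included only when xgboost is available.
    from xgboost import XGBClassifier

    models["XGBoost"] = XGBClassifier(
        n_estimators=100, eval_metric="logloss", random_state=42
    )
except ImportError:
    pass

rows = []
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    proba = model.predict_proba(X_test)[:, 1]  # probability of the positive class
    rows.append({
        "Model": name,
        "Accuracy": accuracy_score(y_test, pred),
        "Precision": precision_score(y_test, pred),
        "Recall": recall_score(y_test, pred),
        "F1-Score": f1_score(y_test, pred),
        "ROC-AUC": roc_auc_score(y_test, proba),
    })

results = (
    pd.DataFrame(rows).sort_values("ROC-AUC", ascending=False).reset_index(drop=True)
)
print(results.round(3))
```

Evaluating on a stratified hold-out split keeps the class ratio the same in the training and test sets, which matters when comparing Precision and Recall on imbalanced data.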
- Cohort Risk: Drivers who joined in 2018-2019 have significantly higher churn rates compared to newer joiners.
- Rating Impact: A quarterly rating of 1 is a critical warning sign, with churn rates nearing 70%.
- Income Stagnation: Drivers who did not receive an income increase during their tenure are highly likely to leave.
- Education: Drivers with lower education levels (10+, 12+) require more support and upskilling opportunities.
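Findings like these come down to comparing churn rates across groups. A minimal sketch of that calculation is shown below; the `churn_rate_by` helper, the `Churn` and `Quarterly_Rating` column names, and the toy values are assumptions for illustration and do not reproduce the figures quoted above.

```python
import pandas as pd


def churn_rate_by(df, column, target="Churn"):
    """Hypothetical helper: driver count and churn rate for each value of `column`."""
    summary = df.groupby(column)[target].agg(drivers="count", churn_rate="mean")
    return summary.sort_values("churn_rate", ascending=False)


# Toy per-driver table for illustration only.
drivers = pd.DataFrame({
    "Quarterly_Rating": [1, 1, 1, 2, 2, 3, 4],
    "Churn": [1, 1, 0, 1, 0, 0, 0],
})
summary = churn_rate_by(drivers, "Quarterly_Rating")
print(summary)
```

The same helper can be pointed at any categorical column (for example a joining-year or education column) to check the cohort and education findings.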
- Clone the Repository

      git clone https://github.com/Ratnesh-181998/OLA-Driver-Churn-Prediction-Machine-Learning.git
      cd OLA-Driver-Churn-Prediction-Machine-Learning

- Create a Virtual Environment (Optional)

      python -m venv venv
      source venv/bin/activate # On Windows: venv\Scripts\activate

- Install Dependencies

      pip install -r requirements.txt

- Run the App

      streamlit run app.py

RATNESH SINGH
- Email: rattudacsit2021gate@gmail.com
- LinkedIn: https://www.linkedin.com/in/ratneshkumar1998/
- GitHub: https://github.com/Ratnesh-181998
- Phone: +91-947XXXXX46
- Live Demo: Streamlit
- Documentation: GitHub Wiki
- Issue Tracker: GitHub Issues
This project is licensed under the MIT License; see the LICENSE file for details. Feel free to fork and build upon it.