
🚗 OLA Driver Churn Prediction | Machine Learning Dashboard

Streamlit Python Machine Learning License: MIT

📌 Project Overview

OLA Driver Churn Prediction is an end-to-end Machine Learning project designed to predict driver attrition. By analyzing driver demographics, tenure, performance, and income data, this application identifies at-risk drivers and provides actionable insights to improve retention strategies.

The project uses ensemble learning techniques (Random Forest, Bagging, XGBoost, Gradient Boosting) to handle class imbalance and maximize prediction accuracy. The results are presented in an interactive Streamlit dashboard.


🎬 Demo


🚀 Key Features

  • Interactive Dashboard: Built with Streamlit, featuring a modern UI with dark mode and gradients.
  • Comprehensive EDA: Visualizations for churn distribution, demographic analysis, and correlation heatmaps.
  • Advanced Preprocessing: KNN Imputation, Feature Engineering (Rating/Income trends), and One-Hot Encoding.
  • Ensemble Modeling: Implementation and comparison of Random Forest, Bagging, XGBoost, and Gradient Boosting.
  • Model Evaluation: ROC-AUC curves, Precision-Recall curves, Confusion Matrices, and Feature Importance plots.
  • Business Insights: Actionable recommendations based on data-driven findings.

Streamlit UI Walkthrough

The application is organized into intuitive tabs for a seamless user experience:

1. 📊 Data Overview

  • Key Metrics: Displays total records, unique drivers, feature count, and time period.
  • Raw Data: View the first 10 rows of the dataset.
  • Data Types: Summary of column types and non-null counts.

2. 🔍 Exploratory Data Analysis (EDA)

  • Distributions: Visualizations for Gender, Education, Age, Income, and City.
  • Missing Values: Heatmap and bar chart to identify data gaps.
  • Correlation: Heatmap showing relationships between numerical features.
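The numbers behind the missing-value and correlation plots can be computed with a few pandas calls. The following is a minimal sketch on a toy frame; the column names (`Age`, `Income`, `Grade`) and values are assumptions for illustration, not the project's actual data or code.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the driver dataset (hypothetical columns and values).
df = pd.DataFrame({
    "Age": [28, 34, np.nan, 45, 31],
    "Income": [57000, 67000, 65000, np.nan, 46000],
    "Grade": [1, 2, 2, 3, 1],
})

# Missing values per column: the data behind a missing-value bar chart.
missing_counts = df.isna().sum()
print(missing_counts)

# Pairwise Pearson correlation of the numeric columns: the data behind a heatmap.
corr = df.select_dtypes("number").corr()
print(corr.round(2))
```

Either result can be passed to a plotting call such as Seaborn's `heatmap` to obtain the visual form.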

3. 📋 Case Study

  • Problem Statement: Detailed explanation of the business challenge.
  • Churn Analysis: Charts showing churn rates by Gender, Education, Rating, and Grade.
  • Concepts Tested: Overview of Ensemble Learning and handling imbalanced data.
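A churn rate per category is the mean of a binary churn flag within each group. The sketch below assumes hypothetical column names (`Quarterly_Rating`, `Churn`) and uses made-up toy rows, so the printed rates are illustrative only.

```python
import pandas as pd

# Toy data: one row per driver, Churn = 1 if the driver left (hypothetical values).
df = pd.DataFrame({
    "Quarterly_Rating": [1, 1, 1, 2, 2, 3, 3, 4],
    "Churn":            [1, 1, 0, 1, 0, 0, 0, 0],
})

# Mean of the 0/1 flag per group, expressed as a percentage.
churn_rate = df.groupby("Quarterly_Rating")["Churn"].mean().mul(100).round(1)
print(churn_rate)
```

The same pattern applies to Gender, Education, or Grade by changing the grouping column.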

4. 🛠️ Preprocessing

  • Pipeline Steps: Visual guide to data cleaning, encoding, and scaling.
  • Missing Data Map: Heatmap of missing values in LastWorkingDate, the column that serves as the churn indicator.
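The cleaning, encoding, and scaling steps can be sketched with scikit-learn and pandas. This is a minimal illustration under assumptions: the columns (`Age`, `Income`, `City`), the toy values, and `n_neighbors=2` are hypothetical and are not taken from the project's code.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer
from sklearn.preprocessing import StandardScaler

# Toy frame with gaps in the numeric columns (hypothetical columns and values).
df = pd.DataFrame({
    "Age": [28.0, 34.0, np.nan, 45.0, 31.0],
    "Income": [57000.0, 67000.0, 65000.0, 70000.0, np.nan],
    "City": ["C1", "C2", "C1", "C3", "C2"],
})
num_cols = ["Age", "Income"]

# 1. Cleaning: fill each numeric gap from the 2 most similar rows.
df[num_cols] = KNNImputer(n_neighbors=2).fit_transform(df[num_cols])

# 2. Encoding: one-hot encode the categorical column.
df = pd.get_dummies(df, columns=["City"], dtype=int)

# 3. Scaling: standardize numeric columns to zero mean and unit variance.
df[num_cols] = StandardScaler().fit_transform(df[num_cols])
print(df.round(2))
```

In a real pipeline the imputer and scaler would normally be fitted on the training split only and then applied to the test split, to avoid leaking test information.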

5. ⚙️ Features

  • Feature Engineering: Analysis of derived features like Quarterly_Rating_Increase and Income_Increase.
  • Categorical Analysis: Interactive bar charts comparing feature categories against churn.
  • Importance: List of top features driving the model's predictions.
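One plausible way to derive flags like Quarterly_Rating_Increase and Income_Increase is to compare each driver's first and last recorded values. The sketch below assumes that definition; the column names `Driver_ID`, `Reporting_Date`, `Quarterly_Rating`, and `Income` and the toy rows are hypothetical, and the project may define these features differently.

```python
import pandas as pd

# Toy monthly records, several rows per driver (hypothetical columns and values).
records = pd.DataFrame({
    "Driver_ID": [1, 1, 1, 2, 2, 3, 3],
    "Reporting_Date": pd.to_datetime([
        "2019-01-01", "2019-02-01", "2019-03-01",
        "2019-01-01", "2019-02-01",
        "2019-01-01", "2019-02-01",
    ]),
    "Quarterly_Rating": [2, 2, 3, 1, 1, 3, 2],
    "Income": [50000, 50000, 55000, 40000, 40000, 60000, 60000],
})

# Order rows chronologically within each driver so first()/last() are meaningful.
records = records.sort_values(["Driver_ID", "Reporting_Date"])
grouped = records.groupby("Driver_ID")

# Flag = 1 when the last recorded value is higher than the first one.
features = pd.DataFrame({
    "Quarterly_Rating_Increase": (
        grouped["Quarterly_Rating"].last() > grouped["Quarterly_Rating"].first()
    ).astype(int),
    "Income_Increase": (
        grouped["Income"].last() > grouped["Income"].first()
    ).astype(int),
})
print(features)
```

The result has one row per driver, which is the shape a churn classifier needs when the raw data is a monthly log.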

6. 🤖 Models

  • Model Training: Code snippets and configuration for Random Forest, Bagging, XGBoost, and Gradient Boosting.
  • Learning Curves: Visualization of training vs. validation performance.
  • Precision-Recall: Curve demonstrating the trade-off for the best model.
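A comparison loop over several ensemble classifiers might look like the sketch below. It runs on synthetic data from `make_classification`, and every hyperparameter shown is an assumption rather than the project's configuration. XGBoost is left out only to keep the example free of third-party dependencies beyond scikit-learn; an `xgboost.XGBClassifier` could be added to the dictionary in the same way.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    BaggingClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, mildly imbalanced binary data standing in for the driver features.
X, y = make_classification(
    n_samples=600, n_features=10, weights=[0.7, 0.3], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Hypothetical settings; BaggingClassifier uses decision trees by default.
models = {
    "Random Forest": RandomForestClassifier(
        n_estimators=100, class_weight="balanced", random_state=42
    ),
    "Bagging (DT)": BaggingClassifier(n_estimators=50, random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
}

auc_scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Probability of the positive (churn) class on held-out data.
    proba = model.predict_proba(X_test)[:, 1]
    auc_scores[name] = roc_auc_score(y_test, proba)
    print(f"{name}: ROC-AUC = {auc_scores[name]:.3f}")
```

The stratified split keeps the class ratio the same in the training and test sets, which matters when the positive class is the minority.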

7. 📊 Evaluation

  • Performance Metrics: Comparative table of Accuracy, Precision, Recall, F1-Score, and ROC-AUC.
  • ROC Curves: Comparison of ROC curves for all models.
  • Confusion Matrices: Heatmaps showing true positives, false positives, etc.
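The metrics in the comparison table all derive from true labels, predicted probabilities, and a decision threshold. The sketch below computes them with scikit-learn on ten made-up predictions; the labels, probabilities, and the 0.5 threshold are assumptions for illustration.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Hypothetical true labels (1 = churned) and predicted churn probabilities.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.65, 0.35]

# Turn probabilities into class predictions with a 0.5 threshold.
y_pred = [int(p >= 0.5) for p in y_prob]

metrics = {
    "Accuracy": accuracy_score(y_true, y_pred),
    "Precision": precision_score(y_true, y_pred),
    "Recall": recall_score(y_true, y_pred),
    "F1-Score": f1_score(y_true, y_pred),
    # ROC-AUC is threshold-free, so it takes the probabilities directly.
    "ROC-AUC": roc_auc_score(y_true, y_prob),
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")

# Confusion matrix cells for the binary case: TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
```

Precision, Recall, and F1-Score depend on the chosen threshold, while ROC-AUC summarizes ranking quality across all thresholds.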

8. 💡 Insights

  • Key Findings: Bullet points summarizing critical discoveries (e.g., 2018-2019 cohort risk).
  • Churn Trends: Visual analysis of churn over time/cohorts.
  • Recommendations: Strategic business actions to reduce attrition.

9. ❓ Questionnaire

  • Interactive Q&A: Guided questions to explore the analysis findings.
  • Quiz: Test your knowledge on the key drivers of churn.

10. 📚 Complete Analysis

  • Full Walkthrough: Comprehensive, code-rich explanation of the entire project pipeline, from raw data to final model.

🛠️ Tech Stack

  • Language: Python
  • Frontend: Streamlit
  • Libraries: Pandas, NumPy, Scikit-learn, XGBoost, Matplotlib, Seaborn, Category Encoders
  • Tools: VS Code, Git

📂 Project Structure

OLA-Driver-Churn-Prediction/
├── app.py                   # Main Streamlit application
├── ola_analysis.py          # Core analysis and model training code
├── ola_driver_scaler.csv    # Dataset
├── requirements.txt         # Python dependencies
├── README.md                # Project documentation
├── LICENSE                  # MIT License
├── .gitignore               # Git ignore file
└── logs/                    # Application logs

📊 Model Performance

We compared multiple models to find the best predictor for driver churn.

Model              Accuracy  Precision  Recall  F1-Score  ROC-AUC
Gradient Boosting  89.1%     0.929      0.912   0.920     0.945
Bagging (DT)       88.0%     0.939      0.876   0.906     0.935
XGBoost            87.0%     0.884      0.923   0.900     0.930
Random Forest      86.8%     0.928      0.866   0.890     0.920

Winner: The Gradient Boosting Classifier achieved the highest Accuracy (89.1%), F1-Score (0.920), and ROC-AUC (0.945), giving the best balance of Precision and Recall.


💡 Key Business Insights

  1. Cohort Risk: Drivers who joined in 2018-2019 have significantly higher churn rates than newer joiners.
  2. Rating Impact: A quarterly rating of 1 is a critical warning sign, with churn rates nearing 70%.
  3. Income Stagnation: Drivers who did not receive an income increase during their tenure are highly likely to leave.
  4. Education: Drivers with lower education levels (10+, 12+) require more support and upskilling opportunities.

βš™οΈ Installation & Usage

  1. Clone the Repository

    git clone https://github.com/Ratnesh-181998/OLA-Driver-Churn-Prediction-Machine-Learning.git
    cd OLA-Driver-Churn-Prediction-Machine-Learning
  2. Create a Virtual Environment (Optional)

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install Dependencies

    pip install -r requirements.txt
  4. Run the App

    streamlit run app.py

🤝 Contact

RATNESH SINGH

Project Links


📜 License

This project is licensed under the MIT License - see the LICENSE file for details.



📞 CONTACT & NETWORKING

💼 Professional Networks

LinkedIn, GitHub, X, Portfolio, Email, Medium, Stack Overflow

🚀 AI/ML & Data Science

Streamlit, HuggingFace, Kaggle

💻 Competitive Programming (5000+ problems solved across all coding platforms)

LeetCode, HackerRank, CodeChef, Codeforces, GeeksforGeeks, HackerEarth, InterviewBit

