
Diabetes Prediction with Machine Learning

The document outlines a project aimed at developing a machine learning model to predict diabetes early using non-invasive data inputs, addressing gaps in current diagnostic methods. It details the methodology, including data preprocessing, model training, and evaluation using various algorithms, as well as challenges faced and future scope for improvement. The project utilizes the PIMA Indians Diabetes dataset and employs tools like Python, Scikit-learn, and various machine learning models to achieve its objectives.

Uploaded by

shashwatadatta5

Diabetes Prediction Using Machine Learning
AI-Mini-5
Individual contributions
01 Shashwata Datta (Leader): Coder, Project Report
02 Suraj Munshi: Project Report, Coder
03 Tridib Mondal: PowerPoint Designer
04 Ankit Dutta: Project Report, Coder
05 Sukreet Biswas: PowerPoint Designer & Data Collection

Problem Statement
01 The Challenge:
Diabetes is a chronic illness affecting millions globally, often remaining undetected until severe complications arise.

02 Current Gaps:
Traditional diagnostic methods can be time-consuming and invasive.
Accessible tools for early detection are lacking, especially in underserved areas.

03 Our Aim:
To bridge this gap by developing a machine learning model that predicts the likelihood of diabetes early, using non-invasive data inputs.

Objectives:
01 Primary Goal:
Develop a machine learning-based model for early and accurate prediction of diabetes.

02 Specific Objectives:
Utilize available datasets to train and test the model for high accuracy.
Optimize the model for speed and reliability in predictions.
Create a user-friendly framework for real-world applications.

03 Impact:
Enable proactive healthcare interventions and reduce the burden of diabetes-related complications.

Methodology / System Design
Used the PIMA Indians Diabetes dataset containing features like glucose, BMI, age, insulin level, etc.

01 Data Preprocessing:
Standardized the features using StandardScaler to bring them to a common scale.
Split the data into features (X) and labels (Y).

02 Train-Test Split:
Used train_test_split to divide the dataset into training and test sets.

03 Model Training:
Used Support Vector Machine (SVM) classifier from sklearn.

04 Model Evaluation:
Evaluated using accuracy score on both training and test data.

05 Prediction System:
Created a predictive function to classify new data points (likely in later notebook cells).
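
The five steps above can be sketched end to end as follows. This is a minimal illustration, not the project's notebook: the data is a synthetic stand-in generated with `make_classification` (the row count, class balance, `test_size`, `random_state`, and the linear kernel are all assumptions), and `predict_diabetes` is a hypothetical name for the predictive function. One deliberate difference from the order listed above: the scaler is fitted on the training split only, so test-set statistics do not leak into preprocessing.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the real dataset (assumption): 8 numeric features,
# binary label, roughly one third positive cases.
X, Y = make_classification(n_samples=768, n_features=8, n_informative=5,
                           n_redundant=1, weights=[0.65, 0.35], random_state=2)

# Step 2: hold out 20% of the rows, keeping the class ratio in both splits.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, stratify=Y, random_state=2)

# Step 1: standardize features; the scaler learns mean/std from training rows only.
scaler = StandardScaler().fit(X_train)
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)

# Step 3: train the SVM classifier (linear kernel is an assumption).
classifier = SVC(kernel="linear")
classifier.fit(X_train_std, Y_train)

# Step 4: accuracy on both training and test data.
train_acc = accuracy_score(Y_train, classifier.predict(X_train_std))
test_acc = accuracy_score(Y_test, classifier.predict(X_test_std))
print(f"train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")


def predict_diabetes(sample):
    """Classify one record given as a sequence of 8 raw (unscaled) feature values."""
    row = np.asarray(sample, dtype=float).reshape(1, -1)
    label = classifier.predict(scaler.transform(row))[0]
    return "diabetic" if label == 1 else "not diabetic"


# Step 5: classify a new data point (here, the first held-out row).
print(predict_diabetes(X_test[0]))
```

The same scaler object is reused inside `predict_diabetes`, so a new record is transformed exactly as the training data was before it reaches the classifier.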
Tools & Technologies Used
01 Programming Environment
Language: Python – Versatile and widely used for machine learning tasks.
Development Tools:
Jupyter Notebook: For interactive coding and data visualization.
Google Colab: For cloud-based development and leveraging GPU/TPU
support.

02 Data Resources
Dataset:
Kaggle’s Diabetes Dataset with 2000 records and 9 features for predicting
diabetic outcomes.
Data Preprocessing:
Feature scaling, normalization, and train-test splitting using Scikit-learn’s
utilities.
Ensemble & Optimization Techniques
Ensemble Methods:
Random Forest and AdaBoost for better generalization and improved
accuracy.
03 Libraries & Frameworks

Data Handling & Preprocessing


NumPy: For numerical operations and for building the grids behind
visualizations like SVM decision boundaries.
Pandas: Managing datasets and creating DataFrames.
Scikit-learn (StandardScaler): Standardizing and scaling
feature data.
Visualization
Matplotlib: Core plotting library for graphs, ROC curves,
and visualizations.
Seaborn: Statistical plots, including bar charts for
feature importance and accuracy.
Model Training & Implementation
Logistic Regression, SVM, KNN, Naive Bayes, Decision
Tree, Random Forest, AdaBoost: Machine learning
models used for classification tasks.
Evaluation Metrics
Scikit-learn Metrics: Accuracy, classification reports, and
ROC/AUC for performance evaluation.

Results or Demo
Model Performance Metrics:

Algorithm              Training Accuracy   Test Accuracy
Logistic Regression    78.50%              75.97%
K-Nearest Neighbors    82.90%              72.08%
SVM                    82.90%              72.73%
Naive Bayes            75.57%              77.27%
Decision Tree          100.00%             70.78%
Random Forest          100.00%             75.97%
AdaBoost               80.46%              72.73%
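
A comparison of this kind could be produced with a loop like the one below. It is only a sketch: the data is synthetic, every model uses scikit-learn defaults (the hyperparameters used in the project are not given), and the scaling pipelines for the distance- and margin-based models are an assumption. The numbers it prints will not match the table above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption), not the project's dataset.
X, y = make_classification(n_samples=768, n_features=8, n_informative=5,
                           n_redundant=1, weights=[0.65, 0.35], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# The seven model families named in the table, with default settings.
models = {
    "Logistic Regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
    "K-Nearest Neighbors": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
}

# Fit each model and record (training accuracy, test accuracy).
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    train_acc = model.score(X_train, y_train)
    test_acc = model.score(X_test, y_test)
    results[name] = (train_acc, test_acc)
    print(f"{name:<20} train={train_acc:.2%}  test={test_acc:.2%}")
```

Reporting both columns matters: an unpruned decision tree can memorize its training rows, which is the pattern behind a 100.00% training accuracy paired with a much lower test accuracy.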

Feature Importance:

Features like Glucose Level, BMI, and Age have significant importance
in prediction (based on Random Forest feature importance plot).
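
A ranking like this can be read from a fitted Random Forest's `feature_importances_` attribute. The sketch below runs on synthetic data with the PIMA column names attached purely as labels (an assumption for illustration), so the order it prints says nothing about the real features; `n_estimators` and `random_state` are also assumptions.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Column names of the PIMA dataset, used here only to label synthetic columns.
feature_names = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
                 "Insulin", "BMI", "DiabetesPedigreeFunction", "Age"]

# Synthetic stand-in data (assumption): the resulting ranking is NOT meaningful.
X, y = make_classification(n_samples=768, n_features=8, n_informative=4,
                           n_redundant=0, random_state=0)

forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X, y)

# Impurity-based importances; they are non-negative and sum to 1.
importances = pd.Series(forest.feature_importances_, index=feature_names)
importances = importances.sort_values(ascending=False)
print(importances)
```

The sorted Series can be passed directly to a bar chart to obtain a feature importance plot.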
Visual Representations:

Feature Importance Plot: Demonstrates the weight of each feature in prediction.
ROC Curve: Showcases the True Positive Rate vs. False Positive Rate across different thresholds.
Confusion Matrix: Highlights the distribution of true positives, true negatives, false positives, and false negatives.
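
The numbers behind the ROC curve and the confusion matrix can be computed with scikit-learn's metrics, as sketched below. The data is synthetic and logistic regression is chosen only because it exposes class probabilities directly; which model the plots in the project were based on is not stated.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption), not the project's dataset.
X, y = make_classification(n_samples=768, n_features=8, n_informative=5,
                           n_redundant=1, weights=[0.65, 0.35], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# ROC curve: sweep the decision threshold over the predicted probabilities
# of the positive class and record the false/true positive rate at each step.
scores = model.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, scores)
auc = roc_auc_score(y_test, scores)
print(f"AUC: {auc:.3f} over {len(thresholds)} thresholds")

# Confusion matrix at the default 0.5 threshold; for binary labels 0/1,
# ravel() yields the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_test, model.predict(X_test)).ravel()
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
```

Plotting `fpr` against `tpr` with Matplotlib gives the ROC curve itself; the four counts fill the 2x2 confusion matrix.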

Challenges Faced
01 Data Challenges:
Incomplete or inconsistent data in the dataset.
Imbalanced dataset, leading to biased predictions.

02 Model Challenges:
Difficulty in selecting the optimal algorithm for the problem.
Balancing overfitting and underfitting during training.

03 Resource Limitations:
Limited computational power for training large models.
Challenges in deploying the model for real-world usage.

04 Interpretability:
Ensuring the model's predictions are understandable to non-technical stakeholders.
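
Two of these challenges, class imbalance and the overfitting/underfitting balance, are commonly approached with class weighting and stratified cross-validation. The sketch below shows one way this might look; neither technique is stated as part of the project, and the data, fold count, and kernel are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, imbalanced stand-in data (assumption): about 35% positive cases.
X, y = make_classification(n_samples=768, n_features=8, n_informative=5,
                           n_redundant=1, weights=[0.65, 0.35], random_state=3)

# class_weight="balanced" reweights errors inversely to class frequency,
# so the minority class is not simply ignored by the classifier.
model = make_pipeline(StandardScaler(),
                      SVC(kernel="linear", class_weight="balanced"))

# Stratified folds keep the class ratio the same in every train/validation split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=3)
scores = cross_validate(model, X, y, cv=cv,
                        scoring=["accuracy", "recall"],
                        return_train_score=True)

# A large gap between training and validation accuracy points to overfitting;
# low scores on both point to underfitting.
gap = scores["train_accuracy"].mean() - scores["test_accuracy"].mean()
print(f"validation accuracy: {scores['test_accuracy'].mean():.3f}")
print(f"validation recall:   {scores['test_recall'].mean():.3f}")
print(f"train/validation accuracy gap: {gap:.3f}")
```

Recall is tracked alongside accuracy because, on an imbalanced dataset, a model can score well on accuracy while missing many positive cases.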

Future Scope:
01 Enhancing Model Accuracy:
Incorporate larger and more diverse datasets to improve generalization.
Explore advanced techniques like deep learning for better prediction performance.

02 Real-Time Predictions:
Integrate real-time data inputs from wearable devices or IoT sensors.

03 Broader Health Insights:
Extend the model to predict related conditions like hypertension or cardiovascular risks.

04 Collaboration:
Work with healthcare professionals to refine the system for clinical applications.

