Diabetes Classification Algorithm Analysis

This research paper presents a comparative analysis of various classification algorithms for predicting diabetes using the Pima Indians Diabetes Dataset. The study involves data processing, cleaning, feature exploration, model selection, and evaluation of classification accuracy. The preliminary results indicate that support vector machines, naive Bayes, and logistic regression achieve an accuracy of 75% in classifying diabetic and non-diabetic individuals.

Uploaded by

naveen

1. Title of the paper

Title: Comparative analysis of classification algorithms for diabetic prediction
Diabetes means that blood sugar stays above the desired level on a sustained basis. The prime
objective of this research work is to provide a better classification of diabetes. Several existing
methods have already been implemented for the classification of diabetes datasets. In the
medical sector, classification systems are widely used to exploit patient data and to build
predictive models or sets of rules.
In this manuscript, we use the standard “Pima Indians Diabetes Dataset” provided by the UCI
Machine Learning Repository. We process the data to check whether data cleaning is needed. We
clean the data by replacing any value of 0 with the mean of the corresponding attribute. We then
perform feature selection in order to increase the accuracy of the model. In the final stage, we
evaluate the model by splitting the data into training data and testing data. The performance of
the different classification algorithms is compared using accuracy.

2. System Architecture

3. MODULES
Raw Data:
The quality of the data affects the result of prediction to a large extent. The accuracy
depends mainly on the data considered. In this work, we use an existing data set called the
“Pima Indians Diabetes Dataset” provided by the UCI Machine Learning Repository. This is a
standard dataset whose values are drawn from real instances.

Data Processing:
When we encounter a data set, we first analyze it. This step is necessary to become
familiar with the data, to gain some understanding of the potential features, and to see whether
data cleaning is needed.
Diabetes data set dimensions: (768, 9)
The data set contains 768 rows and 9 columns. ‘Outcome’ is the column we are going to
predict; it indicates whether the patient is diabetic or not. 1 means the person is diabetic
and 0 means the person is not.
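The inspection described above could look roughly like the following sketch. The small DataFrame is illustrative stand-in data, not the real file (which has 768 rows and 9 columns); only the column names `Glucose`, `BMI`, `Age` and `Outcome` are taken from the sample code in Section 5.

```python
import pandas as pd

# Illustrative stand-in rows; the real data set has 768 rows and 9 columns.
df = pd.DataFrame({
    "Glucose": [148, 85, 0, 89, 137],
    "BMI": [33.6, 26.6, 23.3, 0.0, 43.1],
    "Age": [50, 31, 32, 21, 33],
    "Outcome": [1, 0, 1, 0, 1],
})

print("dimensions:", df.shape)  # (rows, columns)

# How many rows are diabetic (1) and how many are not (0).
class_counts = df["Outcome"].value_counts()
print(class_counts)

# Zeros per feature column: a zero here hints that cleaning may be needed.
zero_counts = (df.drop(columns="Outcome") == 0).sum()
print(zero_counts)
```

Counting zeros per column at this stage is one simple way to decide whether the cleaning step is needed at all.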

Data Cleaning:
In the dataset we can see some missing values. Most of the inaccurate experimental
results were caused by these meaningless values. These values can be replaced by a central
value of the attribute: its mean, median, or mode.
For example, in the original dataset, a value of 0 indicates that the real value was missing. To
reduce the influence of these meaningless values, we used the means from the training data to
replace all missing values.
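A minimal sketch of replacing zeros with training-data means is given below. The helper name `impute_zeros_with_train_mean` and the two small DataFrames are hypothetical and exist only for illustration. Note that the sample code in Section 5 fills with the means of the whole data set, whereas this sketch computes the means on the training rows only, as the paragraph above describes.

```python
import numpy as np
import pandas as pd

def impute_zeros_with_train_mean(train, test, columns):
    """Replace zeros in `columns` with means computed on the training rows only."""
    train, test = train.copy(), test.copy()
    for col in columns:
        # Treat zeros as missing so they do not distort the mean.
        train[col] = train[col].replace(0, np.nan)
        test[col] = test[col].replace(0, np.nan)
    means = train[columns].mean()
    return train.fillna(means), test.fillna(means), means

# Illustrative stand-in data.
train = pd.DataFrame({"Glucose": [100, 0, 140], "BMI": [30.0, 20.0, 0.0]})
test = pd.DataFrame({"Glucose": [0, 90], "BMI": [25.0, 0.0]})

train_f, test_f, means = impute_zeros_with_train_mean(train, test, ["Glucose", "BMI"])
print(means)
print(test_f)
```

Computing the means on the training rows alone keeps information from the test rows out of the preprocessing step.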

Feature Exploration:
Feature exploration is the process of transforming the gathered data into features that
better represent, to the model, the problem we are trying to solve, in order to improve its
performance and accuracy.
It creates more input features from the existing features and also combines several features to
produce more intuitive features to feed to the model.
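As an illustration of creating and combining features, the sketch below derives two new columns. Both derived features (`AgeGroup` and `GlucoseBMI`) and the bin edges are hypothetical examples chosen for this sketch; the paper does not state which derived features, if any, were used.

```python
import pandas as pd

# Illustrative stand-in rows using column names from the sample code.
df = pd.DataFrame({
    "Glucose": [148.0, 85.0, 183.0],
    "BMI": [33.6, 26.6, 23.3],
    "Age": [50, 31, 32],
})

# Hypothetical derived feature 1: bucket age into coarse groups.
# Bins are right-inclusive: (0, 30], (30, 45], (45, 120].
df["AgeGroup"] = pd.cut(df["Age"], bins=[0, 30, 45, 120],
                        labels=["young", "middle", "older"])

# Hypothetical derived feature 2: combine two existing features into one.
df["GlucoseBMI"] = df["Glucose"] * df["BMI"]

print(df)
```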

Model selection:
Model selection, or algorithm selection, is the most exciting phase and the heart of
machine learning. It is the phase where we select the model that performs best for the data set
at hand.
First, we calculate the “Classification Accuracy” of a given set of classification models
with their default parameters to determine which model performs best on the diabetes data
set.
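The comparison described above can be sketched as a loop over candidate models. This is only an outline under stated assumptions: the data come from scikit-learn's `make_classification` as a synthetic stand-in for the diabetes data, and `random_state` is fixed purely for reproducibility; all other parameters are left at their defaults.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the diabetes data: 768 rows, 8 numeric features.
X, y = make_classification(n_samples=768, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0)

models = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "SVM": SVC(),
    "naive Bayes": GaussianNB(),
    "logistic regression": LogisticRegression(),
    "k-nearest neighbours": KNeighborsClassifier(),
}

# Fit each model on the training set and record its test-set accuracy.
scores = {name: model.fit(X_train, y_train).score(X_test, y_test)
          for name, model in models.items()}

# Print the models from most to least accurate.
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")
```

Keeping the models in a dictionary makes it easy to add or remove a candidate without touching the evaluation code.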

Evaluating Methods:
We evaluate the model by splitting the data set into two portions: a “training set”
and a “testing set”. The training set is used to train the model, and the testing set is used to
test it. After the data have been processed by the classification algorithms, we evaluate the
accuracy of each model.
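Accuracy here is the fraction of test-set rows whose predicted class matches the true class. The short sketch below, using made-up labels and predictions, computes it by hand and checks it against scikit-learn's `accuracy_score`.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Made-up true labels and predictions for ten test-set rows.
y_true = np.array([1, 0, 0, 1, 0, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 1, 0, 0, 0, 0, 1, 0])

# Accuracy = number of correct predictions / number of test rows.
manual_accuracy = (y_true == y_pred).mean()
print(manual_accuracy)

# scikit-learn computes the same quantity.
library_accuracy = accuracy_score(y_true, y_pred)
print(library_accuracy)
```

The `score` method of a scikit-learn classifier returns this same accuracy for the data passed to it.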

4. Status of implementation
We have completed 30% of the project by incorporating the predefined packages. In the
future, we shall include the actual implementation of the algorithms and report the accuracy of
the model.

5. Sample code

import numpy as np
import sklearn
import pandas as pd
import matplotlib.pyplot as plt
import os

# List the files in the dataset directory
print(os.listdir("dataset"))

df = pd.read_csv("dataset/[Link]")
[Link]()

import seaborn as sns

[Link]([Link])

# A zero in these columns stands for a missing measurement; mark it as NaN
cols = ['Glucose','BloodPressure','SkinThickness','Insulin','BMI','DiabetesPedigreeFunction','Age']
df[cols] = df[cols].replace(0, np.nan)
[Link]()

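# Fill the missing values with the column means
df.fillna(df.mean(), inplace = True)

# Confirm that no missing values remain
df.isnull().sum()

# Correlation heatmap of all columns
sns.heatmap(df.corr(),annot=True)
#[Link](df)
fig = plt.gcf()
fig.set_size_inches(8,8)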

from sklearn.ensemble import RandomForestClassifier

# Rank the eight input features by random-forest importance
clf = RandomForestClassifier()
x=df[df.columns[:8]]
y=df.Outcome
clf.fit(x,y)
feature_imp = pd.DataFrame(clf.feature_importances_,index=x.columns)
feature_imp.sort_values(by = 0 , ascending = False)

from sklearn.model_selection import train_test_split

# Keep the four selected features and the target column
features = df[["Glucose",'BMI','Age','DiabetesPedigreeFunction']]
labels = df.Outcome
[Link]()

# Stratified 60/40 split so both sets keep the same class ratio
features_train,features_test,labels_train,labels_test = train_test_split(
    features,labels,stratify=df.Outcome,test_size=0.4)

from sklearn.tree import DecisionTreeClassifier

# Decision tree: fit on the training set, score (accuracy) on the test set
dtclf = DecisionTreeClassifier()
dtclf.fit(features_train,labels_train)
dtclf.score(features_test,labels_test)

from sklearn import svm

# Support vector machine with a linear kernel
clf = svm.SVC(kernel="linear")
clf.fit(features_train,labels_train)
clf.score(features_test,labels_test)

from sklearn import naive_bayes

# Gaussian naive Bayes
nbclf = naive_bayes.GaussianNB()
nbclf.fit(features_train,labels_train)
nbclf.score(features_test,labels_test)

from sklearn.linear_model import LogisticRegression

# Logistic regression
clf1 = LogisticRegression()
clf1.fit(features_train,labels_train)
clf1.score(features_test,labels_test)

from sklearn.neighbors import KNeighborsClassifier

# k-nearest neighbours with k = 2
knnclf = KNeighborsClassifier(n_neighbors=2)
knnclf.fit(features_train,labels_train)
print(knnclf.score(features_test,labels_test))

6. CONCLUSION
The main motto is “to prevent and cure diabetes and to improve the lives of all people affected
by diabetes”. To support people all over the world, we are trying to detect and prevent the
complications of diabetes at an early stage through predictive analysis by improving the
classification techniques. The support vector machine and naive Bayes techniques each give
an accuracy of 75%. Logistic regression also gives an accuracy of 75%. It gives the best fit
to the data with respect to diabetic and non-diabetic persons.
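To put an accuracy figure in context, it can help to compare it with a majority-class baseline, that is, a model that always predicts the more frequent class. The sketch below assumes the class counts of the commonly distributed 768-row file (500 non-diabetic, 268 diabetic); these counts are not stated in this paper and should be checked against the file actually used.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Assumed class counts: 500 non-diabetic (0) and 268 diabetic (1).
y = np.array([0] * 500 + [1] * 268)
X = np.zeros((len(y), 1))  # the dummy model ignores the feature values

# Always predict the most frequent class and measure the resulting accuracy.
baseline = DummyClassifier(strategy="most_frequent").fit(X, y)
baseline_accuracy = baseline.score(X, y)
print(f"majority-class accuracy: {baseline_accuracy:.3f}")
```

Under these assumed counts the baseline is 500/768, so the accuracy of a trained model is best read as an improvement over that figure rather than over zero.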

Dataset attributes:
• preg: Number of times pregnant
• plas: Plasma glucose concentration at 2 hours in an oral glucose tolerance test
• pres: Diastolic blood pressure (mm Hg)
• skin: Triceps skin fold thickness (mm)
• test: 2-hour serum insulin (mu U/ml)
• mass: Body mass index (weight in kg/(height in m)^2)
• pedi: Diabetes pedigree function
• age: Age (years)
• class: Class variable (0 or 1)
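The short attribute names listed above differ from the column names used in the sample code. A possible mapping between the two is sketched below. The name `Pregnancies` is an assumption, since the sample code never spells out the first column; the other long names appear in the sample code.

```python
import pandas as pd

# Short attribute names mapped to the column names used in the sample code.
# "Pregnancies" is an assumed name for the first column.
SHORT_TO_LONG = {
    "preg": "Pregnancies",
    "plas": "Glucose",
    "pres": "BloodPressure",
    "skin": "SkinThickness",
    "test": "Insulin",
    "mass": "BMI",
    "pedi": "DiabetesPedigreeFunction",
    "age": "Age",
    "class": "Outcome",
}

# One illustrative row labelled with the short names.
row = pd.DataFrame([[6, 148, 72, 35, 0, 33.6, 0.627, 50, 1]],
                   columns=list(SHORT_TO_LONG))

# Rename to the long column names expected by the sample code.
renamed = row.rename(columns=SHORT_TO_LONG)
print(renamed.columns.tolist())
```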
