
Diabetes 1

The document discusses the development of an optimal K-Nearest Neighbor (OPT-KNN) model for predicting diabetes mellitus using electronic health data. It emphasizes the importance of machine learning in enhancing disease risk prediction and presents a methodology for implementing the model, including data preparation and classification stages. The effectiveness of the model is validated through experiments on real-world diabetes data, highlighting its potential in improving eHealth services.

Uploaded by

srinivas
Copyright
© © All Rights Reserved

ABSTRACT

• eHealth, meaning computer-based health care and information delivery, is a fast-growing
area that aims to improve health services locally, regionally, and worldwide.

• An effective disease risk prediction model built by analyzing electronic health data
benefits both patient care and the services provided through the corresponding
data-driven eHealth systems.

• This paper focuses on predicting and analyzing diabetes mellitus, an increasingly
prevalent chronic disease that refers to a group of metabolic disorders characterized by
a high blood sugar level over a prolonged period of time.

• K-Nearest Neighbor (KNN) is one of the most popular and simplest machine learning
techniques for building such a disease risk prediction model from relevant health data.

• To achieve this goal, we present an optimal K-Nearest Neighbor (OPT-KNN) prediction
model based on patients' habitual attributes in various dimensions.

• The approach determines the number of neighbors that gives the lowest error rate, so
that the resulting model provides better prediction outcomes.

• The effectiveness of this machine learning eHealth model is examined through
experiments on real-world diabetes mellitus data collected from hospitals.

CONTENT

Title Page
Declaration & Certificates
Acknowledgement
Abstract
List of Figures
CHAPTER 1: INTRODUCTION
1.1 Overview
1.2 Objective & Scope
CHAPTER 2: LITERATURE SURVEY
2.1 Prediction using Classification Algorithms
2.2 Data Mining Classification Techniques
CHAPTER 3: METHODOLOGY
3.1 Aim of the project & Existing System
3.2 Proposed System & System Architecture
3.3 System Requirements
3.4 Data Flow & Class Diagrams
3.5 Software Environment (Python & Flask)
CHAPTER 4: RESULT AND PERFORMANCE EVALUATION
4.1 Result & Requirement Analysis
4.2 Functional & Non-Functional Requirements
CHAPTER 5: TESTING
5.1 Input & Output Design
5.2 System Study & Feasibility
5.3 Types of Tests
CHAPTER 6: CONCLUSION
REFERENCES

LIST OF FIGURES

Fig. 3.4: System Architecture
DFD Level 0
DFD Level 1
DFD Level 2
Fig. 3.7.3: Class Diagram
Login HTML
Success URL

CHAPTER 1
INTRODUCTION
1.1 OVERVIEW

Diabetes is a group of metabolic disorders characterized by high blood sugar levels over a
prolonged period of time. It is mainly caused by abnormal insulin secretion and/or action.
Symptoms of high blood sugar include frequent urination, persistent thirst, and increased
hunger. If not treated in time, diabetes can cause serious acute conditions such as diabetic
ketoacidosis or hyperosmolar hyperglycemic state, and may result in death. It can also lead
to long-term complications including vascular disease, stroke, ulcers, and eye
complications.

Diabetes exists in three forms:

• Type 1 (IDDM): Characterized by the pancreas not producing enough insulin. Patients
need external insulin.

• Type 2 (NIDDM): Marked by the body resisting insulin. Often found in people with a
high BMI or an inactive lifestyle.

• Gestational diabetes: Observed during pregnancy.

For a healthy person, fasting glucose levels generally range from 70 to 99 mg/dL. An
individual is considered diabetic if the fasting glucose level is 126 mg/dL or higher. Over
the years, it has been found that people with a high BMI, a family history of diabetes, high
cholesterol, or an inactive lifestyle face a higher risk of diabetes.

1.2 OBJECTIVE

• Diabetes mellitus (DM) is one of the most prevalent chronic non-communicable diseases
(NCDs) worldwide; about 90% of patients with diabetes have Type 2 DM (T2DM).

• The risk of developing T2DM is strongly associated with many predisposing, behavioral,
environmental, and genetic risk factors.

1.3 SCOPE/MOTIVATION

• About 90% of patients who have diabetes suffer from Type 2 DM (T2DM).

• Machine learning techniques are tools that can improve the analysis and interpretation
of data and the extraction of knowledge from it.

• These techniques may improve the prognosis and diagnosis of diseases such as T2DM and
so help reduce their burden.

• We applied four classification models, including K-nearest neighbor (KNN).

CHAPTER 2

LITERATURE SURVEY
2.1 Prediction of Diabetes using Classification Algorithms

Diabetes is a heterogeneous group of disorders that results in an increase of glucose in
the blood. The parameters used in the dataset to detect diabetes are glucose, blood
pressure, skin thickness, insulin, and age. Big data analytics examines datasets and
reveals hidden information. Using the Pima Indians Diabetes Database (PIDD), the
objective is to predict whether or not a patient has diabetes based on diagnostic
measurements.

2.2 A Novel Technique to Predict Diabetic Disease Using Data Mining Classification
Techniques

Data mining is the process of analyzing data from different perspectives and summarizing it
into useful information. In this paper, various classification techniques are applied to a
diabetes mellitus dataset. The data is preprocessed and then classified with algorithms such
as discriminant analysis, KNN, Naïve Bayes, and Support Vector Machine (SVM) with linear
and RBF kernel functions in order to compare their accuracy.

2.3 Review on Prediction of Diabetes using Data Mining Technique

Diagnosis of diabetes is a tedious process, but improvements in technology have made
predicting the disease easier. The purpose is to diagnose whether or not a person is
affected by diabetes using the K-Nearest Neighbor (KNN) classification technique. The
training data are classified with the KNN classifier, which is highly efficient for both
classification and prediction.

2.4 A Prediction Technique in Data Mining for Diabetes Mellitus

With the rapid development of machine learning, it has been applied to many aspects of
medical health. In this study, decision tree, random forest, and neural networks were used to
predict diabetes mellitus. Using hospital physical examination data in Luzhou, China, and
principal component analysis (PCA) to reduce dimensionality, the results showed that
prediction with a random forest could reach the highest accuracy (ACC = 0.8084).

2.5 Utilization of Data Mining Techniques for Diagnosis of Diabetes Mellitus - A Case
Study

Data mining searches large amounts of data to extract useful information. The most
important and popular data mining techniques are classification, association, clustering,
prediction, and sequential patterns. In the healthcare industry, data mining plays an
important role in the early prediction of diseases. In general, detecting a disease requires
numerous tests on a patient. Data mining techniques are used in disease prediction to
reduce the number of tests and increase the detection accuracy. Diabetes mellitus is one of
the most common diseases among young adults; it typically develops in middle age and is
more common in obese children and adolescents. To reduce the population with diabetes
mellitus, it should be detected at an early stage, so a quick and efficient detection
mechanism is needed. The purpose of this study is to apply data mining techniques that
are relevant to the prediction of diabetes mellitus and to extract hidden patterns from
the Pima Indian diabetes dataset available at the UCI Machine Learning Repository.

CHAPTER 3

METHODOLOGY
3.1 AIM OF THE PROJECT

• The primary aim of the present study was to implement models that predict T2DM by
applying data mining techniques.

• Implementing data mining techniques for the prediction of T2DM.

• Selecting the best model for T2DM prediction.

• Comparing the performance of the various data mining techniques implemented.

3.2 EXISTING SYSTEM

The existing system addresses diabetes prediction by implementing a Naïve Bayes
classifier.

DISADVANTAGES OF EXISTING SYSTEM

● The system is not fully automated; it needs extensive manual data from the user for a
full diagnosis.

3.3 PROPOSED SYSTEM

The proposed diabetes prediction system has two main stages: data preparation and
classification. The input into the system is the dataset and the output will be one class
which represents "healthy" or "diabetic". We have applied the K-Nearest Neighbor (KNN)
algorithm on the training and test sample data and obtained results for different values of
K (number of nearest neighbors).
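
The search over K described above can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the project's code: the function names (`knn_predict`, `error_rate`, `best_k`), the use of Euclidean distance, and the tiny two-feature sample (glucose, BMI) are all hypothetical and chosen only to make the idea concrete.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k):
    """Classify `query` by majority vote among its k nearest training records."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: euclidean(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def error_rate(train_X, train_y, test_X, test_y, k):
    """Fraction of test records that are misclassified for a given k."""
    wrong = sum(knn_predict(train_X, train_y, x, k) != y for x, y in zip(test_X, test_y))
    return wrong / len(test_y)


def best_k(train_X, train_y, test_X, test_y, candidates):
    """Return (k, error) with the lowest error; ties go to the smaller k."""
    errors = {k: error_rate(train_X, train_y, test_X, test_y, k) for k in candidates}
    k = min(errors, key=lambda c: (errors[c], c))
    return k, errors[k]


# Hypothetical sample: (glucose in mg/dL, BMI). The last record is a deliberate outlier.
train_X = [(85, 22.0), (90, 24.5), (95, 23.0), (100, 26.0),
           (150, 33.0), (160, 35.5), (170, 31.0), (180, 36.0), (93, 23.4)]
train_y = ["healthy"] * 4 + ["diabetic"] * 4 + ["diabetic"]
test_X = [(92, 23.5), (165, 34.0)]
test_y = ["healthy", "diabetic"]

if __name__ == "__main__":
    for k in (1, 3, 5):
        print(k, error_rate(train_X, train_y, test_X, test_y, k))
    print("best:", best_k(train_X, train_y, test_X, test_y, (1, 3, 5)))
```

On this toy sample, K = 1 misclassifies one of the two test records because its single nearest neighbor is the outlier, while K = 3 classifies both correctly. A real experiment would use many more records and would normally scale the features first, since unscaled glucose values dominate the distance.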

3.3.1 ADVANTAGES OF PROPOSED SYSTEM

● Users can check their diabetes status and get an instant result.

● K-Nearest Neighbor is a simple algorithm with no explicit training phase, so predictions
on a dataset of this size are fast.


3.4 SYSTEM ARCHITECTURE

Fig. 3.4: System Architecture


3.5 SYSTEM REQUIREMENTS

3.5.1 HARDWARE REQUIREMENTS

● System: Pentium Dual Core
● Hard Disk: 120 GB
● Monitor: 15" LED
● Input Devices: Keyboard, Mouse
● RAM: 4 GB

3.5.2 SOFTWARE REQUIREMENTS

● Operating System: Windows 7/10
● Coding Language: Python
● Tool: Python 3.6.1

3.6 MODULES DESCRIPTION

• Data Constraints: The system uses the publicly available Pima Indian diabetes dataset
for training the model.

• Train and Test Dataset: The training data is used to fit the model. The test data is held
out and used to check how well the model predicts records it has not seen.

• Pre-processing of data: Converts raw data into a clean dataset by removing unwanted
data and filling in missing values so that the model can be trained reliably.

• Feature Extraction: Transforms the raw data into a smaller set of informative
characteristics, which reduces the resources needed to describe a large dataset and
speeds up supervised learning.

• ML Algorithm (KNN): KNN is a non-parametric, instance-based learning method. It
classifies a record according to its distance from records whose class is already known,
assigning the class that is most common among its nearest neighbors.
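
The pre-processing and train/test modules listed above might look like the following sketch. It is illustrative only: the helper names, the convention that `None` marks a missing measurement, the median fill, the min-max scaling, and the 75/25 split are assumptions for the example and are not taken from the project code.

```python
import random
import statistics


def fill_missing(rows):
    """Replace None in each column with the median of that column's known values."""
    columns = list(zip(*rows))
    medians = [statistics.median([v for v in col if v is not None]) for col in columns]
    return [[medians[j] if v is None else v for j, v in enumerate(row)] for row in rows]


def min_max_scale(rows):
    """Rescale every column to [0, 1] so no single feature dominates the distance.

    A stricter pipeline would compute the ranges on the training split only.
    """
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def train_test_split(rows, labels, test_fraction=0.25, seed=0):
    """Shuffle record indices reproducibly and hold out a fraction for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(len(rows) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return (
        [rows[i] for i in train_idx], [labels[i] for i in train_idx],
        [rows[i] for i in test_idx], [labels[i] for i in test_idx],
    )


# Hypothetical records: (glucose in mg/dL, BMI), with two missing measurements.
raw = [[148, 33.6], [85, None], [None, 23.3], [89, 28.1]]
labels = ["diabetic", "healthy", "healthy", "healthy"]

filled = fill_missing(raw)
scaled = min_max_scale(filled)
train_X, train_y, test_X, test_y = train_test_split(scaled, labels)
```

Scaling matters for KNN in particular because the algorithm compares raw distances; without it, a feature measured in large units would outweigh the others.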
3.7 DATA FLOW DIAGRAM

1. The data flow diagram (DFD), also called a bubble chart, is a simple graphical formalism
that represents a system in terms of the input data to the system, the processing
carried out on this data, and the output data generated by the system.
2. The DFD is one of the most important modeling tools. It is used to model the system
components: the system processes, the data used by each process, the external entities
that interact with the system, and the information flows in the system.
3. A DFD shows how information moves through the system and how it is modified by
a series of transformations. It is a graphical technique that depicts information flow and
the transformations that are applied as data moves from input to output.
4. A DFD may be used to represent a system at any level of abstraction. It may be
partitioned into levels that represent increasing information flow and functional detail.

DFD LEVEL 0

Fig.: DFD Level 0

DFD LEVEL 1

Fig.: DFD Level 1

DFD LEVEL 2

Fig.: DFD Level 2

3.7.3 CLASS DIAGRAM

Fig. 3.7.3: Class Diagram


3.8 SOFTWARE ENVIRONMENT
3.8.1 Python:
Python is a high-level, interpreted, interactive, object-oriented scripting language designed
to be highly readable. It is well suited to machine learning and data science because of its
broad standard library, its portability, and its ability to be extended with low-level modules.
3.8.2 Flask Framework

Flask is a web application framework written in Python, based on the Werkzeug WSGI
toolkit and the Jinja2 template engine. It handles HTTP request routing for data
communication over the web. By default, a Flask route responds only to GET requests; a
route can be configured to accept POST so that HTML form data is sent in the request body
instead of the URL and is not cached.

Code Implementation Example:


To demonstrate the use of the POST method in URL routing, the following HTML and
Python scripts are utilized to pass user data into the prediction model.

Save the following script as login.html

<html>
  <body>
    <!-- Assumes Flask's default development address, localhost:5000 -->
    <form action="http://localhost:5000/login" method="post">
      <p>Enter Name:</p>
      <p><input type="text" name="nm" /></p>
      <p><input type="submit" value="submit" /></p>
    </form>
  </body>
</html>

Now enter the following script in the Python shell.

from flask import Flask, redirect, url_for, request

app = Flask(__name__)


@app.route('/success/<name>')
def success(name):
    # Show the name carried in the URL
    return 'welcome %s' % name


@app.route('/login', methods=['POST', 'GET'])
def login():
    if request.method == 'POST':
        # POST: the value arrives in the form body
        user = request.form['nm']
        return redirect(url_for('success', name=user))
    else:
        # GET: the value arrives in the query string
        user = request.args.get('nm')
        return redirect(url_for('success', name=user))


if __name__ == '__main__':
    app.run(debug=True)

After the development server starts running, open login.html in the browser, enter a name
in the text field, and click Submit.
Fig.: Login HTML
The form data is POSTed to the URL in the action attribute of the form tag. The browser
is then redirected to the success URL and displays the output message.

Fig.: Success URL


CHAPTER 4
RESULT AND PERFORMANCE EVALUATION
4.1 Result

After receiving the input data, the system predicts the outcome by applying the ML
algorithm and presents the result to the user, supporting accurate detection and
treatment of diabetes mellitus.

4.2 REQUIREMENT ANALYSIS


Requirement analysis is the process of determining user expectations for a new or modified
product. It encompasses the tasks of analyzing, validating, and managing software
requirements. The requirements must be actionable, measurable, testable, and traceable to
business needs.

4.3 FUNCTIONAL REQUIREMENTS


Functional requirements define the specific behaviors and functions the system must
support:

• Usability: Specifies how easy the system is to use. Queries can be entered in any
format, and the algorithm simulates the desired response seamlessly.

• Robustness: The ability of the program to perform well not only under ordinary
conditions but also under unusual conditions or irrelevant queries.

• Security: The state of providing protected access to resources. Unauthorized users
cannot access the system, thereby providing high security.

• Reliability: Ensures processes work correctly and completely without aborting. The
system is capable of handling load and surviving without failure.

• Compatibility: Supported by all modern web browsers. Running on local or remote web
servers ensures a real-time experience.

• Flexibility: The system can run in different environments and be used by different
users.

• Safety: Every query is processed in a secure manner without exposing personal
information.

4.4 NON- FUNCTIONAL REQUIREMENTS

Portability
The same software can be used in different environments. The project can be run on any
operating system.
Performance
These requirements determine the resources required, response time, throughput, and
everything else that relates to the performance of the system.
Accuracy
The result of a requested query is accurate, and information is retrieved quickly. The
degree of security provided by the system is high and effective.
Maintainability
Maintainability defines how easy it is to maintain, analyze, change, and test the
application. This project is simple to maintain, as further updates can be made without
affecting its stability.
CHAPTER 5
TESTING

SYSTEM DESIGN AND TESTING PLAN

5.1 INPUT DESIGN

The input design is the link between the information system and the user. It comprises
developing specifications and procedures for data preparation, ensuring data is put into a
usable form for processing. The design controls the amount of input required, avoids
delays, and keeps the process simple while maintaining security and privacy. Input design
considers:

• What data should be given as input?

• How should the data be arranged or coded?

• The dialog to guide the operating personnel.

• Methods for preparing input validations.

5.2 OUTPUT DESIGN

A quality output meets the requirements of the end user and presents the information
clearly. Output design determines how the information is displayed for immediate need.
Efficient output design improves the system's ability to support user decision-making,
aiming to:

• Convey information about past activities, current status, or projections.

• Signal important events, opportunities, problems, or warnings.

• Trigger or confirm an action.


5.3 SYSTEM STUDY & FEASIBILITY

The feasibility of the project is analyzed to ensure the proposed system is not a burden to
the organization.

• Economic Feasibility: Conducted to check the economic impact. The system is well
within budget since most technologies used (Python, Flask, KNN) are open-source and
freely available.

• Technical Feasibility: Ensures the system does not have an excessively high demand
on technical resources. The developed ML model has modest hardware requirements.

• Social Feasibility: Checks the level of user acceptance. The system is designed to be
user-friendly so that users do not feel threatened by the technology but rather accept it
as a diagnostic necessity.

5.4 SYSTEM TESTING OVERVIEW


Testing discovers errors and conceivable faults in the software. It provides a way to check
the functionality of components to ensure the software meets user expectations and does
not fail unacceptably.
5.5 TYPES OF TESTS

• Unit Testing: Validates that internal program logic is functioning properly. We verified
that field entries are of the correct format, no duplicate entries are allowed, and all
routing links function correctly without delays.

• Integration Testing: Tests integrated software components to determine if they run
together as one program. We checked that the Python backend, Flask routing, and
HTML front-end interact without error.

• Functional Testing: Provides systematic demonstrations that functions exist as
specified. We validated valid/invalid inputs, algorithm execution, and output
generation.

• System Testing: Ensures the entire integrated software meets requirements,
emphasizing process links and integration points.

• White Box & Black Box Testing: Verified both the internal code structures (White
Box) and the external input/output mapping without viewing the internal code (Black
Box).

• Acceptance Testing: Verified that the end-user interface successfully accepts medical
parameters and returns the diabetes prediction accurately.

Test Results: All the test cases mentioned above passed successfully. No defects were
encountered during the execution of the Machine Learning prediction model or the web
interface.
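
As an illustration of the unit-testing item above (checking that field entries are of the correct format), a test of an input-validation helper could be sketched as follows. The helper `parse_form`, the field names, and the validation rules are hypothetical; they are modelled on the parameters listed in the literature survey and are not the project's actual code or test suite.

```python
import math
import unittest

# Hypothetical field names, modelled on the dataset parameters named in Chapter 2.
REQUIRED_FIELDS = ("glucose", "blood_pressure", "skin_thickness", "insulin", "age")


def parse_form(form):
    """Turn a mapping of submitted strings into floats, or raise ValueError."""
    values = {}
    for field in REQUIRED_FIELDS:
        raw = str(form.get(field, "")).strip()
        if not raw:
            raise ValueError("missing field: " + field)
        try:
            number = float(raw)
        except ValueError:
            raise ValueError("not a number: " + field) from None
        if not math.isfinite(number) or number < 0:
            raise ValueError("out of range: " + field)
        values[field] = number
    return values


class ParseFormTest(unittest.TestCase):
    VALID = {"glucose": "120", "blood_pressure": "70", "skin_thickness": "20",
             "insulin": "85", "age": "33"}

    def test_valid_input_is_converted(self):
        self.assertEqual(parse_form(self.VALID)["glucose"], 120.0)

    def test_missing_field_is_rejected(self):
        form = dict(self.VALID)
        del form["age"]
        with self.assertRaises(ValueError):
            parse_form(form)

    def test_non_numeric_field_is_rejected(self):
        form = dict(self.VALID, insulin="abc")
        with self.assertRaises(ValueError):
            parse_form(form)


if __name__ == "__main__":
    unittest.main()
```

Keeping validation in a plain function like this lets it be tested without starting the web server; the Flask route would then only call the function and report any error to the user.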
CHAPTER 6
CONCLUSION

Physicians are greatly concerned with how to detect diabetes at an early stage. This study
has tried to design a system for predicting diabetes. The experimental work implemented
Machine Learning algorithms, and the evaluation was done on various measures.

The experiment was carried out on the Diabetes dataset, and the results showed that the
designed system achieved a classification accuracy of 79.17%. The system designed using
this ML algorithm can also be customized to predict other diseases. The research can be
extended by implementing other ML algorithms or deep learning frameworks to further
improve the accuracy of diabetes prediction in the future.
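
The report does not state how the accuracy figure was computed. Assuming it is the standard proportion of correctly classified records for a two-class problem, it can be written as below, where the symbols TP, TN, FP, and FN (true positives, true negatives, false positives, false negatives) are introduced here for illustration and do not appear in the original.

```latex
\[
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
\]
```

Under this definition, an accuracy of 79.17% means that roughly four out of every five test records were assigned the correct class.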
REFERENCES

[1] Deepti Sisodia, Dilip Singh Sisodia, "Prediction of Diabetes using Classification Algorithms,"
International Conference on Computational Intelligence and Data Science (ICCIDS 2018), Procedia
Computer Science 132 (2018) 1578–1585.

[2] G. Krishnaveni, T. Sudha, "A Novel Technique To Predict Diabetic Disease Using Data Mining
Classification Techniques," International Conference on Innovative Applications in Engineering and
Information Technology (ICIAEIT2017), vol. 3, Issue 1, pp. 5-11, 2017.

[3] Vrushali B., Rakhi W., "Review on Prediction of Diabetes using Data Mining Technique,"
International Journal of Research and Scientific Innovation (IJRSI), Volume IV, Issue IA, pp. 43-46,
January 2017.

[4] Harleen, Pankaj B., "A Prediction Technique in Data Mining for Diabetes Mellitus," Journal of
Management Sciences and Technology, vol. 4, Issue 1, pp. 1-12, 2016.

[5] Thirumal P., Nagarajan N., "Utilization of Data Mining Techniques for Diagnosis of Diabetes
Mellitus - A Case Study," ARPN Journal of Engineering and Applied Sciences, Vol. 10, No. 1, pp. 8-13,
January 2015.

[6] Iyer A., Jeyalatha S., Sumbaly R., "Diagnosis of diabetes using classification mining techniques,"
International Journal of Data Mining & Knowledge Management Process (IJDKP), vol. 5, no. 1, 2015.

[7] Perveen, S., Shahbaz, M., Guergachi, A., Keshavjee, K., 2016. Performance Analysis of Data Mining
Classification Techniques to Predict Diabetes. Procedia Computer Science 82, 115–121.
doi:10.1016/j.procs.2016.04.016.

[8] Orabi, K.M., Kamal, Y.M., Rabah, T.M., 2016. Early Predictive System for Diabetes Mellitus
Disease, in: Industrial Conference on Data Mining, Springer. pp. 420–427.
