HEART DISEASE PREDICTION SYSTEM USING A MACHINE LEARNING ALGORITHM
ABSTRACT
Cardiovascular diseases remain a leading cause of death worldwide, which reinforces the need for accurate predictive models that allow early diagnosis and intervention. In this manuscript, we describe and assess a heart disease prediction system built with machine learning algorithms. We use logistic regression to classify patients as having or not having heart disease. The model performed satisfactorily: using logistic regression, it predicted with good accuracy whether a given individual shows evidence of heart disease. Such a system could improve medical care and reduce costs. Overall, our results demonstrate the promise of machine learning algorithms for the early detection and risk stratification of heart disease, giving healthcare professionals a valuable decision-support tool.
Keywords: Heart disease prediction, Machine learning, Cardiovascular health, Predictive modeling, Classification algorithm, Logistic regression.
Introduction
"Manipulating and extracting implicit, previously unknown/known, and potentially useful information about data" is what machine learning is all about [1]. Machine learning is an extremely broad and diverse field, and its range of applications grows daily. It includes many supervised learning classifiers, which learn from a labelled dataset to make predictions whose accuracy can then be measured. We apply these techniques in our Heart Disease Prediction System (HDPS) project, which we expect to benefit many people. Cardiovascular diseases (CVDs), a group of disorders that can affect the heart, are highly prevalent today. According to World Health Organization estimates, 17.9 million people worldwide die from CVDs each year [2], and CVDs are the leading cause of death among adults. Using a patient's medical history, our project helps predict who is most likely to be diagnosed with heart disease [3]. It flags patients showing signs of heart disease, such as elevated blood pressure or chest pain, which helps identify the condition with fewer medical tests and supports more efficient, appropriate treatment.
The primary focus of this project is logistic regression, a supervised learning method used when the outcome is discrete (here binary: heart disease or no heart disease); the input attributes themselves may be continuous or categorical. Our model achieves good accuracy, and applying additional data mining techniques further improved the accuracy and efficiency of the HDPS. This study aims to determine whether a patient's medical characteristics, such as age, sex, chest pain type, and fasting blood sugar level, indicate a likely diagnosis of cardiovascular disease. A dataset containing the medical history and attributes of patients was taken from the UCI repository, and by analysing it we determine whether or not a patient has a heart condition. Each patient is described by 14 columns: 13 medical attributes and the diagnosis. Logistic regression is trained on the 13 attributes to estimate the likelihood of a heart condition, and patients are then classified as at risk or not at risk of developing heart disease. This approach is also quite economical.
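For reference, the standard binary logistic regression model behind this classification can be written as follows. Here x1, ..., x13 are the 13 medical attributes, beta_0, ..., beta_13 are coefficients learned from the training data, and y = 1 denotes heart disease; this notation is introduced for illustration and is not taken from the study itself. The 0.5 cut-off is the conventional default, not a value reported by the study.

```latex
p(\mathbf{x}) = P(y = 1 \mid \mathbf{x})
  = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_{13} x_{13})}}

\hat{y} =
\begin{cases}
  1 & \text{if } p(\mathbf{x}) \ge 0.5 \\
  0 & \text{otherwise}
\end{cases}

\mathcal{L}(\boldsymbol{\beta}) = -\frac{1}{n} \sum_{i=1}^{n}
  \Big[ y_i \log p(\mathbf{x}_i) + (1 - y_i) \log\big(1 - p(\mathbf{x}_i)\big) \Big]
```

The coefficients are usually chosen to minimise the cross-entropy loss L over the n training patients. The inputs x_j may be continuous (e.g. cholesterol) or suitably encoded categories (e.g. chest pain type); only the output y is discrete.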
Related Work
Heart disease prediction is a well-researched area in which several machine learning techniques have been applied to build useful predictive models. Logistic regression has come to dominate this area, primarily because of its simplicity and interpretability. D'Agostino et al. (2008) showed the utility of logistic regression for assessing individual cardiovascular risk with a model derived from clinical risk factors; their model made it easy for clinicians to estimate an individual's risk of an event over a predetermined time. In 2007, the Reynolds Risk Score was developed; it is based on logistic regression and combines major clinical risk factors with inflammatory biomarkers to better predict cardiovascular risk in women. Including these more varied predictors significantly improved predictive accuracy compared with traditional risk assessment. A major advantage of logistic regression for heart disease prediction is that it handles both categorical and continuous predictors while remaining interpretable: its coefficients indicate the relative importance of each risk factor, which supports clinical decision-making and patient management. Logistic regression also has drawbacks that limit its application in heart disease prediction, namely multicollinearity among predictors and non-linear relationships and interactions that the model does not capture unless they are explicitly engineered. Appropriate preprocessing and feature engineering are therefore needed to obtain a robust and accurate model.
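One common way to check for the multicollinearity mentioned above is the variance inflation factor, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing attribute j on all the other attributes. The NumPy sketch below is illustrative and not part of the original study; the function name and the synthetic columns in the usage lines are hypothetical. Values well above roughly 5 to 10 are often read as a warning sign, although that threshold is only a rule of thumb.

```python
import numpy as np


def variance_inflation_factors(X):
    """VIF for each column of X (n_samples x n_features).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 is from an ordinary least-squares
    fit of column j on the remaining columns plus an intercept.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        target = X[:, j]
        design = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residual = target - design @ coef
        r2 = 1.0 - (residual @ residual) / np.sum((target - target.mean()) ** 2)
        vifs[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return vifs


# Usage with hypothetical columns: age in years, cholesterol, and age in months
# (the last is almost a copy of the first, so both get very large VIFs).
rng = np.random.default_rng(0)
age = rng.normal(55, 9, 300)
chol = rng.normal(245, 50, 300)
age_months = 12 * age + rng.normal(0, 1, 300)
print(variance_inflation_factors(np.column_stack([age, chol, age_months])).round(1))
```

A high VIF flags an attribute that could be dropped or combined with its near-duplicate before fitting, which keeps the logistic regression coefficients stable and interpretable.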
Objectives
Identify persons who are at high risk for developing heart disease.
Facilitate early intervention to prevent heart disease.
Individualize treatment plans based on predicted risk.
Facilitate targeted prevention strategies for populations at high risk.
Make efficient use of healthcare resources by directing them where they have the greatest effect.
Research Methodology
This research analyses machine learning techniques, specifically the logistic regression algorithm, to help practitioners and medical analysts identify heart disease effectively. The process included examining journals, published papers, and recent data on cardiovascular illness. The methodology provides a framework for the proposed model [4]: a set of procedures that converts the provided data into recognisable patterns that users can understand. The proposed methodology (Figure 1) is divided into stages: data collection comes first, significant values are extracted in the second stage, and data exploration takes place in the third stage, data preparation. Depending on the procedures used, data preparation handles missing values, cleans the data, and normalises it [5]. After preprocessing, a classifier categorises the preprocessed data; in the proposed model, this classifier is logistic regression. Finally, we implemented the proposed model and assessed its performance and correctness with a variety of performance indicators. In this way, an efficient Heart Disease Prediction System (EHDPS) was constructed using several classifiers. For prediction, the model uses 13 medical attributes, including age, sex, blood pressure, cholesterol, chest pain, and fasting blood sugar [6].
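The preparation and classification steps above can be sketched in NumPy as follows. This is a minimal, hypothetical sketch rather than the authors' implementation: it assumes the 13 attributes are already numeric, that missing values are encoded as NaN, that column medians are used for imputation and z-scores for normalisation, and that logistic regression is fitted by plain batch gradient descent. The function names, learning rate, iteration count, and synthetic data are illustrative choices.

```python
import numpy as np


def impute_and_standardize(X_train, X_test):
    """Fill NaNs with training-set column medians, then z-score both sets.

    Statistics come from the training set only, so no test information leaks
    into preprocessing.
    """
    X_train = np.array(X_train, dtype=float)  # copies, so inputs are untouched
    X_test = np.array(X_test, dtype=float)
    medians = np.nanmedian(X_train, axis=0)
    for X in (X_train, X_test):
        rows, cols = np.where(np.isnan(X))
        X[rows, cols] = medians[cols]  # in-place fill of the copied arrays
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # guard against constant columns
    return (X_train - mean) / std, (X_test - mean) / std


def sigmoid(z):
    z = np.clip(z, -500, 500)  # avoid overflow in exp for extreme logits
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic_regression(X, y, lr=0.1, n_iter=2000):
    """Minimise mean binary cross-entropy with batch gradient descent."""
    n, p = X.shape
    w = np.zeros(p)
    b = 0.0
    for _ in range(n_iter):
        error = sigmoid(X @ w + b) - y  # gradient of the loss w.r.t. each logit
        w -= lr * (X.T @ error) / n
        b -= lr * error.mean()
    return w, b


def predict(X, w, b, threshold=0.5):
    """Return 1 (heart disease) where the predicted probability >= threshold."""
    return (sigmoid(X @ w + b) >= threshold).astype(int)


# Usage on synthetic data standing in for the 13 medical attributes.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 13))
y = (X @ rng.normal(size=13) + rng.normal(scale=0.5, size=300) > 0).astype(int)
X[rng.random(X.shape) < 0.02] = np.nan  # simulate about 2% missing values
X_tr, X_te = impute_and_standardize(X[:240], X[240:])
w, b = fit_logistic_regression(X_tr, y[:240])
print("test accuracy:", (predict(X_te, w, b) == y[240:]).mean())
```

Computing the medians, means, and standard deviations on the training split only keeps the test set unseen. The 0.5 threshold is the usual default; lowering it would trade specificity for sensitivity, which may be preferable when a missed diagnosis is costly.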
Fig. 1 Proposed Model
Data Source
The dataset is an organised record of a group of people selected on the basis of several medical disorders and their history of heart problems [2]. Heart disease is the umbrella term for a variety of illnesses that affect the heart. The World Health Organization (WHO) reports that cardiovascular illnesses account for the majority of deaths among middle-aged adults. We use a dataset from the UCI repository that contains the medical records of 304 distinct patients across a range of age groups. For each patient it provides 13 medical attributes, such as age, resting blood pressure, and fasting blood sugar level, which we use to distinguish patients who are at risk of developing heart disease from those who are not. This dataset is used to extract the patterns that identify patients at risk of developing heart disease. The records are divided into a training set and a testing set. The dataset has 14 columns (the 13 attributes plus the diagnosis) and 304 rows, each row representing a single patient, as shown in Table 1.
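The following minimal sketch separates the 13 attributes from the diagnosis column and divides the records into training and testing sets. The 80/20 ratio, the stratification by class, the seed, and the file name in the comment are assumptions, since the text does not state how the split was made. In the processed Cleveland file from the UCI repository the diagnosis is coded 0 to 4, so values above 0 are collapsed to 1 (heart disease); a file already coded 0/1 passes through unchanged.

```python
import numpy as np


def split_features_target(data):
    """First 13 columns are attributes; the last column is the diagnosis.

    Diagnosis values above 0 are collapsed to 1 (heart disease present).
    """
    data = np.asarray(data, dtype=float)
    X = data[:, :-1]
    y = (data[:, -1] > 0).astype(int)
    return X, y


def stratified_split(X, y, test_fraction=0.2, seed=0):
    """Shuffle each class separately so both sets keep the class balance."""
    rng = np.random.default_rng(seed)
    test_mask = np.zeros(len(y), dtype=bool)
    for label in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == label))
        n_test = int(round(test_fraction * len(idx)))
        test_mask[idx[:n_test]] = True
    return X[~test_mask], X[test_mask], y[~test_mask], y[test_mask]


# Loading (hypothetical file name; "?" marks missing values in the UCI file;
# add skip_header=1 if the file has a header row):
# data = np.genfromtxt("heart.csv", delimiter=",", missing_values="?",
#                      filling_values=np.nan)
# X, y = split_features_target(data)
# X_train, X_test, y_train, y_test = stratified_split(X, y)
```

Stratifying keeps the proportion of heart-disease cases similar in both sets, which matters for a dataset of only a few hundred rows.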
Results and Discussion
These findings show that, although most researchers employ several algorithms, including SVC and Decision Tree, to identify patients with heart disease, KNN, Random Forest Classifier, and Logistic Regression give superior results and outperform them [7]. Compared with the algorithms used in earlier studies, ours is faster, more accurate, and considerably less costly. Logistic Regression achieved the highest accuracy, 88.5%, which is higher than or roughly equal to the accuracy reported by earlier studies. In summary, using more of the medical variables in the retrieved dataset improved our accuracy. Our experiment also shows that heart disease can be predicted using logistic regression, which supports its use for more accurate diagnosis. Figures 2 to 5 show the number of patients divided and predicted by the classifier according to age group, resting blood pressure, sex, and chest pain type.
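Figs. 2 to 5 Number of patients by age group, resting blood pressure, sex, and chest pain type (legend: Heart Disease, No Heart Disease)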
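Because accuracy alone does not show how many patients with heart disease are missed, the sketch below computes accuracy alongside the other usual confusion-matrix indicators from test-set labels and predictions. It is illustrative only: the function name is hypothetical, and the labels in the usage line are made up rather than taken from the study's results.

```python
import numpy as np


def binary_metrics(y_true, y_pred):
    """Confusion-matrix counts and derived metrics (1 = heart disease)."""
    y_true = np.asarray(y_true).astype(int)
    y_pred = np.asarray(y_pred).astype(int)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))

    def ratio(num, den):
        return num / den if den else 0.0  # define 0/0 as 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # sensitivity
    return {
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
        "accuracy": ratio(tp + tn, len(y_true)),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# Usage with made-up labels:
print(binary_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0]))
```

Reporting recall (sensitivity) next to accuracy makes it visible how many true heart-disease cases a model with, say, 88.5% accuracy still classifies as healthy.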
Conclusion
A model for detecting cardiovascular disease has been constructed with ML classification techniques such as logistic regression. Using a dataset of patients' medical histories, including chest pain, blood pressure, blood sugar levels, and other conditions, the method extracts the factors associated with serious heart illness and predicts who is likely to have cardiovascular disease. Based on a patient's clinical data, the system offers assistance in diagnosing heart disease. The proposed model was constructed using the logistic regression technique [9] and has an accuracy of 87.5%. More training data generally gives the model a better chance of correctly predicting whether a given individual has heart disease [10]. Such computer-aided tools let us assess patients more accurately and quickly while also significantly lowering costs. Numerous medical databases are available, and because machine learning approaches can outperform human prediction, both patients and physicians benefit from them. In short, by cleaning the dataset and applying logistic regression, this research predicts which patients are diagnosed with heart problems. Our model's average accuracy of 87.5% is better than the 85% accuracy of prior models.
References
[1] Soni J, Ansari U, Sharma D & Soni S (2011). Predictive data mining for medical diagnosis: an overview of heart disease prediction. International Journal of Computer Applications, 17(8), 43-48.
[2] Dangare C S & Apte S S (2012). Improved study of heart disease prediction system using data mining classification techniques. International Journal of Computer Applications, 47(10), 44-48.
[3] Jee S H, Jang Y, Oh D J, Oh B H, Lee S H, Park S W & Yun Y D (2014). A coronary heart disease prediction model: the Korean Heart Study. BMJ Open, 4(5), e005025.
[4] Wolgast G, Ehrenborg C, Israelsson A, Helander J, Johansson E & Manefjord H (2016). Wireless body area network for heart attack detection [Education Corner]. IEEE Antennas and Propagation Magazine, 58(5), 84-92.
[5] Zhang Y, Fogoros R, Thompson J, Kenknight B H, Pederson M J, Patangay A & Mazar S T (2011). U.S. Patent No. 8,014,863. Washington, DC: U.S. Patent and Trademark Office.
[6] Buechler K F & McPherson P H (1999). U.S. Patent No. 5,947,124. Washington, DC: U.S. Patent and Trademark Office.
[7] Folsom A R, Prineas R J, Kaye S A & Soler J T (1989). Body fat distribution and self-reported prevalence of hypertension, heart attack, and other heart disease in older women. International Journal of Epidemiology, 18(2), 361-367.
[8] Piller L B, Davis B R, Cutler J A, Cushman W C, Wright J T, Williamson J D & Haywood L J (2002). Validation of heart failure events in the Antihypertensive and Lipid Lowering Treatment to Prevent Heart Attack Trial (ALLHAT) participants assigned to doxazosin and chlorthalidone. Current Controlled Trials in Cardiovascular Medicine, 3(1), 10.
[9] Jabbar M A, Deekshatulu B L & Chandra P (2013, March). Heart disease prediction using lazy associative classification. In 2013 International Multi-Conference on Automation, Computing, Communication, Control and Compressed Sensing (iMac4s) (pp. 40-46). IEEE.
[10] Dangare C S & Apte S S (2012). Improved study of heart disease prediction system using data mining classification techniques. International Journal of Computer Applications, 47(10), 44-48.