
Effective heart disease prediction using machine learning techniques.

1. Introduction

2. literature survey, 3. methodology, 3.1. data source, 3.2. removing outliers, 3.3. feature selection and reduction, 3.4. clustering, 3.5. correlation table, 3.6. modeling, 3.6.1. decision tree classifier, 3.6.2. random forest, 3.6.3. multilayer perceptron, 3.6.4. xgboost, 5. conclusions, author contributions, conflicts of interest.

  • Estes, C.; Anstee, Q.M.; Arias-Loste, M.T.; Bantel, H.; Bellentani, S.; Caballeria, J.; Colombo, M.; Craxi, A.; Crespo, J.; Day, C.P.; et al. Modeling NAFLD disease burden in China, France, Germany, Italy, Japan, Spain, United Kingdom, and United States for the period 2016–2030. J. Hepatol. 2018 , 69 , 896–904. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Drożdż, K.; Nabrdalik, K.; Kwiendacz, H.; Hendel, M.; Olejarz, A.; Tomasik, A.; Bartman, W.; Nalepa, J.; Gumprecht, J.; Lip, G.Y.H. Risk factors for cardiovascular disease in patients with metabolic-associated fatty liver disease: A machine learning approach. Cardiovasc. Diabetol. 2022 , 21 , 240. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Murthy, H.S.N.; Meenakshi, M. Dimensionality reduction using neuro-genetic approach for early prediction of coronary heart disease. In Proceedings of the International Conference on Circuits, Communication, Control and Computing, Bangalore, India, 21–22 November 2014; pp. 329–332. [ Google Scholar ] [ CrossRef ]
  • Benjamin, E.J.; Muntner, P.; Alonso, A.; Bittencourt, M.S.; Callaway, C.W.; Carson, A.P.; Chamberlain, A.M.; Chang, A.R.; Cheng, S.; Das, S.R.; et al. Heart disease and stroke statistics—2019 update: A report from the American heart association. Circulation 2019 , 139 , e56–e528. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Shorewala, V. Early detection of coronary heart disease using ensemble techniques. Inform. Med. Unlocked 2021 , 26 , 100655. [ Google Scholar ] [ CrossRef ]
  • Mozaffarian, D.; Benjamin, E.J.; Go, A.S.; Arnett, D.K.; Blaha, M.J.; Cushman, M.; de Ferranti, S.; Després, J.-P.; Fullerton, H.J.; Howard, V.J.; et al. Heart disease and stroke statistics—2015 update: A report from the American Heart Association. Circulation 2015 , 131 , e29–e322. [ Google Scholar ] [ CrossRef ]
  • Maiga, J.; Hungilo, G.G.; Pranowo. Comparison of Machine Learning Models in Prediction of Cardiovascular Disease Using Health Record Data. In Proceedings of the 2019 International Conference on Informatics, Multimedia, Cyber and Information System (ICIMCIS), Jakarta, Indonesia, 24–25 October 2019; pp. 45–48. [ Google Scholar ] [ CrossRef ]
  • Li, J.; Loerbroks, A.; Bosma, H.; Angerer, P. Work stress and cardiovascular disease: A life course perspective. J. Occup. Health 2016 , 58 , 216–219. [ Google Scholar ] [ CrossRef ]
  • Purushottam; Saxena, K.; Sharma, R. Efficient Heart Disease Prediction System. Procedia Comput. Sci. 2016 , 85 , 962–969. [ Google Scholar ] [ CrossRef ]
  • Soni, J.; Ansari, U.; Sharma, D.; Soni, S. Predictive Data Mining for Medical Diagnosis: An Overview of Heart Disease Prediction. Int. J. Comput. Appl. 2011 , 17 , 43–48. [ Google Scholar ] [ CrossRef ]
  • Mohan, S.; Thirumalai, C.; Srivastava, G. Effective Heart Disease Prediction Using Hybrid Machine Learning Techniques. IEEE Access 2019 , 7 , 81542–81554. [ Google Scholar ] [ CrossRef ]
  • Waigi, R.; Choudhary, S.; Fulzele, P.; Mishra, G. Predicting the risk of heart disease using advanced machine learning approach. Eur. J. Mol. Clin. Med. 2020 , 7 , 1638–1645. [ Google Scholar ]
  • Breiman, L. Random forests. Mach. Learn. 2001 , 45 , 5–32. [ Google Scholar ] [ CrossRef ]
  • Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the KDD ’16: 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; Association for Computing Machinery: New York, NY, USA, 2016; pp. 785–794. [ Google Scholar ] [ CrossRef ]
  • Gietzelt, M.; Wolf, K.-H.; Marschollek, M.; Haux, R. Performance comparison of accelerometer calibration algorithms based on 3D-ellipsoid fitting methods. Comput. Methods Programs Biomed. 2013 , 111 , 62–71. [ Google Scholar ] [ CrossRef ]
  • K, V.; Singaraju, J. Decision Support System for Congenital Heart Disease Diagnosis based on Signs and Symptoms using Neural Networks. Int. J. Comput. Appl. 2011 , 19 , 6–12. [ Google Scholar ] [ CrossRef ]
  • Narin, A.; Isler, Y.; Ozer, M. Early prediction of Paroxysmal Atrial Fibrillation using frequency domain measures of heart rate variability. In Proceedings of the 2016 Medical Technologies National Congress (TIPTEKNO), Antalya, Turkey, 27–29 October 2016. [ Google Scholar ] [ CrossRef ]
  • Shah, D.; Patel, S.; Bharti, S.K. Heart Disease Prediction using Machine Learning Techniques. SN Comput. Sci. 2020 , 1 , 345. [ Google Scholar ] [ CrossRef ]
  • Alotaibi, F.S. Implementation of Machine Learning Model to Predict Heart Failure Disease. Int. J. Adv. Comput. Sci. Appl. 2019 , 10 , 261–268. [ Google Scholar ] [ CrossRef ]
  • Hasan, N.; Bao, Y. Comparing different feature selection algorithms for cardiovascular disease prediction. Health Technol. 2020 , 11 , 49–62. [ Google Scholar ] [ CrossRef ]
  • Ouf, S.; ElSeddawy, A.I.B. A proposed paradigm for intelligent heart disease prediction system using data mining techniques. J. Southwest Jiaotong Univ. 2021 , 56 , 220–240. [ Google Scholar ] [ CrossRef ]
  • Khan, I.H.; Mondal, M.R.H. Data-Driven Diagnosis of Heart Disease. Int. J. Comput. Appl. 2020 , 176 , 46–54. [ Google Scholar ] [ CrossRef ]
  • Kaggle Cardiovascular Disease Dataset. Available online: https://www.kaggle.com/datasets/sulianova/cardiovascular-disease-dataset (accessed on 1 November 2022).
  • Han, J.A.; Kamber, M. Data Mining: Concepts and Techniques , 3rd ed.; Morgan Kaufmann Publishers: San Francisco, CA, USA, 2011. [ Google Scholar ]
  • Rivero, R.; Garcia, P. A Comparative Study of Discretization Techniques for Naive Bayes Classifiers. IEEE Trans. Knowl. Data Eng. 2009 , 21 , 674–688. [ Google Scholar ]
  • Khan, S.S.; Ning, H.; Wilkins, J.T.; Allen, N.; Carnethon, M.; Berry, J.D.; Sweis, R.N.; Lloyd-Jones, D.M. Association of body mass index with lifetime risk of cardiovascular disease and compression of morbidity. JAMA Cardiol. 2018 , 3 , 280–287. [ Google Scholar ] [ CrossRef ]
  • Kengne, A.-P.; Czernichow, S.; Huxley, R.; Grobbee, D.; Woodward, M.; Neal, B.; Zoungas, S.; Cooper, M.; Glasziou, P.; Hamet, P.; et al. Blood Pressure Variables and Cardiovascular Risk. Hypertension 2009 , 54 , 399–404. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Yu, D.; Zhao, Z.; Simmons, D. Interaction between Mean Arterial Pressure and HbA1c in Prediction of Cardiovascular Disease Hospitalisation: A Population-Based Case-Control Study. J. Diabetes Res. 2016 , 2016 , 8714745. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Huang, Z. A Fast Clustering Algorithm to Cluster Very Large Categorical Data Sets in Data Mining. DMKD 1997 , 3 , 34–39. [ Google Scholar ]
  • Maas, A.H.; Appelman, Y.E. Gender differences in coronary heart disease. Neth. Heart J. 2010 , 18 , 598–602. [ Google Scholar ] [ CrossRef ]
  • Bhunia, P.K.; Debnath, A.; Mondal, P.; D E, M.; Ganguly, K.; Rakshit, P. Heart Disease Prediction using Machine Learning. Int. J. Eng. Res. Technol. 2021 , 9 . [ Google Scholar ]
  • Mohanty, M.D.; Mohanty, M.N. Verbal sentiment analysis and detection using recurrent neural network. In Advanced Data Mining Tools and Methods for Social Computing ; Academic Press: Cambridge, MA, USA, 2022; pp. 85–106. [ Google Scholar ] [ CrossRef ]
  • Menzies, T.; Kocagüneli, E.; Minku, L.; Peters, F.; Turhan, B. Using Goals in Model-Based Reasoning. In Sharing Data and Models in Software Engineering ; Morgan Kaufmann: San Francisco, CA, USA, 2015; pp. 321–353. [ Google Scholar ] [ CrossRef ]
  • Fayez, M.; Kurnaz, S. Novel method for diagnosis diseases using advanced high-performance machine learning system. Appl. Nanosci. 2021 . [ Google Scholar ] [ CrossRef ]
  • Hassan, C.A.U.; Iqbal, J.; Irfan, R.; Hussain, S.; Algarni, A.D.; Bukhari, S.S.H.; Alturki, N.; Ullah, S.S. Effectively Predicting the Presence of Coronary Heart Disease Using Machine Learning Classifiers. Sensors 2022 , 22 , 7227. [ Google Scholar ] [ CrossRef ]
  • Subahi, A.F.; Khalaf, O.I.; Alotaibi, Y.; Natarajan, R.; Mahadev, N.; Ramesh, T. Modified Self-Adaptive Bayesian Algorithm for Smart Heart Disease Prediction in IoT System. Sustainability 2022 , 14 , 14208. [ Google Scholar ] [ CrossRef ]

| Authors | Novel Approach | Best Accuracy | Dataset |
|---|---|---|---|
| Shorewala, 2021 | Stacking of KNN, random forest, and SVM outputs with logistic regression as the metaclassifier | 75.1% (stacked model) | Kaggle cardiovascular disease dataset (70,000 patients, 12 attributes) |
| Maiga et al., 2019 | Random forest; Naive Bayes; Logistic regression; KNN | 70% | Kaggle cardiovascular disease dataset (70,000 patients, 12 attributes) |
| Waigi et al., 2020 | Decision tree | 72.77% (decision tree) | Kaggle cardiovascular disease dataset (70,000 patients, 12 attributes) |
| Ouf and ElSeddawy, 2021 | Repeated random with random forest | 89.01% (random forest classifier) | UCI cardiovascular dataset (303 patients, 14 attributes) |
| Khan and Mondal, 2020 | Holdout cross-validation with the neural network for Kaggle dataset | 71.82% (neural networks) | Kaggle cardiovascular disease dataset (70,000 patients, 12 attributes) |
| | Cross-validation method with logistic regression (solver: lbfgs) where k = 30 | 72.72% | Kaggle cardiovascular disease dataset 1 (462 patients, 12 attributes) |
| | Cross-validation method with linear SVM where k = 10 | 72.22% | Kaggle cardiovascular disease dataset (70,000 patients, 12 attributes) |

| Feature | Variable | Min and Max Values |
|---|---|---|
| Age | Age | Min: 10,798 and max: 23,713 |
| Height | Height | Min: 55 and max: 250 |
| Weight | Weight | Min: 10 and max: 200 |
| Gender | Gender | 1: female, 2: male |
| Systolic blood pressure | ap_hi | Min: −150 and max: 16,020 |
| Diastolic blood pressure | ap_lo | Min: −70 and max: 11,000 |
| Cholesterol | Chol | Categorical value = 1 (min) to 3 (max) |
| Glucose | Gluc | Categorical value = 1 (min) to 3 (max) |
| Smoking | Smoke | 1: yes, 0: no |
| Alcohol intake | Alco | 1: yes, 0: no |
| Physical activity | Active | 1: yes, 0: no |
| Presence or absence of cardiovascular disease | Cardio | 1: yes, 0: no |

| MAP Values | Category |
|---|---|
| ≥70 and <80 | 1 |
| ≥80 and <90 | 2 |
| ≥90 and <100 | 3 |
| ≥100 and <110 | 4 |
| ≥110 and <120 | 5 |

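The category table above can be read as a binning rule. The following is a minimal sketch, not the authors' code. It assumes that mean arterial pressure (MAP) is estimated with the common clinical approximation MAP = diastolic + (systolic − diastolic) / 3, which this excerpt does not state; that the bins are contiguous 10 mmHg intervals from 70 to 120; and that values outside that range fall into category 0 (the later feature table lists MAP_Class from 0 to 5). The function names are hypothetical.

```python
# Bin edges in mmHg; categories 1..5 cover [70, 120) in 10 mmHg steps (assumed).
BIN_EDGES = [70, 80, 90, 100, 110, 120]


def mean_arterial_pressure(ap_hi: float, ap_lo: float) -> float:
    """Common approximation: diastolic plus one third of the pulse pressure."""
    return ap_lo + (ap_hi - ap_lo) / 3.0


def map_category(map_value: float) -> int:
    """Return 1..5 for MAP in [70, 120); 0 (an assumption) for anything outside."""
    for category in range(1, len(BIN_EDGES)):
        lower, upper = BIN_EDGES[category - 1], BIN_EDGES[category]
        if lower <= map_value < upper:
            return category
    return 0


if __name__ == "__main__":
    value = mean_arterial_pressure(ap_hi=120, ap_lo=80)
    print(round(value, 2), map_category(value))
```

Keeping the edges in one list makes the assumed bin boundaries easy to audit against the table.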
| Feature | Variable | Min and Max Values |
|---|---|---|
| Gender | gender | 1: male, 2: female |
| Age | Age | Categorical values = 0 (min) to 6 (max) |
| BMI | BMI_Class | Categorical values = 0 (min) to 5 (max) |
| Mean arterial pressure | MAP_Class | Categorical values = 0 (min) to 5 (max) |
| Cholesterol | Cholesterol | Categorical values = 1 (min) to 3 (max) |
| Glucose | Gluc | Categorical values = 1 (min) to 3 (max) |
| Smoking | Smoke | 1: yes, 0: no |
| Alcohol intake | Alco | 1: yes, 0: no |
| Physical activity | Active | 1: yes, 0: no |
| Presence or absence of cardiovascular disease | Cardio | 1: yes, 0: no |

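| Model | Accuracy (without CV) | Accuracy (CV) | Precision (without CV) | Precision (CV) | Recall (without CV) | Recall (CV) | F1-Score (without CV) | F1-Score (CV) | AUC |
|---|---|---|---|---|---|---|---|---|---|
| MLP | 86.94 | 87.28 | 89.03 | 88.70 | 82.95 | 84.85 | 85.88 | 86.71 | 0.95 |
| RF | 86.92 | 87.05 | 88.52 | 89.42 | 83.46 | 83.43 | 85.91 | 86.32 | 0.95 |
| DT | 86.53 | 86.37 | 90.10 | 89.58 | 81.17 | 81.61 | 85.40 | 85.42 | 0.94 |
| XGB | 87.02 | 86.87 | 89.62 | 88.93 | 82.11 | 83.57 | 86.30 | 86.16 | 0.95 |
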
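As a quick consistency check on the results table, the F1-score can be recomputed from the reported precision and recall, since F1 is their harmonic mean. The sketch below uses only the "without CV" values copied from the table; the function name is hypothetical, and nothing here claims to reproduce how the authors computed their figures. A gap between the recomputed and reported value need not indicate an error; it may reflect rounding or a different averaging convention.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# (precision, recall, reported F1) without cross-validation, copied from the table.
REPORTED = {
    "MLP": (89.03, 82.95, 85.88),
    "RF": (88.52, 83.46, 85.91),
    "DT": (90.10, 81.17, 85.40),
    "XGB": (89.62, 82.11, 86.30),
}

if __name__ == "__main__":
    for model, (precision, recall, reported_f1) in REPORTED.items():
        recomputed = f1_from_precision_recall(precision, recall)
        gap = recomputed - reported_f1
        print(f"{model}: recomputed={recomputed:.2f} reported={reported_f1:.2f} gap={gap:+.2f}")
```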

Bhatt, C.M.; Patel, P.; Ghetia, T.; Mazzeo, P.L. Effective Heart Disease Prediction Using Machine Learning Techniques. Algorithms 2023 , 16 , 88. https://doi.org/10.3390/a16020088

  • Open access
  • Published: 12 November 2020

Early and accurate detection and diagnosis of heart disease using intelligent computational model

  • Yar Muhammad 1 ,
  • Muhammad Tahir 1 ,
  • Maqsood Hayat 1 &
  • Kil To Chong 2  

Scientific Reports volume 10, Article number: 19747 (2020)

  • Cardiovascular diseases
  • Computational biology and bioinformatics
  • Health care
  • Heart failure

Heart disease is a fatal human disease whose prevalence is rising rapidly worldwide, in both developed and undeveloped countries. In this disease, the heart typically fails to supply sufficient blood to other parts of the body for them to function normally. Early and timely diagnosis is essential to prevent further damage to patients and to save their lives. Among the conventional invasive techniques, angiography is considered the best-known technique for diagnosing heart problems, but it has some limitations. Non-invasive methods, such as intelligent learning-based computational techniques, have been found more reliable and effective for heart disease diagnosis. Here, an intelligent computational predictive system is introduced for the identification and diagnosis of cardiac disease. In this study, various machine learning classification algorithms are investigated. To remove irrelevant and noisy data from the extracted feature space, four distinct feature selection algorithms are applied, and the results of each feature selection algorithm with each classifier are analyzed. Several performance metrics, namely accuracy, sensitivity, specificity, AUC, F1-score, MCC, and the ROC curve, are used to assess the effectiveness and strength of the developed model. The classification rates of the developed system are examined on both the full and the optimal feature spaces; the performance of the developed model improves on the high-variance optimal feature space. In addition, the P-value and Chi-square statistic are computed for the ET classifier with each feature selection technique. It is anticipated that the proposed system will help physicians diagnose heart disease accurately and effectively.

Introduction

Heart disease is considered one of the most perilous and life-threatening chronic diseases worldwide. In heart disease, the heart typically fails to supply sufficient blood to other parts of the body for them to function normally 1. Heart failure occurs due to blockage and narrowing of the coronary arteries, which are responsible for supplying blood to the heart itself 2. A recent survey reveals that the United States is the country most affected by heart disease, with a very high proportion of heart disease patients 3. The most common symptoms of heart disease include physical weakness, shortness of breath, swollen feet, and fatigue with associated signs 4. The risk of heart disease may be increased by lifestyle factors such as smoking, an unhealthy diet, high cholesterol, high blood pressure, and a lack of exercise and fitness 5. Heart disease has several types, of which coronary artery disease (CAD) is the most common; it can lead to chest pain, stroke, and heart attack. Other types of heart disease include heart rhythm problems, congestive heart failure, congenital heart disease (heart disease present at birth), and cardiovascular disease (CVD). Initially, traditional investigation techniques were used to identify heart disease; however, they were found to be complex 6. Owing to the lack of medical diagnostic tools and medical experts, particularly in undeveloped countries, the diagnosis and treatment of heart disease are very difficult 7. However, precise and appropriate diagnosis of heart disease is imperative to prevent further damage to the patient 8. Heart disease is a fatal disease that is increasing rapidly in both economically developed and undeveloped countries. According to a report by the World Health Organization (WHO), about 17.90 million people died from CVD in 2016, which represents approximately 30% of all global deaths. According to another report, 0.2 million people die from heart disease annually in Pakistan, and the number of victims is increasing rapidly every year. The European Society of Cardiology (ESC) has published a report in which 26.5 million adults were identified as having heart disease, with 3.8 million more identified each year. About 50–55% of heart disease patients die within the initial 1–3 years, and the cost of heart disease treatment is about 4% of the overall annual healthcare budget 9.

Conventional invasive methods for diagnosing heart disease were based on the patient's medical history, physical test results, and doctors' investigation of related symptoms 10. Among the conventional methods, angiography is considered one of the most precise techniques for identifying heart problems. However, angiography has drawbacks such as high cost, various side effects, and the need for strong technological knowledge 11. Conventional methods often lead to imprecise diagnosis and take more time because of human error. In addition, they are expensive and computationally intensive approaches to diagnosis, and the assessment takes time 12.

To overcome the issues with conventional invasive methods for identifying heart disease, researchers have attempted to develop non-invasive smart healthcare systems based on predictive machine learning techniques such as Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Naïve Bayes (NB), and Decision Tree (DT) 13. As a result, the death rate among heart disease patients has decreased 14. In the literature, the Cleveland heart disease dataset is used extensively by researchers 15, 16.

In this regard, Robert et al. 17 used a logistic regression classification algorithm for heart disease detection and obtained an accuracy of 77.1%. Similarly, Wankhade et al. 18 used a multi-layer perceptron (MLP) classifier for heart disease diagnosis and attained an accuracy of 80%. Likewise, Allahverdi et al. 19 developed a heart disease classification system in which they integrated neural networks with an artificial neural network and attained an accuracy of 82.4%. Subsequently, Awang et al. 20 used NB and DT for the diagnosis and prediction of heart disease and achieved reasonable accuracy: 82.7% with NB and 80.4% with DT. Oyedodum and Olaniye 21 proposed a three-phase system for the prediction of heart disease using ANN. Das and Turkoglu 22 proposed an ANN ensemble-based predictive model for the prediction of heart disease. Similarly, Paul and Robin 23 used the adaptive fuzzy ensemble method for the prediction of heart disease. Likewise, Tomov et al. 24 introduced a deep neural network for heart disease prediction, and their proposed model performed well and produced good outcomes. Further, Manogaran and Varatharajan 25 introduced the concept of a hybrid recommendation system for diagnosing heart disease, and their model gave considerable results. Alizadehsani et al. 26 developed a non-invasive model for the prediction of coronary artery disease and reported good results for accuracy and other performance assessment metrics. Amin et al. 27 proposed a framework for a hybrid system that identifies cardiac disease using machine learning and attained an accuracy of 86.0%. Similarly, Mohan et al. 28 proposed another intelligent system that integrates RF with a linear model for the prediction of heart disease and achieved a classification accuracy of 88.7%. Likewise, Liaqat et al. 29 developed an expert system that uses stacked SVM for the prediction of heart disease and obtained 91.11% classification accuracy on selected features.

The contribution of the current work is to introduce an intelligent medical decision system for the diagnosis of heart disease based on contemporary machine learning algorithms. In this study, 10 machine learning classification algorithms of different natures, such as Logistic Regression (LR), Decision Tree (DT), Naïve Bayes (NB), Random Forest (RF), and Artificial Neural Network (ANN), are implemented in order to select the best model for timely and accurate detection of heart disease at an early stage. Four feature selection algorithms, Fast Correlation-Based Filter Solution (FCBF), minimal redundancy maximal relevance (mRMR), Least Absolute Shrinkage and Selection Operator (LASSO), and Relief, have been used to select the vital and more strongly correlated features that truly reflect the pattern of the desired target. Our developed system has been trained and tested on the Cleveland (S1) and Hungarian (S2) heart disease datasets, which are available online in the UCI machine learning repository. All processing and computations were performed using Anaconda, and Python was used to implement all the classifiers. The main packages and libraries used include pandas, NumPy, matplotlib, scikit-learn (sklearn), and seaborn. The main contributions of our proposed work are given below:

  • The performance of all classifiers has been tested on the full feature spaces in terms of all performance evaluation metrics, specifically accuracy.

  • The performance of the classifiers has been tested on selected feature spaces, chosen by the feature selection algorithms mentioned above.

  • The study recommends which feature selection algorithm works best with which classification algorithm for developing a high-level intelligent system for diagnosing heart disease patients.

The rest of the paper is organized as follows: the “Results and discussion” section presents the results and discussion, and the “Material and methods” section describes the material and methods used in this paper. Finally, we conclude our research work in the “Conclusion” section.

Results and discussion

This section of the paper discusses the experimental results of various contemporary classification algorithms. First, the performance of all the classification models used, i.e., K-Nearest Neighbors (KNN), Decision Tree (DT), Extra-Tree Classifier (ETC), Random Forest (RF), Logistic Regression (LR), Naïve Bayes (NB), Artificial Neural Network (ANN), Support Vector Machine (SVM), Adaboost (AB), and Gradient Boosting (GB), is evaluated on the full feature space. After that, four feature selection algorithms (FSA), Fast Correlation-Based Filter (FCBF), Minimal Redundancy Maximal Relevance (mRMR), Least Absolute Shrinkage and Selection Operator (LASSO), and Relief, are applied to select the prominent, high-variance features from the feature space. The selected feature spaces are then provided to the classification algorithms as input to analyze the significance of the feature selection techniques. A k-fold (10-fold) cross-validation technique is applied to both the full and the selected feature spaces to analyze the generalization power of the proposed model. Various performance evaluation metrics are used to measure the performance of the classification models.
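The 10-fold cross-validation protocol mentioned above can be sketched in plain Python. This is an illustrative sketch only: the text does not say whether the folds were stratified by class or how they were shuffled, so stratification, the fixed seed, and the function name `stratified_k_fold` are assumptions. The example labels use the class counts quoted for S2 in this section (525 positive, 500 negative).

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=10, seed=0):
    """Yield (train_indices, test_indices) for k folds with near-equal class shares."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal indices round-robin so every fold gets a similar share of each class.
        for position, index in enumerate(indices):
            folds[position % k].append(index)

    for fold_number in range(k):
        test = sorted(folds[fold_number])
        train = sorted(
            index
            for other, fold in enumerate(folds)
            if other != fold_number
            for index in fold
        )
        yield train, test


if __name__ == "__main__":
    labels = [1] * 525 + [0] * 500
    for train, test in stratified_k_fold(labels, k=10):
        positives = sum(labels[i] for i in test)
        print(len(train), len(test), positives)
```

Each instance is used for testing exactly once, and a metric averaged over the ten test folds gives the kind of 10-fold CV estimate the section reports.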

Classifiers’ predictive outcomes on full feature space

The experimental outcomes of the applied classification algorithms on the full feature space of the two benchmark datasets using 10-fold cross-validation (CV) are shown in Tables 1 and 2, respectively.

The experimental results demonstrate that the ET classifier performed well on all performance evaluation metrics compared with the other classifiers under 10-fold CV. ET achieved 92.09% accuracy, 91.82% sensitivity, 92.38% specificity, 97.92% AUC, 92.84% precision, an F1-score of 0.92, and an MCC of 0.84. Specificity is the proportion of individuals without the disease whose diagnostic test is negative, while sensitivity is the proportion of patients with heart disease whose diagnostic test is positive. For the KNN classification model, multiple experiments were carried out with various values of k, i.e., k = 3, 5, 7, 9, 13, and 15. KNN showed its best performance at k = 7, achieving a classification accuracy of 85.55%, 85.93% sensitivity, 85.17% specificity, 95.64% AUC, 86.09% precision, an F1-score of 0.86, and an MCC of 0.71. Similarly, the DT classifier achieved an accuracy of 86.82%, 89.73% sensitivity, 83.76% specificity, 91.89% AUC, 85.40% precision, an F1-score of 0.87, and an MCC of 0.73. Likewise, the GB classifier yielded an accuracy of 91.34%, 90.32% sensitivity, 91.52% specificity, 96.87% AUC, 92.14% precision, an F1-score of 0.92, and an MCC of 0.83. After empirically evaluating the success rates of all classifiers, it is observed that the ET classifier outperformed all the other classification algorithms in terms of accuracy, sensitivity, and specificity, whereas NB showed the lowest performance on these three measures. The ROC curves of all classification algorithms on the full feature space are shown in Fig. 1.

Figure 1. ROC curves of all classifiers on full feature space using 10-fold cross-validation on S1.
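The metrics quoted in this subsection all derive from the four cells of a binary confusion matrix. The sketch below shows the standard definitions; the counts in the usage example are hypothetical and are not taken from the paper's experiments, and the function name is an illustrative choice.

```python
import math


def _ratio(numerator: float, denominator: float) -> float:
    """Safe division that returns 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = _ratio(tp, tp + fn)   # true positive rate (recall)
    specificity = _ratio(tn, tn + fp)   # true negative rate
    precision = _ratio(tp, tp + fp)
    accuracy = _ratio(tp + tn, tp + fp + tn + fn)
    f1 = _ratio(2.0 * precision * sensitivity, precision + sensitivity)
    # Matthews correlation coefficient, in [-1, 1].
    mcc = _ratio(
        tp * tn - fp * fn,
        math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    )
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    for name, value in binary_metrics(tp=45, fp=5, tn=40, fn=10).items():
        print(f"{name}: {value:.4f}")
```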

In the case of dataset S2, which comprises 1025 instances, of which 525 belong to the positive class and 500 to the negative class, ET again obtained good results compared with the other classifiers under a 10-fold cross-validation test: 96.74% accuracy, 96.36% sensitivity, 97.40% specificity, and an MCC of 0.93, as shown in Table 2.

Classifiers’ predictive outcomes on selected feature space

FCBF feature selection technique

The FCBF feature selection technique is applied to select the best subset of the feature space. In this attempt, subspaces of various lengths are generated and tested. The classification algorithms achieve their best results on a feature subset of size n = 6 using 10-fold CV. Table 3 shows various performance measures of the classifiers on the feature space selected by FCBF.

Table 3 demonstrates that the ET classifier obtained good results, including an accuracy of 94.14%, a sensitivity of 94.29%, and a specificity of 93.98%. In contrast, NB reported the lowest performance among the classification algorithms. The performance of the classification algorithms is also illustrated by the ROC curves in Fig. 2.

Figure 2. ROC curve of all classifiers on selected features by FCBF feature selection algorithm.
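FCBF, as usually described in the literature, ranks features by symmetrical uncertainty (SU) with the class label, a normalized form of mutual information for discrete variables. The sketch below computes SU only and leaves out FCBF's redundancy-removal step; it is an illustration of the relevance score under that assumption, not the implementation used in the study. The function names are hypothetical.

```python
import math
from collections import Counter


def entropy(values) -> float:
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(values).values()
    )


def symmetrical_uncertainty(x, y) -> float:
    """SU(x, y) = 2 * I(x; y) / (H(x) + H(y)), ranging from 0 to 1."""
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        return 0.0
    h_xy = entropy(list(zip(x, y)))      # joint entropy
    mutual_information = h_x + h_y - h_xy
    return 2.0 * mutual_information / (h_x + h_y)


if __name__ == "__main__":
    target = [0, 0, 1, 1]
    informative = [0, 0, 1, 1]   # identical to the target
    unrelated = [0, 1, 0, 1]     # independent of the target
    print(symmetrical_uncertainty(informative, target))
    print(symmetrical_uncertainty(unrelated, target))
```

Continuous features would need to be discretized before a score like this can be applied.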

mRMR feature selection technique

The mRMR feature selection technique is used to select a subset of features that enhances the performance of the classifiers. The best results are reported on a feature subset of size n = 6, as shown in Table 4.

With mRMR, the success rates of the ET classifier remain good on all performance evaluation metrics compared with the other classifiers. ET attained 93.42% accuracy, 93.92% sensitivity, and 93.88% specificity. In contrast, NB achieved the lowest outcome, with 81.84% accuracy. Figure 3 shows the ROC curves of all ten classifiers using the mRMR feature selection algorithm.

Figure 3. ROC curve of all classifiers on selected features using the mRMR feature selection algorithm.

LASSO feature selection technique

The LASSO feature selection technique is applied to choose an optimal feature space that not only reduces computational cost but also improves the performance of the classifiers. After various experiments on different subsets of the feature space, the best results are again observed on the subspace of size n = 6. The predicted outcomes on the best-selected feature space using 10-fold CV are reported in Table 5.

Table 5 demonstrates that the predicted outcomes of the ET classifier are considerable and better than those of the other classifiers. ET achieved 89.36% accuracy, 88.21% sensitivity, and 90.58% specificity. GB yielded the second-best result, with 88.47% accuracy, 89.54% sensitivity, and 87.37% specificity. LR performed worst, achieving 80.77% accuracy, 83.46% sensitivity, and 77.95% specificity. The ROC curves of the classifiers are shown in Fig. 4.

Figure 4. ROC curve of all classifiers on selected feature space using the LASSO feature selection algorithm.

Relief feature selection technique

Next, the Relief feature selection technique is applied to investigate the performance of the classifiers on different feature subspaces by using the wrapper method. Empirical analysis across feature subsets shows that the classifiers perform best on the subspace of n = 6 features. The results for this feature space under 10-fold CV are listed in Table 6.

Again, the ET classifier outperformed the other classifiers on all performance evaluation metrics, obtaining 94.41% accuracy, 94.93% sensitivity, and 94.89% specificity. In contrast, NB showed the lowest performance, with 80.29% accuracy, 81.93% sensitivity, and 78.55% specificity. The ROC curves of the classifiers are shown in Fig. 5.

figure 5

ROC curve of all classifiers on features selected by the Relief feature selection algorithm.

The classification algorithms were run on the full and the selected feature spaces in order to choose the best algorithm for the operational engine. Among all the classification algorithms used, ET performed well on both the full feature space and the selected feature spaces. Its best accuracy, 94.41%, was obtained with the Relief feature selection technique. Overall, ET performed better on most measures, whereas the other classifiers showed good results on one measure but worse results on others. In addition, the ET classifier was evaluated with 10-fold CV on feature subspaces of length 1 to 12, in steps of 1, to check the stability and discrimination power of the classifier, as described in 30. This gives readers a better understanding of how the number of selected features affects classifier performance. The same process was repeated for the second dataset, S2 (Hungarian heart disease dataset), to assess the impact of the selected features on classification performance.

Tables 7 and 8 show the performance of the ET classifier under 10-fold CV on feature subspaces of length 1 to 12 in steps of 1. The results show that the performance of the ET classifier varies significantly with the length of the feature subspace. We conclude that these results are attributable to the Relief feature selection technique, which both reduces the feature space and enhances the predictive power of the classifiers. The ET classifier also contributed to these results, because it learned the pattern of the target class accurately. The ET classifier was further evaluated with 5-fold and 7-fold CV on feature subspaces of length 5 and 7, and on dataset S2 (Hungarian heart disease dataset), to check its stability and discrimination power. These results are given in the supplementary materials.

In Table 9 , P-value and Chi-Square values are also computed for the ET classifier in combination with the optimal feature spaces of different feature selection techniques.

Performance comparison with existing models

Further, the developed system is compared with other state-of-the-art machine learning approaches from the literature. Table 10 gives a brief description and the classification accuracy of each approach. The results show that the success rate of our proposed model is higher than that of the existing models.

Material and methods

The following subsections describe the materials and methods used in this paper.

The first and most basic step in developing an intelligent computational model is to construct a problem-related dataset that accurately reflects the pattern of the target class. A well-organized, problem-related dataset strongly influences the performance of the computational model. Two datasets are therefore used: the Cleveland heart disease dataset (S1) and the Hungarian heart disease dataset (S2). Both are available online, from the University of California Irvine (UCI) machine learning repository and the UCI Kaggle repository, and have been used in various research studies 28 , 31 , 32 . S1 consists of 303 instances, each with 13 distinct attributes and a target label, and is used for training. The dataset has two classes: presence or absence of heart disease. S2 consists of 1025 instances, of which 525 belong to the positive class and the remaining 500 to the negative class. Both datasets share the same attributes. A complete description of the 13 attributes is given in Table 11 .

Proposed system methodology

The aim of the developed system is to identify heart problems in human beings. In this study, four distinct feature selection techniques, namely FCBF, mRMR, Relief, and LASSO, are applied to the dataset to remove noisy and redundant features and select relevant ones, which may in turn enhance the performance of the proposed model. Ten machine learning classification algorithms are used: KNN, DT, ET, RF, LR, NB, ANN, SVM, AB, and GB. Several evaluation metrics are computed to assess their performance. The methodology is carried out in five stages: dataset preprocessing, feature selection, cross-validation, classification, and performance evaluation of the classifiers. The framework of the proposed system is illustrated in Fig. 6.

figure 6

An Intelligent Hybrid Framework for the prediction of heart disease.

Preprocessing of data

Data preprocessing transforms raw data into a form suitable for analysis and is crucial for a good representation of the data. Several preprocessing steps, namely removal of missing values, a standard scaler, and a Min–Max scaler, are applied to the dataset to make it more effective for classification.
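The three preprocessing steps named above can be sketched in plain Python as follows. This is a minimal illustration, not the authors' code: the function names and the toy rows (an age column and a cholesterol-like column) are assumptions, and the order in which the paper applies the two scalers is not stated.

```python
import math

def drop_missing(rows):
    """Keep only rows with no missing value (None or NaN)."""
    return [row for row in rows
            if all(v is not None and not (isinstance(v, float) and math.isnan(v))
                   for v in row)]

def standard_scale(column):
    """Rescale to zero mean and unit variance (population standard deviation)."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
    return [(v - mean) / std if std > 0 else 0.0 for v in column]

def min_max_scale(column):
    """Rescale linearly so the smallest value maps to 0 and the largest to 1."""
    lo, hi = min(column), max(column)
    span = hi - lo
    return [(v - lo) / span if span > 0 else 0.0 for v in column]

# Hypothetical toy rows: [age, cholesterol]; the second row has a missing value.
rows = [[63.0, 233.0], [37.0, None], [41.0, 204.0], [56.0, 236.0]]
clean = drop_missing(rows)
ages = [row[0] for row in clean]
scaled_ages = min_max_scale(ages)          # values in [0, 1]
standardized_ages = standard_scale(ages)   # mean 0, variance 1
```

Scaling matters most for distance-based classifiers such as KNN and SVM; tree-based models such as ET are largely insensitive to it.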

Feature selection algorithms

A feature selection technique selects the optimal feature subspace from all the features in a dataset. This is crucial because irrelevant features can degrade classification performance. Feature selection can improve the performance of classification algorithms and also reduces their execution time. Four feature selection techniques are used in this study, as listed below:

Fast correlation-based filter (FCBF): The FCBF feature selection algorithm follows a sequential search strategy. It starts with the full feature set and uses symmetric uncertainty to measure the dependencies of the features on each other and on the target label. It then selects the most important features using a backward sequential search. FCBF performs well on high-dimensional datasets. Table 12 lists the features selected by FCBF (n = 6). Each attribute is given a weight based on its importance. According to FCBF, the most important features are THA and CPT, as shown in Table 12 . The ranking that FCBF gives to all the features of the dataset is shown in Fig. 7.
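The symmetric uncertainty measure that FCBF relies on can be illustrated for discrete features as below. This sketch uses the standard definition SU(X, Y) = 2 · I(X; Y) / (H(X) + H(Y)). It covers only the dependency measure, not the full FCBF search, and the toy vectors are invented for illustration.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)); 0 means independent, 1 means fully dependent."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)

# Hypothetical toy data: one feature copies the target, the other is unrelated to it.
target = [0, 0, 1, 1, 0, 1, 0, 1]
copy_of_target = [0, 0, 1, 1, 0, 1, 0, 1]
unrelated = [0, 1, 0, 1, 0, 1, 1, 0]

su_copy = symmetric_uncertainty(copy_of_target, target)
su_unrelated = symmetric_uncertainty(unrelated, target)
```

A feature with high symmetric uncertainty to the target and low symmetric uncertainty to the already-kept features is the kind FCBF retains.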

Minimal redundancy maximal relevance (mRMR): mRMR uses a heuristic approach to select the features that have minimum redundancy with each other and maximum relevance to the target. Because the approach is heuristic, it considers one feature at a time and computes its pairwise redundancy with the other features. The mRMR feature selection algorithm is not suitable for high-dimensional feature problems 33 . The features selected by mRMR (n = 6) are listed in Table 13 . Among these attributes, PES and CPT have the highest scores. Figure 7 shows the ranking that mRMR gives to all attributes in the feature space.
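The greedy one-feature-at-a-time procedure can be sketched as follows for discrete features. This assumes the common "difference" form of the criterion (relevance minus mean pairwise redundancy, both measured by mutual information); the paper does not say which variant was used, and the feature names and values are made up.

```python
import math
from collections import Counter

def mutual_information(x, y):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * math.log2(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())

def mrmr(features, target, k):
    """Greedily pick k feature names, maximizing relevance minus mean redundancy."""
    relevance = {name: mutual_information(col, target) for name, col in features.items()}
    selected = []
    while len(selected) < min(k, len(features)):
        best_name, best_score = None, -math.inf
        for name, col in features.items():
            if name in selected:
                continue
            redundancy = 0.0
            if selected:
                redundancy = sum(mutual_information(col, features[s])
                                 for s in selected) / len(selected)
            score = relevance[name] - redundancy
            if score > best_score:
                best_name, best_score = name, score
        selected.append(best_name)
    return selected

# Hypothetical toy data: "a_dup" duplicates "a"; "b" is weaker but not redundant.
target = [0] * 6 + [1] * 6
features = {
    "a":     [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1],
    "a_dup": [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1],
    "b":     [1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0],
}
selected = mrmr(features, target, 2)
```

Ranking by relevance alone would pick `a` and its duplicate; the redundancy penalty makes the second pick `b` instead.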

figure 7

Features ranking by four feature selection algorithms (FCBF, LASSO, mRMR, Relief).

Least absolute shrinkage and selection operator (LASSO): LASSO selects features by shrinking the absolute values of the feature coefficients. Features whose coefficients shrink to zero are removed from the feature subset, while features with high coefficient values are retained. LASSO performs well when the feature coefficient values are low. However, some irrelevant features with high coefficient values may also be selected and included in the subset 30 . Table 14 lists the six attributes most strongly correlated with the target, together with their scores, as selected by LASSO. Figure 7 shows the important features and the scores given by the LASSO feature selection algorithm.
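A small coordinate-descent sketch shows how coefficients become exactly zero. It assumes the standard LASSO objective (1/(2n))·||y − Xw||² + α·||w||₁ with no intercept; the solver, the value of α, and the toy data are illustrative assumptions, not details taken from the paper.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; values inside the band become exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimize (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that ignores feature j itself.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * w[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            norm = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, alpha) / norm if norm > 0 else 0.0
    return w

# Hypothetical toy data: y depends strongly on feature 0 and barely on feature 1.
X = [[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]]
y = [2.0 * a + 0.05 * b for a, b in X]

w = lasso_coordinate_descent(X, y, alpha=0.1)
selected = [j for j, coef in enumerate(w) if coef != 0.0]
```

With these orthogonal toy columns, the weak feature's coefficient falls inside the threshold band and is set to zero, so only feature 0 is selected.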

Relief feature selection algorithm: Relief uses instance-based learning to assign a weight to each attribute based on its significance. The weight of an attribute reflects its ability to differentiate among class values. Attributes are ranked by weight, and those whose weight exceeds a user-specified cutoff are chosen as the final subset 34 . In this way, Relief selects the attributes that have the greatest effect on the target 35 . The algorithm operates by randomly selecting instances from the training samples. For each sampled instance, the nearest instance of the same class (nearest hit) and of the opposite class (nearest miss) are identified. The weight of an attribute is updated according to how well its values differentiate the sampled instance from its nearest hit and nearest miss. An attribute that discriminates between instances of different classes and has the same value for instances of the same class receives a high weight.

figure a

The weight update (line 6) rests on a simple idea. If the instance Ri and its nearest hit NH have dissimilar values (i.e. the diff value is large), the attribute separates two instances of the same class, which is undesirable, so the attribute's weight is reduced. Conversely, if Ri and its nearest miss NM have distinct values, the attribute separates two instances of different classes, which is desirable, so the attribute's weight is increased. The six most important features selected by Relief are listed in descending order in Table 15 . Based on the weight values, the most vital features are CPT and Age. Figure 7 shows the important features and the ranking given by the Relief feature selection algorithm.
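The weight update described above can be sketched for a two-class problem with numeric attributes. This is a minimal version of the original Relief idea, not a transcription of the paper's pseudocode: the Manhattan distance, the range-normalized diff, the function name, and the toy data are assumptions, and each class needs at least two instances.

```python
import random

def relief(X, y, n_samples, seed=0):
    """Return one weight per attribute; higher means more class-discriminative."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    # Attribute ranges, used to normalize differences into [0, 1].
    ranges = [(max(r[j] for r in X) - min(r[j] for r in X)) or 1.0 for j in range(p)]

    def diff(j, a, b):
        return abs(a[j] - b[j]) / ranges[j]

    def distance(a, b):
        return sum(diff(j, a, b) for j in range(p))

    weights = [0.0] * p
    for _ in range(n_samples):
        i = rng.randrange(n)  # randomly sampled instance Ri
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        nh = min(hits, key=lambda k: distance(X[i], X[k]))    # nearest hit
        nm = min(misses, key=lambda k: distance(X[i], X[k]))  # nearest miss
        for j in range(p):
            # Penalize differences to the nearest hit, reward differences to the nearest miss.
            weights[j] += (diff(j, X[i], X[nm]) - diff(j, X[i], X[nh])) / n_samples
    return weights

# Hypothetical toy data: attribute 0 separates the classes, attribute 1 is noise.
X = [[0.0, 0.2], [0.1, 0.9], [0.2, 0.5], [0.8, 0.3], [0.9, 0.8], [1.0, 0.4]]
y = [0, 0, 0, 1, 1, 1]
weights = relief(X, y, n_samples=20)
```

Attributes whose weight exceeds a chosen cutoff would then form the selected subset.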

Machine learning classification algorithms

Various machine learning classification algorithms are investigated in this study for the early detection of heart disease. Each algorithm has its own strengths, and its usefulness varies from application to application. In this paper, 10 classification algorithms of distinct nature, namely KNN, DT, ET, GB, RF, SVM, AB, NB, LR, and ANN, are applied in order to select the best and most generalizable prediction model.

Classifier validation method

Validating the prediction model is an essential step in machine learning. In this paper, K-fold cross-validation is used to validate the results of the above-mentioned classification models.

K-fold cross validation (CV)

In K-fold CV, the whole dataset is split into k equal parts. At each iteration, (k − 1) parts are used for training and the remaining part is used for testing. This process is repeated for k iterations, so that each part serves as the test set once. Researchers have used different values of k; here k = 10 is used because it produces good results. In 10-fold CV, 90% of the data is used to train the model and the remaining 10% to test it at each iteration. The final result is the mean of the results over all iterations.
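The splitting and averaging procedure can be sketched without any library. The helper names and the majority-class placeholder model are assumptions for illustration; in practice the data would normally be shuffled (and often stratified by class) before splitting, and a real classifier such as ET would replace the placeholder.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; fold sizes differ by at most one."""
    fold_sizes = [n_samples // k + (1 if f < n_samples % k else 0) for f in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

def cross_validate(X, y, k, fit_predict):
    """Return the mean test accuracy over the k folds."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k):
        predictions = fit_predict([X[i] for i in train_idx],
                                  [y[i] for i in train_idx],
                                  [X[i] for i in test_idx])
        correct = sum(p == y[i] for p, i in zip(predictions, test_idx))
        scores.append(correct / len(test_idx))
    return sum(scores) / k

def majority_class(train_X, train_y, test_X):
    """Placeholder model: always predicts the most common training label."""
    label = max(set(train_y), key=train_y.count)
    return [label] * len(test_X)

# Hypothetical toy data: 20 samples with alternating labels.
X = [[float(i)] for i in range(20)]
y = [i % 2 for i in range(20)]
folds = list(k_fold_indices(len(X), 10))
mean_accuracy = cross_validate(X, y, 10, majority_class)
```

With k = 10 and 20 samples, each fold trains on 18 samples (90%) and tests on 2 (10%), matching the proportions described above.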

Performance evaluation metrics

To measure the performance of the classification algorithms, several evaluation metrics are used, including accuracy, sensitivity (recall), specificity, precision, F1-score, Matthews correlation coefficient (MCC), AUC score, and ROC curve. All these measures are calculated from the confusion matrix described in Table 16 .

In the confusion matrix, True Negative (TN) means that the patient does not have heart disease and the model predicts the same, i.e. a healthy person is correctly classified by the model.

True Positive (TP) means that the patient has heart disease and the model predicts the same, i.e. a person with heart disease is correctly classified by the model.

False Positive (FP) means that the patient does not have heart disease but the model predicts that the patient does, i.e. a healthy person is incorrectly classified by the model. This is also called a type-1 error.

False Negative (FN) means that the patient has heart disease but the model predicts that the patient does not, i.e. a person with heart disease is incorrectly classified by the model. This is also called a type-2 error.

Accuracy: The accuracy of the classification model shows its overall performance and is calculated as follows:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

Specificity: Specificity is the ratio of correctly classified healthy people to the total number of healthy people, i.e. the prediction is negative and the person is healthy. It is calculated as follows:

$$\text{Specificity} = \frac{TN}{TN + FP}$$

Sensitivity: Sensitivity is the ratio of correctly classified heart patients to the total number of patients with heart disease, i.e. the prediction is positive and the person has heart disease. It is calculated as follows:

$$\text{Sensitivity} = \frac{TP}{TP + FN}$$

Precision: Precision is the ratio of true positives to all positives predicted by the classification model. It is calculated as follows:

$$\text{Precision} = \frac{TP}{TP + FP}$$

F1-score: F1 is the harmonic mean of precision and recall (sensitivity). Its value ranges between 0 and 1, where 1 indicates good performance of the classification algorithm and 0 indicates bad performance.

$$F1 = \frac{2 \times \text{Precision} \times \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}}$$

MCC: MCC is a correlation coefficient between the actual and predicted results. It takes values between −1 and +1, where −1 represents a completely wrong prediction, 0 means that the classifier generates random predictions, and +1 represents an ideal prediction. It is calculated as follows:

$$\text{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$
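As a worked example, the metrics above can be computed directly from the four confusion-matrix counts. The function name and the counts used here are hypothetical and are not results from the paper.

```python
import math

def classification_metrics(tp, tn, fp, fn):
    """Compute the standard binary-classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # also called recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "mcc": mcc,
    }

# Hypothetical counts for 100 patients: 50 with heart disease, 50 healthy.
metrics = classification_metrics(tp=45, tn=40, fp=10, fn=5)
```

Here accuracy is 0.85, sensitivity 0.90, and specificity 0.80; the MCC is about 0.70, reflecting both kinds of error in a single number.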

Finally, the predictive ability of the machine learning classification algorithms is examined with the receiver operating characteristic (ROC) curve, a graphical representation of classifier performance. The area under the curve (AUC) summarizes the ROC curve of a classifier: the larger the AUC, the better the performance of the classification algorithm.
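To make the link between the ROC curve and the AUC concrete, the sketch below builds the curve by sweeping a decision threshold over predicted scores and integrates it with the trapezoidal rule. The labels (1 = disease, 0 = healthy) and scores are invented for illustration, and the function names are assumptions.

```python
def roc_points(labels, scores):
    """Sweep thresholds from high to low; return the (FPR, TPR) points of the ROC curve."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every sample tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under a piecewise-linear curve, by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical predictions for six patients.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
area = auc(roc_points(labels, scores))
```

The resulting area equals the fraction of (diseased, healthy) pairs in which the diseased patient receives the higher score, here 8 of 9.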

In this study, 10 machine learning classification algorithms, namely LR, DT, NB, RF, ANN, KNN, GB, SVM, AB, and ET, are implemented in order to select the best model for early and accurate detection of heart disease. Four feature selection algorithms, FCBF, mRMR, LASSO, and Relief, are used to select the most vital and correlated features that accurately reflect the pattern of the target. The developed intelligent computational model has been trained and tested on two datasets: the Cleveland (S1) and Hungarian (S2) heart disease datasets. Python was used to implement the classification algorithms and simulate their results.

The performance of all classification models has been tested with several performance metrics on the full feature space as well as on the feature spaces selected by the feature selection algorithms. This study indicates which feature selection algorithm works best with which classification model for developing a high-level intelligent system for diagnosing heart disease. The simulation results show that ET is the best classifier and Relief is the optimal feature selection algorithm. In addition, the P-value and Chi-square are computed for the ET classifier with each feature selection algorithm. The proposed system is expected to help doctors and other caregivers diagnose heart disease accurately and effectively at an early stage.

Heart disease is one of the most devastating and fatal chronic diseases, and its prevalence is rapidly increasing in both economically developed and underdeveloped countries. This damage can be reduced considerably if patients are diagnosed in the early stages and given proper treatment. In this paper, we developed an intelligent predictive system based on contemporary machine learning algorithms for the prediction and diagnosis of heart disease. The system was evaluated on two datasets, the Cleveland (S1) and Hungarian (S2) heart disease datasets, and was trained and tested on both the full and the optimal feature sets. Ten classification algorithms (KNN, DT, RF, NB, SVM, AB, ET, GB, LR, and ANN) and four feature selection algorithms (FCBF, mRMR, LASSO, and Relief) were used. The feature selection algorithms select the most significant features from the feature space, which both reduces classification errors and shrinks the feature space. To assess the classification algorithms, several performance evaluation metrics were used: accuracy, sensitivity, specificity, AUC, F1-score, MCC, and ROC curve. On the full feature set, the top two classification algorithms, ET and GB, achieved accuracies of 92.09% and 91.34%, respectively. After feature selection, the accuracy of ET with the Relief feature selection algorithm increased from 92.09% to 94.41%, and the accuracy of GB with the FCBF feature selection algorithm increased from 91.34% to 93.36%. The ET classifier with the Relief feature selection algorithm therefore performs best. The P-value and Chi-square were also computed for the ET classifier with each feature selection technique.
Future work will explore further optimization techniques, feature selection algorithms, and classification algorithms to improve the performance of the predictive system for the diagnosis of heart disease.

Bui, A. L., Horwich, T. B. & Fonarow, G. C. Epidemiology and risk profile of heart failure. Nat. Rev. Cardiol. 8 , 30 (2011).

Polat, K. & Güneş, S. Artificial immune recognition system with fuzzy resource allocation mechanism classifier, principal component analysis, and FFT method based new hybrid automated identification system for classification of EEG signals. Expert Syst. Appl. 34 , 2039–2048 (2010).

Heidenreich, P. A. et al. Forecasting the future of cardiovascular disease in the United States: A policy statement from the American Heart Association. Circulation 123 , 933–944 (2011).

Durairaj, M. & Ramasamy, N. A comparison of the perceptive approaches for preprocessing the data set for predicting fertility success rate. Int. J. Control Theory Appl. 9 , 255–260 (2016).

Das, R., Turkoglu, I. & Sengur, A. Effective diagnosis of heart disease through neural networks ensembles. Expert Syst. Appl. 36 , 7675–7680 (2012).

Allen, L. A. et al. Decision making in advanced heart failure: A scientific statement from the American Heart Association. Circulation 125 , 1928–1952 (2014).

Yang, H. & Garibaldi, J. M. A hybrid model for automatic identification of risk factors for heart disease. J. Biomed. Inform. 58 , S171–S182 (2015).

Alizadehsani, R., Hosseini, M. J., Sani, Z. A., Ghandeharioun, A. & Boghrati, R. In 2012 IEEE 12th International Conference on Data Mining Workshops. 9–16 (IEEE, New York).

Arabasadi, Z., Alizadehsani, R., Roshanzamir, M., Moosaei, H. & Yarifard, A. A. Computer aided decision making for heart disease detection using hybrid neural network-Genetic algorithm. Comput. Methods Programs Biomed. 141 , 19–26 (2017).

Samuel, O. W., Asogbon, G. M., Sangaiah, A. K., Fang, P. & Li, G. An integrated decision support system based on ANN and Fuzzy_AHP for heart failure risk prediction. Expert Syst. Appl. 68 , 163–172 (2017).

Patil, S. B. & Kumaraswamy, Y. Intelligent and effective heart attack prediction system using data mining and artificial neural network. Eur. J. Sci. Res. 31 , 642–656 (2009).

Vanisree, K. & Singaraju, J. Decision support system for congenital heart disease diagnosis based on signs and symptoms using neural networks. Int. J. Comput. Appl. 19 , 6–12 (2015).

B. Edmonds. In Proceedings of AISB Symposium on Socially Inspired Computing 1–12 (Hatfield, 2005).

Methaila, A., Kansal, P., Arya, H. & Kumar, P. Early heart disease prediction using data mining techniques. Comput. Sci. Inf. Technol. J. https://doi.org/10.5121/csit.2014.4807 (2014).

Samuel, O. W., Asogbon, G. M., Sangaiah, A. K., Fang, P. & Li, G. An integrated decision support system based on ANN and Fuzzy_AHP for heart failure risk prediction. Expert Syst. Appl. 68 , 163–172 (2018).

Nazir, S., Shahzad, S., Mahfooz, S. & Nazir, M. Fuzzy logic based decision support system for component security evaluation. Int. Arab J. Inf. Technol. 15 , 224–231 (2018).

Detrano, R. et al. International application of a new probability algorithm for the diagnosis of coronary artery disease. Am. J. Cardiol. 64 , 304–310 (2009).

Gudadhe, M., Wankhade, K. & Dongre, S. In 2010 International Conference on Computer and Communication Technology (ICCCT) , 741–745 (IEEE, New York).

Kahramanli, H. & Allahverdi, N. Design of a hybrid system for the diabetes and heart diseases. Expert Syst. Appl. 35 , 82–89 (2013).

Palaniappan, S. & Awang, R. In 2012 IEEE/ACS International Conference on Computer Systems and Applications 108–115 (IEEE, New York).

Olaniyi, E. O., Oyedotun, O. K. & Adnan, K. Heart diseases diagnosis using neural networks arbitration. Int. J. Intel. Syst. Appl. 7 , 72 (2015).

Das, R., Turkoglu, I. & Sengur, A. Effective diagnosis of heart disease through neural networks ensembles. Expert Syst. Appl. 36 , 7675–7680 (2011).

Paul, A. K., Shill, P. C., Rabin, M. R. I. & Murase, K. Adaptive weighted fuzzy rule-based system for the risk level assessment of heart disease. Applied Intelligence 48 , 1739–1756 (2018).

Tomov, N.-S. & Tomov, S. On deep neural networks for detecting heart disease. arXiv:1808.07168 (2018).

Manogaran, G., Varatharajan, R. & Priyan, M. Hybrid recommendation system for heart disease diagnosis based on multiple kernel learning with adaptive neuro-fuzzy inference system. Multimedia Tools Appl. 77 , 4379–4399 (2018).

Alizadehsani, R. et al. Non-invasive detection of coronary artery disease in high-risk patients based on the stenosis prediction of separate coronary arteries. Comput. Methods Programs Biomed. 162 , 119–127 (2018).

Haq, A. U., Li, J. P., Memon, M. H., Nazir, S. & Sun, R. A hybrid intelligent system framework for the prediction of heart disease using machine learning algorithms. Mobile Inf. Syst. 2018 , 3860146. https://doi.org/10.1155/2018/3860146 (2018).

Mohan, S., Thirumalai, C. & Srivastava, G. Effective heart disease prediction using hybrid machine learning techniques. IEEE Access 7 , 81542–81554 (2019).

Ali, L. et al. An optimized stacked support vector machines based expert system for the effective prediction of heart failure. IEEE Access 7 , 54007–54014 (2019).

Peng, H., Long, F. & Ding, C. Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 27 (8), 1226–1238 (2005).

Palaniappan, S. & Awang, R. In 2008 IEEE/ACS International Conference on Computer Systems and Applications 108–115 (IEEE, New York).

Ali, L., Niamat, A., Golilarz, N. A., Ali, A. & Xingzhong, X. An expert system based on optimized stacked support vector machines for effective diagnosis of heart disease. IEEE Access (2019).

Pérez, N. P., López, M. A. G., Silva, A. & Ramos, I. Improving the Mann-Whitney statistical test for feature selection: An approach in breast cancer diagnosis on mammography. Artif. Intell. Med. 63 , 19–31 (2015).

Tibshirani, R. Regression shrinkage and selection via the lasso: A retrospective. J. R. Stat. Soc. Ser. B Stat. Methodol. 73 , 273–282 (2011).

Peng, H., Long, F. & Ding, C. Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 27 , 1226–1238 (2012).

de Silva, A. M. & Leong, P. H. Grammar-Based Feature Generation for Time-Series Prediction (Springer, Berlin, 2015).

Acknowledgements

This research was supported by the Brain Research Program of the National Research Foundation (NRF) funded by the Korean government (MSIT) (No. NRF-2017M3C7A1044815).

Author information

Authors and affiliations.

Department of Computer Science, Abdul Wali Khan University Mardan, Mardan, 23200, KP, Pakistan

Yar Muhammad, Muhammad Tahir & Maqsood Hayat

Department of Electronic and Information Engineering, Jeonbuk National University, Jeonju, 54896, South Korea

Kil To Chong

Contributions

All authors have equal contributions.

Corresponding authors

Correspondence to Maqsood Hayat or Kil To Chong .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article.

Muhammad, Y., Tahir, M., Hayat, M. et al. Early and accurate detection and diagnosis of heart disease using intelligent computational model. Sci Rep 10 , 19747 (2020). https://doi.org/10.1038/s41598-020-76635-9

Received : 03 April 2020

Accepted : 28 October 2020

Published : 12 November 2020

DOI : https://doi.org/10.1038/s41598-020-76635-9

literature review for heart disease prediction using machine learning

Machine learning algorithms for predicting coronary artery disease: efforts toward an open source solution

Aravind Akella

1 Qualicel Global Inc., Huntington Station, NY 11746, USA

Sudheer Akella

Associated data.

The development of coronary artery disease (CAD), a highly prevalent disease worldwide, is influenced by several modifiable risk factors. Predictive models built using machine learning (ML) algorithms may assist clinicians in timely detection of CAD and may improve outcomes.

Materials & methods:

In this study, we applied six different ML algorithms to predict the presence of CAD amongst patients listed in ‘the Cleveland dataset.’ The generated computer code is provided as a working open source solution with the ultimate goal to achieve a viable clinical tool for CAD detection.

All six ML algorithms achieved accuracies greater than 80%, with the ‘neural network’ algorithm achieving accuracy greater than 93%. The recall achieved with the ‘neural network’ model is also the highest of the six models (0.93), indicating that predictive ML models may provide diagnostic value in CAD.

Lay abstract

Coronary artery disease (CAD) is correlated with many preventable risk factors. Early diagnosis of CAD helps prevent the disease and its complications from worsening. This study uses machine learning (ML) algorithms to predict CAD in patients. Our results indicate that ML algorithms can accurately predict CAD. Furthermore, by making our code publicly available, we hope to improve the usefulness of ML algorithms as a diagnostic tool for CAD.

Coronary artery disease (CAD) is the most common type of heart disease, affecting millions worldwide. According to recent statistics from the American Heart Association, coronary heart disease accounted for 13% of deaths in the USA in 2018 [ 21 ]. Worldwide in 2015, CAD was found to be one of the most common causes of death, with 15.6% of all deaths resulting from the disease [ 1 ]. Because CAD is associated with several modifiable risk factors pertaining to lifestyle and intervention, timing of detection and diagnostic accuracy are especially relevant in the clinical management of patients with CAD.

Over the past several years, approaches that include machine learning (ML) have made a significant impact on the detection and diagnosis of diseases [ 2–7 ]. In general, the ML approach involves ‘training’ an algorithm with a control dataset for which the disease status (disease or no disease) is known, and then applying this trained algorithm to a new dataset in order to predict the disease status of patients for whom it is not yet determined. As larger cohorts of data are introduced, the ML algorithm becomes better trained as a predictor of disease status. More accurate disease prediction with ML would empower clinicians with improved detection, diagnosis, classification, risk stratification and, ultimately, management of patients, all while potentially minimizing required clinical intervention.

The application of ML concepts to CAD has been significantly hampered by the availability of appropriate clinical datasets. However, one of the components of the ‘UCI Heart Disease Dataset,’ dubbed as the ‘Cleveland Dataset,’ is publicly available on the UCI Machine Learning Repository (see [ 22 ]). Originally intended to be a teaching aid, the Cleveland dataset has been highly exploited for exploring ML concepts. Available since 1988, the Cleveland dataset has so far received more than one million downloads and is currently ranked as the fifth most popular dataset in the UCI repository. A GoogleScholar search with the terms ‘Cleveland dataset,’ ‘heart disease’ and ‘machine learning,’ returns little more than 300 records available since 2010, of which most studies analyzed one ML method at a time (for the latest survey review, see [ 8 ]). In addition, there is no way to verify claims in any of the publications for the accuracy of the algorithms, as the computer code has not been made publicly available [ 23 ]. A PubMed search using the same keywords identified ten original articles and one comprehensive review article [ 9 ]. Of the ten original articles, three used the Cleveland dataset, whereas the remaining seven used proprietary datasets. Of the ten, none of the studies have made the computer code publicly available. In our view, all recent studies pertaining to the application of the ML approach to CAD thus far appear exploratory rather than seeking to provide clinical assistance to healthcare practitioners in the treatment of CAD. (Also see ‘Related Works’ in the Supplementary materials files for a list of recently published articles that encompass ML algorithms with detection and diagnosis of disease.)

We have now undertaken a comparative analysis by applying six different ML algorithms (models) to the UCI Cleveland dataset to predict disease outcomes. In an effort to initiate an open source ML solution for detecting CAD, we have deposited our computer code on GitHub (see [ 25 ]), making it available for other researchers to test and improve our work. We also welcome the opportunity to gain access to larger datasets to further our efforts toward an open source solution. We found that all six algorithms (logistic regression, decision tree, random forest, support-vector machine, neural network and k-nearest neighbor) perform well, with accuracies of roughly 80% or greater, and that the neural network achieves an accuracy greater than 93%. We consider accuracy, recall, F1 score and the area under the receiver operating characteristic curve (AUC-ROC) as performance metrics for comparative analysis amongst the ML models.

Numerous risk factors contribute to the development of CAD, some of which can be controlled (modifiable) (e.g., see [ 10 ]). These include high blood pressure, high cholesterol, smoking, diabetes, obesity, lack of physical activity, unhealthy diet and stress. The risk factors that cannot be controlled (nonmodifiable) are age, sex (gender), family history and race or ethnicity. Traditional approaches assess these risk factors to predict future risk (prognosis) of cardiovascular disease [ 11 ]. However, a large number of individuals at risk of cardiovascular disease fail to be identified by these approaches, while some individuals not at risk are given preventative treatment unnecessarily (see [ 12 , 13 ]). Several ML algorithms can summarize the impact of individual variables on the response variable; the most influential variables are referred to as ‘variables of importance’ (from the model's perspective), and identifying them aids the building of accurate prognostic models [ 14 ]. In the present study, we extract the variables of importance in one model in an effort to demonstrate the feasibility of including a risk assessment component in the ML model. Furthermore, we have openly hosted the computer code generated in this study on GitHub, providing the basis of a workflow model that promotes public contribution toward improved risk detection of CAD using ML, as others may contribute by extending the code to larger cohort groups.

The remainder of the manuscript is organized as follows. The Dataset & preprocessing section provides information on the data source, describes the data variables, explains the data preprocessing steps and provides a high-level analysis of the data. The next section provides concise descriptions of the ML models considered in this study and the methodologies of how the model results are evaluated. The results of the modeling efforts are presented and discussed in the Results & discussion section. The concluding section outlines the limitations of the present modeling efforts and provides a workflow model framework for extending our study in future.

Dataset & preprocessing

The dataset used in this study was downloaded from the repository maintained by the UCI (University of California, Irvine, CA, USA) Center for Machine Learning and Intelligent Systems. The repository contains four datasets from four different hospitals. The Cleveland dataset contains fewer missing attributes than the other datasets and has more records. This dataset has fourteen variables on 303 patients. Table 1 lists all fourteen variables in the dataset, the associated datatype for each and a brief description.

Table 1.

| Variable | Type | Description |
|---|---|---|
| Age | Continuous | Patient age in years |
| Sex | Categorical | Patient gender (1 = male; 0 = female) |
| cp | Categorical | Chest pain (1 = typical angina; 2 = atypical angina; 3 = nonanginal pain; 4 = no pain) |
| trestbps | Continuous | Resting blood pressure (in mmHg) on admission to the hospital |
| chol | Continuous | Serum cholesterol in mg/dl |
| fbs | Categorical | Fasting blood sugar higher than 120 mg/dl (1 = true; 0 = false) |
| restecg | Categorical | Resting electrocardiogram (0 = normal; 1 = ST-T wave abnormality; 2 = probable/definite left ventricular hypertrophy) |
| thalach | Continuous | Maximum heart rate achieved (during thallium test) |
| exang | Categorical | Exercise induced angina (1 = yes; 0 = no) |
| oldpeak | Continuous | ST depression induced by exercise relative to rest |
| slope | Categorical | Slope of the peak exercise ST segment (1 = up-sloping; 2 = flat; 3 = down-sloping) |
| ca | Categorical | Number of major vessels (0 to 3) colored by fluoroscopy |
| thal | Categorical | Thallium heart scan (3 = normal; 6 = fixed defect; 7 = reversible defect) |
| num | Categorical | Diagnosis of heart disease (angiographic disease status) (0 = absent; 1 to 4 = present) |

During the data preprocessing steps, six rows of data with unknown values were dropped, resulting in a dataset of 297 observations; two dataset columns of factor datatype were converted to numeric datatype; the continuous variables (four, excluding age) were normalized; the column ‘num’ was renamed to ‘hd’ for clarity; and the values 1 to 4 in the heart disease (hd) column were replaced with the value 1 to form a binary classification: disease (value of 1) or no disease (value of 0).
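The preprocessing steps described above can be sketched in a few lines. The authors' published code is written in R; the following is only an illustrative Python sketch on hypothetical toy rows. The ‘?’ marker for unknown values, the z-score form of normalization and the helper name `preprocess` are assumptions made for this example and are not taken from the study.

```python
from statistics import mean, pstdev

# Hypothetical toy rows (not the Cleveland data); "?" marks an unknown value by assumption.
rows = [
    {"age": 63, "chol": 233.0, "ca": "0", "num": 0},
    {"age": 67, "chol": 286.0, "ca": "3", "num": 2},
    {"age": 41, "chol": 204.0, "ca": "?", "num": 0},
    {"age": 56, "chol": 236.0, "ca": "0", "num": 1},
]

def preprocess(rows, continuous=("chol",), factor_cols=("ca",)):
    # 1) Drop rows that contain an unknown value.
    clean = [dict(r) for r in rows if "?" not in r.values()]
    # 2) Convert factor-like (string) columns to numeric.
    for r in clean:
        for col in factor_cols:
            r[col] = float(r[col])
    # 3) Normalize continuous columns (z-score is an assumption here).
    for col in continuous:
        values = [r[col] for r in clean]
        mu, sd = mean(values), pstdev(values)
        for r in clean:
            r[col] = (r[col] - mu) / sd
    # 4) Rename 'num' to 'hd' and collapse values 1 to 4 into 1.
    for r in clean:
        r["hd"] = 1 if r.pop("num") > 0 else 0
    return clean

processed = preprocess(rows)
print([r["hd"] for r in processed])
```

The input rows are copied before modification, so the raw records stay available for inspection.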

The correlation matrix for the 14 variables in the dataset is shown in Figure 1 , depicting the Pearson coefficient, r, for the association between each pair of variables, which can range from -1 to +1. A higher absolute value of r indicates a stronger correlation between two variables. A positive r indicates a direct relationship between two variables, whereas a negative r indicates an inverse relationship. In this analysis, we used an absolute value of 0.5 as the threshold; that is, if r is greater than 0.5 or less than -0.5, we assume the two variables are correlated. The correlation matrix shows that only a few pairs of variables have coefficients beyond this threshold, demonstrating weak correlation between the variables. Such low values of r indicate that the variables are largely independent of one another and can all be included in the machine learning model building.

Figure 1. Correlation matrix for the 14 variables in the dataset. The color coding scale denotes the degree of Pearson correlation between variables, with red being negatively correlated and blue positively correlated.
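As a minimal illustration of the correlation screening described above, the sketch below computes Pearson's r for every pair of columns and reports the pairs whose absolute r exceeds the 0.5 threshold. The column names and values are hypothetical toy data, and the helper names `pearson_r` and `correlated_pairs` are introduced only for this example.

```python
from math import sqrt

def pearson_r(x, y):
    # Pearson coefficient: covariance divided by the product of the spreads.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlated_pairs(columns, threshold=0.5):
    # Return (name_a, name_b, r) for every pair with |r| above the threshold.
    names = sorted(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson_r(columns[a], columns[b])
            if abs(r) > threshold:
                pairs.append((a, b, round(r, 3)))
    return pairs

# Hypothetical toy columns, not the Cleveland variables.
columns = {
    "x": [1, 2, 3, 4, 5],
    "y": [2, 4, 6, 8, 10],
    "z": [5, 3, 4, 1, 2],
    "w": [1, -1, 0, -1, 1],
}
pairs = correlated_pairs(columns)
print(pairs)
```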

Additional analysis of the dataset is presented in the Supplementary material.

Machine learning models, model building & model evaluation

In applying machine learning models, it is generally understood that no single algorithm is superior to the others [ 15 ]. In machine learning, if every instance in the dataset is given to the model with a known label (the corresponding correct output), as in the Cleveland dataset, then the learning is called ‘supervised’, in contrast to ‘unsupervised’ learning, in which instances are unlabeled. Below, we present the general idea of how each of the six supervised machine learning algorithms works on the dataset and any assumptions we make in each case. Each algorithm is first trained (or fitted) with a fraction of the dataset, usually known as the ‘training set’, and then tested on the ‘test set’ that is put aside as ‘unseen data’ for evaluating the algorithm. For a detailed description of the models we refer the reader to the excellent treatise by James et al. [ 16 ].

Logistic regression

In Logistic regression, each independent variable in every instance of the dataset is multiplied by a weight and the cumulative result is passed to a sigmoid function. The sigmoid function maps real values into probability values between 0 and 1. In our present modeling, we left the threshold to the default value, which is 0.5, such that for probability values greater than 0.5, the model predicts the dependent variable to be 1 (the patient has CAD) and for values less than or equal to 0.5, the model predicts the dependent variable to be 0 (the patient does not have CAD).
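The prediction rule described above can be written out as a short sketch: a weighted sum of the inputs is passed through the sigmoid, and the resulting probability is compared with the 0.5 threshold. The weights, bias and feature values below are hypothetical and are not fitted to any data; `predict_cad` is a name chosen for this illustration.

```python
from math import exp

def sigmoid(z):
    # Maps any real value to a probability between 0 and 1.
    return 1.0 / (1.0 + exp(-z))

def predict_cad(features, weights, bias, threshold=0.5):
    # Weighted sum of the independent variables, then the sigmoid.
    z = bias + sum(w * x for w, x in zip(weights, features))
    p = sigmoid(z)
    # Predict 1 only when the probability is strictly greater than the threshold.
    return p, int(p > threshold)

# Hypothetical weights and features for illustration only.
probability, label = predict_cad([1.0, 2.0], weights=[0.8, -0.1], bias=0.0)
print(probability, label)
```

A probability of exactly 0.5 maps to class 0, matching the ‘less than or equal to 0.5’ rule in the text.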

Decision tree

A decision tree is a tree-like structure that classifies instances by sorting them based on the values of the variables. Each node in a decision tree represents a variable, and each branch represents a value that the node can assume. Instances are classified starting at the root node and sorted based on the values of the variables. The variable that best divides the dataset becomes the root node of the tree. Internal nodes (or split nodes) are the decision-making part of the tree: each one applies a splitting rule that determines which subsequent node to visit. The split process is terminated when a user-defined criterion is reached at the leaf (for the present modeling, we left it at the default value, which is 20). The paths from the root node to the leaf nodes represent classification rules.
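To make the idea of ‘the variable that best divides the dataset’ concrete, the sketch below searches for the threshold on a single variable that gives the purest two groups. Gini impurity is used as the split criterion here purely as an assumption for illustration; the text does not state which criterion the study's implementation uses. The data are hypothetical.

```python
def gini(labels):
    # Gini impurity for binary labels: 0.0 means a perfectly pure group.
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)

def best_split(values, labels):
    # Try each candidate threshold and keep the one with the lowest
    # size-weighted impurity of the left (<= t) and right (> t) groups.
    best_threshold, best_score = None, float("inf")
    n = len(labels)
    for t in sorted(set(values))[:-1]:
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_threshold, best_score = t, score
    return best_threshold, best_score

# Hypothetical values of one variable and the matching class labels.
threshold, impurity = best_split([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1])
print(threshold, impurity)
```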

Random forest

Random forest is an ensemble model consisting of multiple decision trees, like trees in a forest. It combines several classification trees, trains each one on a slightly different subset of the dataset instances, and splits the nodes in each tree considering only a limited number of the variables. The final predictions of the random forest are made by aggregating the predictions of the individual trees, which enhances the prediction accuracy for unseen data. The number of trees chosen for the present modeling is 500 (ntree = 500).

Support vector machine

In a support-vector machine, each data point is plotted in an n-dimensional space, with the value of each variable being the value of a particular coordinate, and classification is performed by finding the hyperplane that separates the two data classes. Following this, the characteristics of a new instance can be used to predict the class to which it should belong.

k-Nearest neighbor

k-Nearest neighbor (kNN) is one of the most basic nonparametric algorithms; it does not make any assumptions about the distribution of the underlying data. The algorithm is based on the principle of Euclidean distance, that is, instances within a dataset generally exist in close proximity to other instances that have similar properties. If the instances are tagged with a classification label, then the label of an unclassified instance can be determined by observing the class of its nearest neighbors. For the present modeling, the whole process is repeated three times (repeats = 3), each with a k value of 10 (number = 10), and the three iterations are averaged.
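A minimal sketch of the nearest-neighbor principle follows: the query point takes the majority label of its k closest training points under Euclidean distance. The toy points and the value k = 3 are hypothetical choices for illustration and do not reflect the settings used in the study.

```python
from collections import Counter
from math import dist

def knn_predict(train_X, train_y, query, k=3):
    # Sort the training points by Euclidean distance to the query and keep k.
    neighbors = sorted(
        zip(train_X, train_y), key=lambda pair: dist(pair[0], query)
    )[:k]
    # Majority vote over the labels of the k nearest neighbors.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical two-dimensional toy data with two well-separated classes.
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = [0, 0, 0, 1, 1, 1]
print(knn_predict(train_X, train_y, (0.5, 0.5)))
print(knn_predict(train_X, train_y, (5.5, 5.5)))
```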

Artificial neural network

An artificial neural network (ANN) is based on the structure and functions of biological neural networks: the network learns (or changes) based on the input and output. The units in an ANN are segregated into three classes: input units, which receive information to be processed; output units, where the results of the processing are found; and units in between, known as hidden units. The network is first trained on a dataset of paired data to determine the input-output mapping. After training, the weights of the connections between neurons are fixed and the network is used to determine the classifications of a new set of data. For the present modeling, we set the number of hidden units to three (hidden = 3) and the threshold to 0.05 (threshold = 0.05).
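The flow of information from input units through three hidden units to a single output unit can be sketched as a forward pass. The sigmoid activation, the weights and the input values below are assumptions for illustration only; they are not the trained network from the study, and the training step itself is not shown.

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def forward(x, hidden_weights, hidden_bias, out_weights, out_bias):
    # Each hidden unit takes a weighted sum of the inputs plus a bias.
    hidden = [
        sigmoid(b + sum(w * xi for w, xi in zip(ws, x)))
        for ws, b in zip(hidden_weights, hidden_bias)
    ]
    # The output unit combines the hidden activations into one score in (0, 1).
    return sigmoid(out_bias + sum(w * h for w, h in zip(out_weights, hidden)))

# Hypothetical fixed weights for two inputs and three hidden units.
hidden_weights = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
hidden_bias = [0.0, 0.1, -0.1]
out_weights = [1.0, -1.0, 0.5]
score = forward([1.0, 2.0], hidden_weights, hidden_bias, out_weights, 0.0)
print(score)
```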

Because we were unable to use real-world data, we split the dataset into a ‘training set’ (70%, i.e., 208 observations) and a ‘test set’ (30%, i.e., 89 observations), making sure to balance the class distributions within the split (see the associated computer code on GitHub, https://github.com/aa54/CAD_1 ). The ‘training’ dataset is used to train the model; the model sees and learns from this training data. The ‘test’ dataset is then used to provide an unbiased evaluation of a final model fit to the training dataset. In some cases, we ran multiple experiments to validate model results on different splitting ratios.
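One way to keep the class distribution balanced across the two sets is a stratified split, in which each class is shuffled and divided separately. The sketch below is an assumption about how such a split can be done and is not the authors' R code; the label counts, the seed and the helper name `stratified_split` are hypothetical.

```python
import random

def stratified_split(labels, train_fraction=0.7, seed=0):
    # Split the indices of each class separately so both sets keep
    # approximately the same class proportions.
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(train_fraction * len(idx))
        train_idx += idx[:cut]
        test_idx += idx[cut:]
    return sorted(train_idx), sorted(test_idx)

# Hypothetical labels: 60 instances without disease, 40 with disease.
labels = [0] * 60 + [1] * 40
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```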

The following four metrics [ 17 ] were used to evaluate the performance of the predicted models and compare them with one another.

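1) Accuracy: the proportion of all dataset instances that were correctly predicted

2) Recall (sensitivity): the proportion of actual positive instances that were correctly predicted as positive

3) F1 score: the harmonic mean (the reciprocal of the average of the reciprocals) of precision and recall. For this, we first measure the precision, the ability of the model to identify only the relevant dataset instances. With TP, FP and FN denoting the numbers of true positives, false positives and false negatives, precision and recall are

$$\text{precision} = \frac{TP}{TP + FP}, \qquad \text{recall} = \frac{TP}{TP + FN}$$

The F1 score is estimated as

$$F_1 = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}}$$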

4) Area under the curve (AUC): an estimate of the probability that a model ranks a randomly chosen positive instance higher than a randomly chosen negative instance. Additionally, the full area under the curve–receiver operating characteristic (AUC-ROC) curves are plotted to visually compare the models' performance.
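The four metrics listed above can be computed directly from true labels, predicted labels and predicted scores, as in the following sketch. The AUC here is computed from its ranking definition (the fraction of positive-negative pairs in which the positive instance receives the higher score, with ties counted as half). The labels and scores are hypothetical toy values, not results from the study.

```python
def classification_metrics(y_true, y_pred):
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "recall": recall,
        "precision": precision,
        "f1": 2 * precision * recall / (precision + recall),
    }

def auc(y_true, scores):
    # Probability that a random positive is scored above a random negative.
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy labels, predictions and scores.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]
metrics = classification_metrics(y_true, y_pred)
print(metrics, auc(y_true, scores))
```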

Results & discussion

For our machine learning model building, we started with two basic models, logistic regression and decision tree, to predict the presence of CAD. Our intuition was that it would be easy to interpret the results of the basic models and to explain each model to a non-machine-learning audience. After analyzing the results, however, we realized that the decision tree model is prone to over-fitting, and logistic regression did not perform particularly well with this simple dataset. We then considered applying the standard SMOTE methodology [ 18 ] to create additional synthetic data. Although we achieved more accurate results with SMOTE (5–6% higher accuracy), we felt that SMOTE is not necessarily a good approach, as it creates data points based on a distance algorithm, and these synthetic data points may not be ‘true’ representations of patients. Therefore, we decided not to perform SMOTE on the dataset and proceeded to apply the remaining, more complex ML algorithms.

As shown in Table 2 , an accuracy higher than 0.84 is achieved with all but the decision tree model (just below 0.80). The other performance metrics, recall, F1 score and AUC, are also high. A mean of the latter three metrics is calculated for each model (shown in the last column of Table 2 ) to judge which model performs best as a whole. We excluded accuracy in calculating the mean, as accuracy is often considered to be a misleading indicator in measuring the performance of models with biomedical datasets (e.g., see [ 3 ]).

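Table 2.

| Model | Accuracy | Sensitivity | F1 score | AUC | Mean |
|---|---|---|---|---|---|
| Generalized linear model | 0.8764 | 0.8000 | 0.8786 | 0.883 | 0.85 |
| Decision tree | 0.7978 | 0.7447 | 0.7970 | 0.801 | 0.78 |
| Random forest | 0.8764 | 0.8261 | 0.8751 | 0.880 | 0.86 |
| Support-vector machine | 0.8652 | 0.7959 | 0.8662 | 0.871 | 0.84 |
| Neural network | | | | | |
| k-Nearest neighbor | 0.8427 | 0.7872 | 0.8419 | 0.847 | 0.83 |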

Boldface values indicate highest performance group.
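The ‘Mean’ column of Table 2 can be checked with a few lines of arithmetic: for each model it is the average of sensitivity, F1 score and AUC, rounded to two decimals. The sketch below uses only the values that appear in the table rows above; the neural network is omitted because its row is not populated in the table as extracted.

```python
# (sensitivity, F1 score, AUC) as listed in Table 2.
table2 = {
    "Generalized linear model": (0.8000, 0.8786, 0.883),
    "Decision tree": (0.7447, 0.7970, 0.801),
    "Random forest": (0.8261, 0.8751, 0.880),
    "Support-vector machine": (0.7959, 0.8662, 0.871),
    "k-Nearest neighbor": (0.7872, 0.8419, 0.847),
}

# Average the three metrics per model and round to two decimals.
means = {model: round(sum(vals) / len(vals), 2) for model, vals in table2.items()}
for model, value in means.items():
    print(f"{model}: {value:.2f}")
```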

As seen in Table 2 , the performance of the neural network model is outstanding, with an accuracy of 0.9303 and a recall of 0.9380. In addition, when we ran multiple experiments with differently proportioned training and test sets (changing from 70:30 to either 60:40 or 80:20) using the neural network, the performance metrics were unchanged.

A high recall value indicates a lower propensity for false negatives. In disease prediction, a low recall (high frequency of false negatives), would misdiagnose patients with CAD as healthy, which may have devastating consequences. For example, a recall rate of 0.9380 (achieved with neural network) implies that this algorithm will correctly identify the presence of CAD in approximately 94 patients out of 100 patients with CAD.

Figure 2 compares the performance of the six ML algorithms using AUC-ROC. AUC-ROC curves allow visualization of the tradeoff between the true positive rate and the false positive rate, whereas AUC (see above) is useful for comparing multiple algorithms or hyperparameter combinations (these two are obtained by different methods). As seen in the figure, all models except the decision tree perform well, and there is very little difference among the AUC-ROC curves. Nevertheless, the ROC curve corresponding to the neural network model has a relatively higher positive slope, indicating that this model gives a higher ‘true positive percentage.’ Combined with the results obtained for the performance metrics, it may be concluded that the neural network model is able to predict CAD more effectively than the other models.

Figure 2. (A) Area under the ROC curve (AUC) for all six models with estimated 95% confidence intervals. (B) Full ROC curves for the six models. Note the matching color code.

The computer code generated in this study, which was developed in R, has been made available on the public repository GitHub and as a supplement to this manuscript. As R is a free software tool for statistical modeling with an extensive set of code libraries, collaboration on and improvement of the code generated in this study are welcomed and encouraged. The code libraries in R include advanced ML methodologies (voting, bagging, optimization, etc.), with more to be added as R evolves further as an open source tool [ 19 ].

We next extracted the ‘variables of importance’ in the neural network model. Understanding the relative importance of variables may dictate which variables need to be included in the risk prediction of CAD. As shown in Figure 3 , ‘restecg’ (resting electrocardiogram [EKG]) and sex (gender) have a relatively high importance, while the three attributes ‘age’, ‘thalach’ (maximum heart rate achieved during the thallium test) and ‘exang’ (exercise induced angina) have the least importance. The top ten variables of importance as assigned by the neural network model, in descending order, are: resting EKG, patient gender, chest pain, slope of the peak exercise ST segment, ST depression induced by exercise relative to rest, fasting blood sugar (higher or lower than 120 mg/dl), thallium heart scan, number of major vessels, resting blood pressure and serum cholesterol.

Figure 3. Variables of importance in the neural network model. (A) shows the importance as obtained with a built-in code library function and (B) is the normalized graph.

restecg: Resting EKG; cp: Chest pain; slope: Slope of the peak exercise ST segment; oldpeak: ST depression induced by exercise relative to rest; fbs: Fasting blood sugar; thal: Thallium heart scan; ca: Number of major vessels; trestbps: Resting blood pressure; chol: Serum cholesterol; thalach: Maximum heart rate achieved; exang: Exercise induced angina.

Heinze et al. [ 20 ] proposed recommendations on the application of variable selection methods to help modeling in the life sciences, and it is worth following these recommendations in future efforts to identify which of the variables of importance are worth studying in risk modeling.

Conclusion & future perspective

In this study, we demonstrated that ML algorithms can be applied with high accuracy and recall to detect the presence of CAD using a publicly available dataset. We also demonstrated that the neural network model outperforms other ML models to detect CAD. We deposited the associated computer code in the public domain (see [ 23 ]) in the hope that we are contributing to an open source community.

Although CAD is widely prevalent and may lead to fatal consequences, timely detection of CAD would empower clinicians to treat modifiable risk factors associated with its progression. Using an ML approach provides the ability to predict the presence of CAD with high accuracy and recall, and thus allows practitioners to practice preventative medicine in patients with CAD in a timelier manner. However, at such initial stages, it should be noted that ML serves solely as a predictor of CAD rather than a diagnostic tool. We hope that as more datasets become available for training the algorithm, we may be able to label ML algorithms as diagnostic steps in CAD management. Because machine learning utilizes datasets of patients who have already been diagnosed, the predictive ability of an ML algorithm for CAD would improve as more data are supplied to the algorithm. We envision that a viable ML solution for predicting CAD evolves in three steps: (1) model exploration; (2) model refinement; and (3) monitoring and maintenance. A proposed framework for arriving at a practical ML solution for the detection of CAD is depicted in Figure 4 . We believe that with our current effort, we have achieved the first step (marked 1 in Figure 4 ). We are hopeful that this study can help form the basis for further testing and validation of our algorithms with multiple and larger datasets. We look forward to gaining access to larger datasets for further validation and refinement (depicted as step 2 in Figure 4 ), with the eventual goal of providing an open source solution (step 3 in Figure 4 ) to aid healthcare practitioners in the detection and treatment of CAD.

Figure 4. Proposed framework for arriving at a practical ML solution for the detection of CAD. The dashed arrow indicates the iteration involving the machine learning algorithms presented here (also marked 1). Steps 2 and 3 (solid arrows) indicate future explorations. Modified with permission from [ 26 ].

Summary points

  • Coronary artery disease (CAD) is the most common type of heart disease, with many modifiable risk factors.
  • Early detection of CAD provides for management that would help prevent undesirable clinical outcomes. A comparative analysis of six machine learning algorithms was conducted, utilizing the UC Irvine Cleveland dataset to predict disease outcomes.
  • The dataset was preprocessed, and six ML algorithms were trained and applied to the data.
  • ML algorithms were evaluated based on accuracy, recall, F1 score and area under the curve. The six ML models performed well, with accuracies found to be greater than 0.79.
  • The computer code generated in this study, which was developed in R, has been made available on the public repository GitHub and supplementally with this manuscript. ML algorithms can be applied with high accuracy and recall to detect the presence of CAD using a publicly available dataset.
  • We envision that moving forward, a viable ML solution for predicting CAD evolves in three steps: (1) model exploration; (2) model refinement; and (3) monitoring and maintenance.

Supplementary Material

Acknowledgments

The authors thank V Kaushik for help with the R programming and M Cassar for comments on the manuscript.

Supplementary data

To view the supplementary data that accompany this paper please visit the journal website at: www.future-science.com/doi/suppl/10.2144/fsoa-2020-0206

Author contributions

A Akella conceived the study; A Akella and S Akella planned the coding and analysis; A Akella wrote the script; A Akella and S Akella analyzed the results and prepared the manuscript.

Financial & competing interests disclosure

The authors have no relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript. This includes employment, consultancies, honoraria, stock ownership or options, expert testimony, grants or patents received or pending, or royalties.

No writing assistance was utilized in the production of this manuscript.

Ethical conduct of research

The authors state that they have obtained appropriate institutional review board approval or have followed the principles outlined in the Declaration of Helsinki for all human or animal experimental investigations.

Open access

This work is licensed under the Creative Commons Attribution 4.0 License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

  • DOI: 10.1109/ICAC3N60023.2023.10541803
  • Corpus ID: 270281331

Heart Disease Prediction using Machine Learning

  • Gambhir Singh , Naveen Kumar , +1 author Prerna Kumari
  • Published in 5th International Conference… 15 December 2023
  • Medicine, Computer Science
  • 2023 5th International Conference on Advances in Computing, Communication Control and Networking (ICAC3N)



A systematic review of clinical health conditions predicted by machine learning diagnostic and prognostic models trained or validated using real-world primary health care data

Affiliations

  • 1 Chair of Epidemiology, Department of Sport and Health Sciences, Technical University of Munich (TUM), Munich, Germany.
  • 2 Faculty of Medicine and Health Sciences, McGill University, Montreal, Quebec, Canada.
  • PMID: 37682909
  • PMCID: PMC10491005
  • DOI: 10.1371/journal.pone.0274276

With the advances in technology and data science, machine learning (ML) is being rapidly adopted by the health care sector. However, there is a lack of literature addressing the health conditions targeted by ML prediction models within primary health care (PHC) to date. To fill this gap in knowledge, we conducted a systematic review following the PRISMA guidelines to identify health conditions targeted by ML in PHC. We searched the Cochrane Library, Web of Science, PubMed, Elsevier, BioRxiv, Association of Computing Machinery (ACM), and IEEE Xplore databases for studies published from January 1990 to January 2022. We included primary studies addressing ML diagnostic or prognostic predictive models that were supplied completely or partially by real-world PHC data. Study selection, data extraction, and risk of bias assessment using the Prediction model Risk Of Bias ASsessment Tool (PROBAST) were performed by two investigators. Health conditions were categorized according to the International Classification of Diseases (ICD-10). Extracted data were analyzed quantitatively. We identified 106 studies investigating 42 health conditions. These studies included 207 ML prediction models supplied by the PHC data of 24.2 million participants from 19 countries. We found that 92.4% of the studies were retrospective and 77.3% of the studies reported diagnostic predictive ML models. A majority (76.4%) of all the studies addressed model development without conducting external validation. Risk of bias assessment revealed that 90.8% of the studies were of high or unclear risk of bias. The most frequently reported health conditions were diabetes mellitus (19.8%) and Alzheimer's disease (11.3%). Our study provides a summary of the presently available ML prediction models within PHC. We draw the attention of digital health policy makers, ML model developers, and health care professionals to the need for more interdisciplinary research collaboration in this regard.

Copyright: © 2023 Abdulazeem et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.


Conflict of interest statement

The authors have declared that no competing interests exist.

Fig 1. PRISMA flow diagram.

Fig 2. Number of studies for publication years.

Fig 3. Percentage presentation of the results of (PROBAST) tool.

The tool has two components.…

Similar articles

  • Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas. Crider K, Williams J, Qi YP, Gutman J, Yeung L, Mai C, Finkelstain J, Mehta S, Pons-Duran C, Menéndez C, Moraleda C, Rogers L, Daniels K, Green P. Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217. PMID: 36321557 Free PMC article.
  • Application of Artificial Intelligence in Community-Based Primary Health Care: Systematic Scoping Review and Critical Appraisal. Abbasgholizadeh Rahimi S, Légaré F, Sharma G, Archambault P, Zomahoun HTV, Chandavong S, Rheault N, T Wong S, Langlois L, Couturier Y, Salmeron JL, Gagnon MP, Légaré J. J Med Internet Res. 2021 Sep 3;23(9):e29839. doi: 10.2196/29839. PMID: 34477556 Free PMC article. Review.
  • Machine learning in oral squamous cell carcinoma: Current status, clinical concerns and prospects for future-A systematic review. Alabi RO, Youssef O, Pirinen M, Elmusrati M, Mäkitie AA, Leivo I, Almangush A. Artif Intell Med. 2021 May;115:102060. doi: 10.1016/j.artmed.2021.102060. Epub 2021 Mar 26. PMID: 34001326 Review.
  • The future of Cochrane Neonatal. Soll RF, Ovelman C, McGuire W. Early Hum Dev. 2020 Nov;150:105191. doi: 10.1016/j.earlhumdev.2020.105191. Epub 2020 Sep 12. PMID: 33036834
  • Prognostic models for newly-diagnosed chronic lymphocytic leukaemia in adults: a systematic review and meta-analysis. Kreuzberger N, Damen JA, Trivella M, Estcourt LJ, Aldin A, Umlauff L, Vazquez-Montes MD, Wolff R, Moons KG, Monsef I, Foroutan F, Kreuzer KA, Skoetz N. Cochrane Database Syst Rev. 2020 Jul 31;7(7):CD012022. doi: 10.1002/14651858.CD012022.pub2. PMID: 32735048 Free PMC article.
  • Integrated Machine Learning Approach for the Early Prediction of Pressure Ulcers in Spinal Cord Injury Patients. Kim Y, Lim M, Kim SY, Kim TU, Lee SJ, Bok SK, Park S, Han Y, Jung HY, Hyun JK. J Clin Med. 2024 Feb 8;13(4):990. doi: 10.3390/jcm13040990. PMID: 38398304 Free PMC article.
  • Can adverse childhood experiences predict chronic health conditions? Development of trauma-informed, explainable machine learning models. Afzal HB, Jahangir T, Mei Y, Madden A, Sarker A, Kim S. Front Public Health. 2024 Jan 15;11:1309490. doi: 10.3389/fpubh.2023.1309490. eCollection 2023. PMID: 38332940 Free PMC article.

Heart Problem Prediction Using Machine Learning Technique

  • Conference paper
  • First Online: 15 June 2024

  • Rishu Jeet,
  • Shashank Kumar Dubey &
  • Aminul Islam

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 1148)

Included in the following conference series:

  • International Conference on Energy Systems, Drives and Automations

Cases of heart problems are increasing day by day. It is very important to predict serious diseases as early as possible so that the lives of the people affected can be saved. Detecting or curing heart disease at an early stage is difficult, however, because identifying the exact problem requires high accuracy and precision. Once, and sometimes two or three times, in a month we hear that someone has died of a heart attack. According to the WHO, about 18 million people die every year because of heart disease, which is a very large number and a serious problem. The population is also growing day by day, which makes it harder to check every person and detect the disease easily. We therefore have to improve the technology that diagnoses the disease at its initial stage. These days, many people consume fast food and street food. Such habits are among the reasons for the rise in disease, including heart disease, because many factors are responsible for heart disease, such as blood pressure problems, diabetes, and cholesterol. If these conditions are controlled, heart disease can also be controlled. So, we should focus on each factor, from medical treatment to eating hygienic food. This chapter addresses heart disease prediction, so that the disease can be detected at an early stage using machine learning algorithms. We use an online dataset with 14 features and 304 instances. We first analyze the data and then try to predict heart disease, using four algorithms.
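
As a rough illustration of the kind of workflow the abstract describes (take a tabular dataset, then fit a classifier and check its predictions), the sketch below implements k-nearest neighbours in plain Python. The abstract does not name its four algorithms or its dataset, so the choice of k-NN, the two toy features, the function names, and every value here are assumptions made purely for illustration; this is not the authors' method or data.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training rows."""
    # Euclidean distance from the query to every training row, paired with that row's label
    distances = [(math.dist(row, query), label) for row, label in zip(train_X, train_y)]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

def accuracy(train_X, train_y, test_X, test_y, k=3):
    """Fraction of test rows whose predicted label matches the true label."""
    correct = sum(knn_predict(train_X, train_y, x, k) == y for x, y in zip(test_X, test_y))
    return correct / len(test_y)

# Hypothetical toy rows: (scaled age, scaled cholesterol); label 1 = disease, 0 = no disease
train_X = [(0.2, 0.1), (0.3, 0.2), (0.25, 0.3), (0.8, 0.9), (0.7, 0.8), (0.9, 0.7)]
train_y = [0, 0, 0, 1, 1, 1]
test_X = [(0.1, 0.2), (0.85, 0.85)]
test_y = [0, 1]

print(accuracy(train_X, train_y, test_X, test_y, k=3))
```

On a real dataset the features would need consistent scaling before distances are computed, and accuracy would normally be estimated on a held-out split or by cross-validation rather than on two hand-picked rows.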

Author information

Authors and affiliations.

Department of ECE, BIT Mesra, Ranchi, 835215, India

Rishu Jeet & Aminul Islam

Department of ECE, G. Pullaiah College of Engineering and Technology, Kurnool, 518002, Andhra Pradesh, India

Shashank Kumar Dubey

Corresponding author

Correspondence to Aminul Islam .

Editor information

Editors and affiliations.

Department of Instrumentation and Control Engineering, Dr. B. R. Ambedkar National Institute of Technology Jalandhar, Jalandhar, Punjab, India

Afzal Sikander

Department of Control Systems, Lukasiewicz Research Network - Institute for Sustainable Technologies, Radom, Poland

Marta Zurek-Mortka

Department of Electrical Engineering, Indian Institute of Engineering Science, Howrah, West Bengal, India

Chandan Kumar Chanda

Department of Mechanical Engineering, Indian Institute of Technology Guwahati, Guwahati, Assam, India

Pranab Kumar Mondal

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper.

Jeet, R., Dubey, S.K., Islam, A. (2024). Heart Problem Prediction Using Machine Learning Technique. In: Sikander, A., Zurek-Mortka, M., Chanda, C.K., Mondal, P.K. (eds) Advances in Energy and Control Systems. ESDA 2022. Lecture Notes in Electrical Engineering, vol 1148. Springer, Singapore. https://doi.org/10.1007/978-981-97-0154-4_22

DOI : https://doi.org/10.1007/978-981-97-0154-4_22

Published : 15 June 2024

Publisher Name : Springer, Singapore

Print ISBN : 978-981-97-0153-7

Online ISBN : 978-981-97-0154-4

eBook Packages : Intelligent Technologies and Robotics (R0)

COMMENTS

  1. (PDF) A Review on Heart Disease Prediction Using Machine Learning

    mining and machine learning techniques used in heart disease prediction and compare them to find the best method for prediction. Naive Bayes and improved K-means algorithms are

  2. (PDF) A Comprehensive Review on Heart Disease Prediction Using Data

    A Comprehensive Review on Heart Disease Prediction Using Data Mining and Machine Learning Techniques October 2020 American Journal of Artificial Intelligence 4(1):20-29

  3. Machine learning prediction in cardiovascular diseases: a meta ...

    Most importantly, pooled analyses indicate that, in general, ML algorithms are accurate (AUC 0.8-0.9 s) in overall cardiovascular disease prediction. In subgroup analyses of each ML algorithms ...

  4. Machine learning-based heart disease diagnosis: A systematic literature

    Table 1 presents an overview of some of the previously published literature reviews on heart disease diagnosis using ML approaches. From Table 1, it can be observed that most of the referenced literature emphasizes machine learning approaches while the systematic literature review (SLR) is mostly ignored. For instance, Benhar et al. published an SLR whose primary concern was different ML ...

  5. Comprehensive evaluation and performance analysis of machine learning

    The literature review elucidates many methodologies for diagnosing and predicting heart illness, including advanced machine learning, deep learning, and sensor-based data-collecting strategies.

  6. A systematic review on machine learning approaches for cardiovascular

    A forecast based on machine learning techniques can be beneficial in detecting cardiovascular disease (CVD) with maximum precision and accuracy. The disease's effective prediction helps in early diagnosis, which cuts down the mortality rate. A health history and the causes of heart disease require the efficient detection and prediction of CVD.

  7. Effective Heart Disease Prediction Using Machine Learning Techniques

    The diagnosis and prognosis of cardiovascular disease are crucial medical tasks to ensure correct classification, which helps cardiologists provide proper treatment to the patient. Machine learning applications in the medical niche have increased as they can recognize patterns from data. Using machine learning to classify cardiovascular disease occurrence can help diagnosticians reduce ...

  8. Heart Disease Prediction Using Machine Learning: A Systematic

    This can be done by using a prediction model. To offer a comprehensive examination of machine learning research about the prediction of cardiac disease, a systematic literature review (SLR) was undertaken. Based on these papers, researchers can focus on four main research questions. The UCI dataset was commonly used in 25 out of 32 papers.

  9. Heart Disease Prediction Using Machine Learning: A Systematic

    To offer a comprehensive examination of machine learning research about the prediction of cardiac disease, a systematic literature review (SLR) was undertaken, and random forest was the most popular machine learning algorithm, used in six papers. Heart disease, categorized as a cardiovascular condition, stands as a prominent factor in worldwide mortality, accounting for approximately 32% of all deaths ...

  10. Heart Disease Prediction Using Machine Learning: A Systematic

    Heart Disease Prediction Using Machine Learning: A Systematic Literature Review. August 2023. DOI: 10.1109/ICITACEE58587.2023.10277209. Conference: 2023 10th International Conference on ...

  11. Machine Learning Technology-Based Heart Disease Detection Models

    Coronary artery disease (CAD), also known as ischemic heart disease (IHD), is the leading cause of death in adults over the age of 35 in different countries. During the same time span, it became China's biggest cause of death. When blood flow to the heart is reduced due to coronary artery stenosis, IHD occurs.

  12. Heart Disease Prediction Using Machine Learning Techniques: A

    Section 3 presents the literature review carried out to efficiently predict and diagnose heart disease using machine learning algorithms and techniques. Section 4 is the discussion section; it shows the comparative representation of various machine learning methodologies based on their accuracy in a tabular form.

  13. PDF Prediction of Heart Disease Using Machine Learning: A Systematic

    Prediction of Heart Disease Using Machine Learning: A Systematic Literature Review. Alfredo Daza Vergaray 1, Juan Carlos Herrera Miranda 2, Juana Bobadilla Cornelio 3, Atilio Rubén López Carranza 4 and Carlos Fidel Ponce Sanchez 5. 1 Faculty of Engineering and Architecture, School of Systems Engineering, Universidad César Vallejo, Lima ...

  14. Early and accurate detection and diagnosis of heart disease using

    In literature, the Cleveland heart disease dataset is ... the prediction of heart disease using machine learning algorithms. ... G. Effective heart disease prediction using hybrid machine learning ...

  15. Risk prediction of cardiovascular disease using machine learning

    Cardiovascular disease (CVD) makes our heart and blood vessels dysfunctional and often leads to death or physical paralysis. Therefore, early and automatic detection of CVD can save many human lives. Multiple investigations have been carried out to achieve this objective, but there is still room for improvement in performance and reliability.

  16. Review On Cardiovascular Disease Prediction Using Machine Learning

    Cardiovascular diseases (CVD) have been among the foremost causes of death around the globe over the past decades. Early detection of heart diseases and continuous monitoring reduce the death rate. The heart is responsible for blood circulation in the entire human body. Heart attacks happen because of coronary artery disease (CAD) and chronic heart failure (CHF). Angiography is one of ...

  17. Machine learning algorithms for predicting coronary artery disease

    Coronary artery disease (CAD) is the most common type of heart disease, affecting millions worldwide. According to recent statistics from the American Heart Association, coronary heart disease accounted for 13% of deaths in the USA in 2018. Worldwide in 2015, CAD was found to be one of the most common causes of death, with 15.6% of all deaths resulting from the disease.

  18. Heart Disease Prediction Using Machine Learning

    Cardiovascular disease refers to any critical condition that impacts the heart. Because heart diseases can be life-threatening, researchers are focusing on designing smart systems to accurately diagnose them based on electronic health data, with the aid of machine learning algorithms. This work presents several machine learning approaches for predicting heart diseases, using data of major ...

  19. An artificial intelligence model for heart disease detection using

    We show how machine learning can help predict whether a person will develop heart disease. In this paper, a Python-based application is developed for healthcare research as it is more reliable and helps track and establish different types of health monitoring applications.

  20. Heart Disease Prediction using Machine Learning

    This study judges the efficacy of various models based on machine learning techniques and methods used in medical databases to enable the analysis of large and complex data. Cardiovascular disease (CVD) is currently the most difficult disease to treat in India and elsewhere in the globe. It has long been a primary reason for death worldwide. The bottom ...

  21. (PDF) A Review on Heart Disease Prediction using Machine Learning and

    The heart is the next major organ after the brain in terms of priority in the human body. It pumps blood and supplies it to all organs of the body. Prediction of occurrences of heart diseases ...

  22. PDF A Review on Heart Disease Prediction using Machine Learning and Data

    A Review on Heart Disease Prediction using Machine Learning and Data Analytics Approach. Figure 1 depicts the parts of the human heart, such as the left atrium, right atrium, right ventricle, left ventricle, aorta, pulmonary vein, pulmonary valve, pulmonary artery, tricuspid valve, aortic valve, mitral valve, superior vena cava and inferior vena cava.

  23. A systematic review of clinical health conditions predicted by machine

    With the advances in technology and data science, machine learning (ML) is being rapidly adopted by the health care sector. However, there is a lack of literature addressing the health conditions targeted by the ML prediction models within primary health care (PHC) to date. To fill this gap in knowl …

  24. PDF A Literature Review on Heart Disease Prediction Based on Data Mining

    Heart disease is the world's largest cause of death. The tremendous amount of data generated for the prediction of heart disease is too difficult and wasteful to process and analyze in the conventional way. Data mining provides methodologies and techniques to transform these mounds into useful information for decision-making.

  25. A Review: Heart Disease Prediction in Machine Learning & Deep Learning

    The whole world has come to know the fact that heart disease is not a trivial issue. Although the years have changed, throngs of patients are diagnosed with this lethal disease and not only is it not decreasing, but increasing and this is evident from the analysis of death rates across the country. Heart disease is caused by several main factors such as negligence in taking care of diet and ...

  26. Heart Problem Prediction Using Machine Learning Technique

    So, we should focus on each factor—from medical treatment to eating hygienic food. This chapter is based on heart disease prediction, so that we can detect it in the early stage by using machine learning and its algorithms. We have taken online data, which has 14 features and 304 instances. Firstly, we analyze and then try to predict heart ...

  27. Risk Prediction of Heart Diseases in Breast Cancer Patients: A Deep

    Accurately predicting heart disease risks in breast cancer patients is crucial for clinical decision support and patient safety. This study developed and evaluated predictive models for six heart diseases using real-world electronic health records (EHRs) data. We incorporated a trainable decay mechanism to handle missing values in the long short-term memory (LSTM) model, creating LSTM-D models ...

  28. Development and validation of machine learning models to predict

    Using data from a single center to train a machine learning model increases the risk of model overfitting, and blood transfusions are significantly heterogeneous across hospitals and physician practices, which can limit the generalizability of the model [47]. Therefore, model applications to other regions and ...

  29. App Based Cardiovascular Disease Prediction using Machine Learning

    Study investigates machine learning (ML) algorithms for early detection of cardiovascular diseases (CVDs), given their significant global impact. Utilizing demographic data, medical history, and lab results, algorithms like Logistic Regression, Support Vector Machine, Decision Tree, Random Forest, and K-Nearest Neighbors are examined. An ensemble model, incorporating Random Forest, is ...

  30. A comprehensive review of predictive analytics models for mental

    We review existing research on the use of machine learning in the detection and treatment of mental illness and discuss the implications for future research. The major contribution of our work is as follows: • Review the machine learning models, algorithms, and applications for the early detection of mental disease. •