Coronary heart disease (CHD) has been the leading cause of death worldwide for decades. The healthcare industry generates vast amounts of clinical data, driven by patient medical records, regulatory requirements, and the results of medical examinations. To identify the most relevant features for coronary heart disease, this study conducts an experimental evaluation of data-driven CHD diagnosis using classification algorithms. A statistical test (Chi-square) is used to find the most valuable features and risk factors associated with coronary heart disease. The purpose of this univariate feature selection method is to measure the difference between the observed results and the expected results. Furthermore, CHD is predicted using several machine learning classification algorithms, including Logistic Regression, Complement Naïve Bayes, and Support Vector Machine (SVM). This study also evaluates ensemble machine learning algorithms, namely Random Forest, Extreme Gradient Boosting (XGBoost), and Gradient Boost, to find the best-performing classification algorithm and to select essential features from the dataset. Holdout and cross-validation methods are used to separate the dataset into two sets, called the training set and the testing set. The performance of the proposed algorithms is assessed in terms of specificity and sensitivity. The study shows that the Gradient Boost model exhibits the best performance, with a sensitivity of 0.839.
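The two quantities named above, the Chi-square statistic (observed versus expected counts) and sensitivity and specificity, can be illustrated with a minimal sketch. It is not the study's implementation: the function names, the contingency table, and the label vectors are hypothetical and chosen only for illustration. The sketch assumes that all expected counts are nonzero and that both classes appear among the true labels.

```python
from typing import Sequence, Tuple


def chi_square_statistic(observed: Sequence[Sequence[float]]) -> float:
    """Pearson chi-square statistic for a contingency table of counts.

    Rows are feature categories, columns are outcome classes.
    Expected counts come from the row and column totals (independence).
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (obs - expected) ** 2 / expected
    return stat


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels, 1 = disease."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical counts: rows = risk factor present/absent,
    # columns = CHD yes/no. These are not data from the study.
    table = [[30, 10], [20, 40]]
    print(round(chi_square_statistic(table), 3))

    # Hypothetical true and predicted labels for eight patients.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
    print(sensitivity_specificity(y_true, y_pred))
```

A larger statistic means the feature's observed distribution departs further from what independence from the outcome would predict, which is why features can be ranked by it. Sensitivity is the fraction of true CHD cases that the classifier detects, and specificity is the fraction of non-cases it correctly rules out.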