Objective
A key aspect of the precision medicine effort is the development of informatics tools that can analyze and interpret ‘big data’ sets in an automated and adaptive fashion, while providing accurate and actionable clinical information. The aims of this study were to develop machine learning algorithms for the identification of disease and the prognostication of mortality risk, and to determine whether such models perform better than classical statistical analyses.
Methods
Focusing on peripheral artery disease (PAD), patient data were derived from a prospective, observational study of 1,755 patients who presented for elective coronary angiography. We employed multiple supervised machine learning algorithms and utilized diverse clinical, demographic, imaging and genomic information in a hypothesis-free manner to build models that could identify patients with PAD and predict future mortality. We compared these models with standard stepwise logistic regression models.
Results
Our machine-learned models outperformed stepwise logistic regression models both for the identification of patients with PAD (AUC 0.87 versus 0.76, P=0.03) and for the prediction of future mortality (AUC 0.76 versus 0.65, P=0.10). Both machine-learned models were markedly better calibrated than the stepwise logistic regression models, thus providing more accurate disease and mortality risk estimates.
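For readers less familiar with the two criteria reported here, the following is a minimal sketch of how discrimination (AUC) and calibration might be computed for two competing risk models. It is an illustration only and not the analysis code used in this study: the labels and predicted probabilities are hypothetical toy values, the function names are ours, and the Brier score is assumed here as one common summary that reflects calibration, since the measure actually used is not specified above.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs in which the positive case receives
    the higher score; tied pairs count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def brier_score(labels: Sequence[int], probs: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and the
    observed 0/1 outcome; lower values indicate better risk estimates."""
    if len(labels) != len(probs) or not labels:
        raise ValueError("labels and probs must be non-empty and equal length")
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


# Hypothetical toy data: 1 = disease present, 0 = disease absent.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
model_a = [0.9, 0.8, 0.7, 0.2, 0.1, 0.3, 0.6, 0.4]  # stand-in for one model
model_b = [0.6, 0.4, 0.7, 0.5, 0.3, 0.6, 0.5, 0.5]  # stand-in for another

for name, probs in (("model_a", model_a), ("model_b", model_b)):
    print(name, "AUC:", auc(labels, probs), "Brier:", brier_score(labels, probs))
```

A higher AUC means the model ranks affected patients above unaffected ones more often, while a lower Brier score means the predicted probabilities lie closer to the observed outcomes. A model can rank patients well and still produce poorly calibrated risk estimates, which is why both properties are reported.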
Conclusions
Machine learning approaches can produce more accurate disease classification and prediction models than stepwise logistic regression. These tools may prove clinically useful for the automated identification of patients with highly morbid diseases for which aggressive risk factor management can improve outcomes.