2016
DOI: 10.1371/journal.pone.0163942

Prediction of Incident Diabetes in the Jackson Heart Study Using High-Dimensional Machine Learning

Abstract: Statistical models to predict incident diabetes are often based on limited variables. Here we pursued two main goals: 1) investigate the relative performance of a machine learning method such as Random Forests (RF) for detecting incident diabetes in a high-dimensional setting defined by a large set of observational data, and 2) uncover potential predictors of diabetes. The Jackson Heart Study collected data at baseline and in two follow-up visits from 5,301 African Americans. We excluded those with baseline di…

Cited by 43 publications (44 citation statements)
References 32 publications (23 reference statements)
“…This study demonstrates the benefit of using a common generalized linear model technique combined with sampling to access data available in the EHR to create a clinically relevant prediction model. When compared to other prediction models developed using large datasets and machine learning techniques to predict the development of diabetes, 43 pancreatitis severity, 44 heart failure readmissions, 44,45 and sepsis 18 (sensitivities 0.74, 0.87, and 0.92, respectively, and AUCs 0.78, 0.82, and 0.86, respectively), results of our model are encouraging and could benefit clinical practice. While no prediction model has been published to identify hospitalized patients at high-risk of future COT, prediction tools to assess the patient's risk of opioid misuse have been developed and validated.…”
Section: Discussion (mentioning)
confidence: 83%
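The sensitivities and AUCs quoted in the statement above can be made concrete with a small worked example. The sketch below is illustrative only: the labels, scores, and the 0.5 cutoff are made-up toy values, and the function names `sensitivity` and `auc` are hypothetical helpers, not anything taken from the cited papers. It computes sensitivity as TP / (TP + FN) and AUC as the fraction of positive/negative pairs in which the positive case receives the higher score (ties count as half).

```python
from typing import Sequence


def sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """True-positive rate: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp + fn == 0:
        raise ValueError("sensitivity is undefined without positive cases")
    return tp / (tp + fn)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    # A positive outranking a negative counts 1, a tie counts 0.5.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (assumed for illustration): 1 = developed the outcome, 0 = did not.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.7, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cutoff

print(sensitivity(y_true, y_pred))  # 0.75
print(auc(y_true, scores))          # 0.875
```

Sensitivity depends on the chosen cutoff, whereas AUC summarizes ranking quality across all cutoffs, which is one reason reports such as the one quoted tend to give both.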
“…The use of machine learning models, as opposed to linear models, carries the capability of capturing subtle multivariate relationships which may be otherwise difficult to detect. Additionally, machine learning methods have the capability of dealing with large numbers of variables while producing powerful predictive models. Employment of these methods in the predictions of diabetes has been described in several studies, yet these were based on small and select populations (less than 10 000 individuals) limiting the generalizability of the results…”
Section: Discussion (mentioning)
confidence: 99%
“…RF is a tree‐based, nonparametric data mining/machine learning method requiring no assumption about data distribution. In recent years, various research groups have applied RF to investigate various health care–related questions including prediction of hospital readmissions, adverse drug reactions, incident diabetes, and congenital heart defects. The RF algorithm turned out to be a better choice in many scenarios compared with other machine learning algorithms, since it combines the six properties that are all important in health care scenarios: 1) it does not overfit, 2) it is robust to noise, 3) it has an internal mechanism to estimate error rate, 4) it provides indices of variable importance, 5) it naturally works with mixes of continuous and categorical variables, and 6) it can be used for data imputation and cluster analysis.…”
Section: Discussion (mentioning)
confidence: 99%
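The "internal mechanism to estimate error rate" mentioned in the statement above is usually the out-of-bag (OOB) estimate: each tree is trained on a bootstrap sample, and the rows left out of that sample serve as a built-in test set for that tree. The sketch below illustrates the idea under loud simplifying assumptions: it uses one-split decision stumps on a randomly chosen feature in place of full trees, so it is bagging with random feature selection rather than the Random Forests algorithm proper, and `fit_stump`, `oob_error`, and the synthetic data are all hypothetical.

```python
import random
from collections import Counter


def fit_stump(X, y, feature):
    """Fit a one-split classifier on one feature (a stand-in for a full tree)."""
    best = None  # (training errors, threshold, left label, right label)
    for thr in sorted({row[feature] for row in X}):
        left = [lab for row, lab in zip(X, y) if row[feature] <= thr]
        right = [lab for row, lab in zip(X, y) if row[feature] > thr]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = (sum(lab != left_label for lab in left)
                  + sum(lab != right_label for lab in right))
        if best is None or errors < best[0]:
            best = (errors, thr, left_label, right_label)
    _, thr, left_label, right_label = best
    return lambda row: left_label if row[feature] <= thr else right_label


def oob_error(X, y, n_trees=50, seed=0):
    """Out-of-bag misclassification rate of a bagged ensemble of stumps."""
    rng = random.Random(seed)
    n, n_features = len(X), len(X[0])
    votes = [Counter() for _ in range(n)]
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        in_bag = set(idx)
        feature = rng.randrange(n_features)         # random feature per stump
        stump = fit_stump([X[i] for i in idx], [y[i] for i in idx], feature)
        for i in range(n):
            if i not in in_bag:                     # only out-of-bag rows vote
                votes[i][stump(X[i])] += 1
    scored = [(i, v.most_common(1)[0][0]) for i, v in enumerate(votes) if v]
    if not scored:
        raise ValueError("no row was ever out of bag; use more trees")
    return sum(pred != y[i] for i, pred in scored) / len(scored)


# Synthetic two-class data (assumed): both features shift by 4 for class 1.
data_rng = random.Random(1)
y = [label for label in (0, 1) for _ in range(30)]
X = [[data_rng.gauss(4 * label, 1), data_rng.gauss(4 * label, 1)] for label in y]

err = oob_error(X, y)
print(f"OOB error estimate: {err:.3f}")
```

Because every row is scored only by models that never saw it during fitting, the estimate comes without a separate hold-out split, which is the property the quoted passage highlights.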