2020
DOI: 10.1002/jcla.23421

Prediction for cardiovascular diseases based on laboratory data: An analysis of random forest model

Abstract: Background: To establish a prediction model for cardiovascular diseases (CVD) in the general population based on random forests. Methods: A retrospective study involving 498 subjects was conducted in Xi'an Medical University between 2011 and 2018. The random forest algorithm was used to screen out the variables that greatly affected the CVD prediction and to establish a prediction model. The important variables were included in the multifactorial logistic regression analysis. The area under the curve (AUC) was c…

citations
Cited by 30 publications
(27 citation statements)
references
References 47 publications
“…Unlike other established risk assessment methods, the random forest algorithm can achieve higher accuracy in the disease prediction by using bootstrap aggregation and randomization of predictors. 26 , 27 Interestingly, using machine learning algorithms, we found that the age, tumor size, histological grade, organ-site metastasis, pathologic types, and TNM stage were indicated to be the most contributive risk factors of PM, consistent with the results of linear regression analysis. However, in this study, the aforementioned factors were included, and correlations within those clinicopathological characteristics were considered, which accounted for a better result than multivariable logistic regression.…”
Section: Discussion (supporting)
confidence: 80%
“…As shown in Figure 2, the variables that achieved peak values were placed at the top node location, closest to the root of the tree [8]. The value of a variable was indicated by the mean decrease in the Gini (MDG) index [14]. It has been established that the higher the index value, the more important is the value given by the variable.…”
Section: Methods (mentioning)
confidence: 99%
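
The mean decrease in Gini mentioned in the statement above can be illustrated with a minimal Python sketch. It assumes the common definition of Gini impurity (1 minus the sum of squared class proportions) and treats a variable's MDG as the per-tree total of its split-wise impurity decreases, averaged over trees. The function names are hypothetical, and the weighting of each split by node size that real implementations may apply is left out for brevity.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity decrease obtained by splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted_children


def mean_decrease_gini(decreases_per_tree):
    """Average over trees of the summed decreases attributed to one variable.

    `decreases_per_tree` holds one list per tree; each list contains the
    impurity decreases of the splits that used the variable in that tree
    (an illustrative simplification, not a specific library's bookkeeping).
    """
    totals = [sum(tree) for tree in decreases_per_tree]
    return sum(totals) / len(totals)
```

A variable whose splits separate the classes cleanly yields large decreases and therefore a high MDG, which matches the reading that a higher index marks a more important variable.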
“…In the DT, the tree models are composed of nodes and directed edges, and optimal classifications are achieved through learning processes that involve recursive feature selection, DT generation, and pruning [28]. RF uses a large series of decision trees with low reciprocal correlation and randomly selected features using the method of bagging to obtain more precise and stable classifications and predictions [29].…”
Section: Discussion (mentioning)
confidence: 99%
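
The bagging and random feature selection described in the statement above can be sketched in plain Python. This is a toy illustration under stated assumptions, not the cited authors' implementation: each "tree" is a hypothetical one-split stump that thresholds a single randomly chosen feature at its sample mean, and all names (`bootstrap_sample`, `fit_stump`, `fit_forest`, `predict`) are invented for the example.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (the bootstrap step of bagging)."""
    return [rng.choice(rows) for _ in rows]


def fit_stump(rows, feature):
    """Toy weak learner: split one feature at its mean, vote by majority per side."""
    threshold = sum(x[feature] for x, _ in rows) / len(rows)
    sides = {True: [], False: []}
    for x, y in rows:
        sides[x[feature] > threshold].append(y)
    # Fall back to the overall majority label when one side is empty.
    fallback = Counter(y for _, y in rows).most_common(1)[0][0]
    vote = {
        side: (Counter(ys).most_common(1)[0][0] if ys else fallback)
        for side, ys in sides.items()
    }
    return lambda x: vote[x[feature] > threshold]


def fit_forest(rows, n_trees=25, seed=0):
    """Fit n_trees stumps, each on its own bootstrap sample and random feature."""
    rng = random.Random(seed)
    n_features = len(rows[0][0])
    forest = []
    for _ in range(n_trees):
        sample = bootstrap_sample(rows, rng)
        feature = rng.randrange(n_features)  # randomization of predictors
        forest.append(fit_stump(sample, feature))
    return forest


def predict(forest, x):
    """Aggregate the individual predictions by majority vote."""
    return Counter(tree(x) for tree in forest).most_common(1)[0][0]
```

Because each learner sees a different resample and a different predictor, their errors are only weakly correlated, and the majority vote tends to be more stable than any single learner. A full random forest grows deep trees and resamples candidate features at every split rather than once per tree.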
“…Existing research [36] has demonstrated that machine learning algorithms can produce accurate results when sorting epidemiological data. The random forest, a key data mining method in machine learning field that depends on a computer to learn all the complicated and nonlinear interactions among variables through minimization of errors between observed and predicted outcomes, can achieve a higher accuracy in the disease prediction by using bootstrap aggregation and randomization of predictors [29]. Besides, RF models are less prone to overfitting.…”
Section: Discussion (mentioning)
confidence: 99%