2011
DOI: 10.1186/1472-6947-11-51

Predicting disease risks from highly imbalanced data using random forest

Abstract: Background: We present a method utilizing the Healthcare Cost and Utilization Project (HCUP) dataset for predicting disease risk of individuals based on their medical diagnosis history. The presented methodology may be incorporated in a variety of applications such as risk management, tailored health communication and decision support systems in healthcare. Methods: We employed the National Inpatient Sample (NIS) data, which is publicly available through the Healthcare Cost and Utilization Project (HCUP), to train random f…

Cited by 558 publications (346 citation statements)
References 13 publications

“…The main strength of this study is the use of models based on machine learning validated by leave-one-out cross-validation using prospective data, which demonstrated the clinical utility of the SVM/Ensemble/RANSAC/Random Forest methods in predicting survival [36][37][38][39]. We believe that our results may have positive clinical implications, because the information provided is objective and can be used in family meetings to help set expectations for reasonable clinical care.…”
Section: Principal Results (mentioning)
confidence: 91%
“…[54] Actually, RF is proven highly versatile and has demonstrated high classification accuracy in numerous cases. [56,57] Because of its popularity in many areas special and/or enhanced varieties of RF have appeared. [58][59][60] The second of the two families of predictive modelling, brought into play in the study outlined here, is known as Nearest Shrunken Centroid (NSC) classifier.…”
Section: Applying Multivariate Data Analysis To Unveil Dietary Patterns (mentioning)
confidence: 99%
“…Such knowledge would be beneficial to hospitals, and also to insurance companies, which can make evidence-based decisions, and can optimize, validate and refine the rules that govern their business [6]. This important hidden knowledge can be found with the help of data mining, with methods such as clustering, feature selection, association rule mining, and many more.…”
Section: Introduction (mentioning)
confidence: 99%