Introduction Administrative claims data provide an important source for real-world evidence (RWE) generation, but incomplete reporting, such as for body mass index (BMI), limits the sample sizes that can be analyzed to address certain research questions. The objective of this study was to construct models by implementing machine-learning (ML) algorithms to predict BMI classifications (≥ 30, ≥ 35, and ≥ 40 kg/m 2 ) in administrative healthcare claims databases, and then internally and externally validate them. Methods Five advanced ML algorithms were implemented for each BMI classification on a random sampling of BMI readings from the Optum PanTher Electronic Health Record database (2%) and the Optum Clinformatics Date of Death (20%) database, while incorporating baseline demographic and clinical characteristics. Sensitivity analyses with oversampling ratios were conducted. Model performance was validated internally and externally. Results Models trained on the Super Learner ML algorithm (SLA) yielded the best BMI classification predictive performance. SLA model 1 utilized sociodemographic and clinical characteristics, including baseline BMI values; the area under the receiver operating characteristic curve (ROC AUC) was approximately 88% for the prediction of BMI classifications of ≥ 30, ≥ 35, and ≥ 40 kg/m 2 (internal validation), while accuracy ranged from 87.9% to 92.8% and specificity ranged from 91.8% to 94.7%. SLA model 2 utilized sociodemographic information and clinical characteristics, excluding baseline BMI values; ROC AUC was approximately 73% for the prediction of BMI classifications of ≥ 30, ≥ 35, and ≥ 40 kg/m 2 (internal validation), while accuracy ranged from 73.6% to 80.0% and specificity ranged from 71.6% to 85.9%. The external validation on the MarketScan Commercial Claims and Encounters database yielded relatively consistent results with slightly diminished performance. Conclusion This study demonstrated the feasibility and validity of using ML algorithms to predict BMI classifications in administrative healthcare claims data to expand the utility for RWE generation. Supplementary Information The online version contains supplementary material available at 10.1007/s12325-020-01605-6.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.