The PCORnet Antibiotics and Childhood Growth Study is a large national longitudinal observational study in a diverse population that will examine the relationship between early antibiotic use and subsequent growth patterns in children.
Key Points
Question
Can machine learning deployed in electronic health records be used to improve readmission risk estimation for patients following acute myocardial infarction?
Findings
In this cohort study examining externally validated machine learning risk models for 30-day readmission of 10 187 patients following hospitalization for acute myocardial infarction, good discrimination performance was noted at the development site, but the best discrimination did not result in the best calibration. External validation yielded significant declines in discrimination and calibration.
Meaning
The findings of this study highlight that robust calibration assessments are a necessary complement to discrimination when machine learning models are used to predict post–acute myocardial infarction readmission; challenges with data availability across sites, even in the presence of a common data model, limit external validation performance.
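The distinction between discrimination and calibration drawn above can be illustrated with a small sketch. The data below are made up and are not from the study, and the function names are illustrative. A monotone distortion of predicted risks leaves the ranking of patients (and therefore the AUC) unchanged while degrading calibration.

```python
def auc(y_true, y_prob):
    """Rank-based AUC: chance that a random event outranks a random non-event."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def brier(y_true, y_prob):
    """Mean squared difference between predicted risk and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def calibration_in_the_large(y_true, y_prob):
    """Mean predicted risk minus observed event rate (0 is ideal)."""
    return sum(y_prob) / len(y_prob) - sum(y_true) / len(y_true)


# Toy outcomes (1 = readmitted) and predicted risks; purely illustrative.
y = [0, 0, 0, 0, 1, 0, 1, 0, 1, 1]
model_a = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.50, 0.55, 0.70, 0.90]
# Same ranking as model_a, but every risk is inflated.
model_b = [p ** 0.25 for p in model_a]

for name, probs in [("A", model_a), ("B", model_b)]:
    print(name, auc(y, probs), brier(y, probs), calibration_in_the_large(y, probs))
```

Both toy models share the same AUC, yet model B overestimates risk across the board, which is why discrimination alone cannot stand in for a calibration assessment.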
Background:
Many large-scale cardiovascular clinical trials are plagued with escalating costs and low enrollment. Implementing a computable phenotype, which is a set of executable algorithms, to identify a group of clinical characteristics derivable from electronic health records or administrative claims records, is essential to successful recruitment in large-scale pragmatic clinical trials. This methods paper provides an overview of the development and implementation of a computable phenotype in ADAPTABLE (Aspirin Dosing: a Patient-Centric Trial Assessing Benefits and Long-Term Effectiveness)—a pragmatic, randomized, open-label clinical trial testing the optimal dose of aspirin for secondary prevention of atherosclerotic cardiovascular disease events.
Methods and Results:
A multidisciplinary team developed and tested the computable phenotype to identify adults ≥18 years of age with a history of atherosclerotic cardiovascular disease, without safety concerns around using aspirin, and meeting trial eligibility criteria. Using the computable phenotype, investigators identified over 650 000 potentially eligible patients from the 40 participating sites of the Patient-Centered Outcomes Research Network, a network of Clinical Data Research Networks, Patient-Powered Research Networks, and Health Plan Research Networks. Leveraging diverse recruitment methods, sites enrolled 15 076 participants from April 2016 to June 2019. During the process of developing and implementing the ADAPTABLE computable phenotype, several key lessons were learned. The accuracy and utility of a computable phenotype depend on the quality of the source data, which can vary even with a common data model. Local validation and modification were required based on site factors such as recruitment strategies, data quality, and local coding patterns. Sustained collaboration among a diverse team of researchers is needed during computable phenotype development and implementation.
Conclusions:
The ADAPTABLE computable phenotype served as an efficient method to recruit patients in a multisite pragmatic clinical trial. This process of development and implementation will be informative for future large-scale, pragmatic clinical trials.
Registration:
URL: https://www.clinicaltrials.gov; Unique identifier: NCT02697916.
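As a rough illustration of what "a set of executable algorithms" can look like, the sketch below applies the three screening conditions named in the abstract (age ≥18, history of atherosclerotic cardiovascular disease, no aspirin safety concern) to simplified patient records. The record fields and function names are hypothetical; the actual ADAPTABLE phenotype was defined over coded data in a common data model and is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class PatientRecord:
    # Hypothetical, pre-derived fields; a real phenotype would compute these
    # from diagnosis, procedure, and medication codes.
    patient_id: str
    age: int
    has_ascvd_history: bool
    has_aspirin_safety_concern: bool


def is_potentially_eligible(p: PatientRecord) -> bool:
    """Apply the screening conditions described in the abstract."""
    return p.age >= 18 and p.has_ascvd_history and not p.has_aspirin_safety_concern


def apply_phenotype(records):
    """Return the IDs of patients flagged as potentially eligible."""
    return [p.patient_id for p in records if is_potentially_eligible(p)]


records = [
    PatientRecord("p1", 64, True, False),   # flagged
    PatientRecord("p2", 17, True, False),   # too young
    PatientRecord("p3", 70, True, True),    # aspirin safety concern
    PatientRecord("p4", 55, False, False),  # no ASCVD history
]
eligible_ids = apply_phenotype(records)
print(eligible_ids)
```

Keeping each condition as an explicit boolean makes it easier for a site to swap in its own local logic for one criterion, in line with the local validation and modification the abstract describes.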
Background
Social risk factors influence rehospitalization rates yet are challenging to incorporate into prediction models. Integration of social risk factors using natural language processing (NLP) and machine learning could improve risk prediction of 30‐day readmission following an acute myocardial infarction.
Methods and Results
Patients were enrolled into derivation and validation cohorts. The derivation cohort included inpatient discharges from Vanderbilt University Medical Center between January 1, 2007, and December 31, 2016, with a primary diagnosis of acute myocardial infarction, who were discharged alive, and not transferred from another facility. The validation cohort included patients from Dartmouth‐Hitchcock Health Center between April 2, 2011, and December 31, 2016, meeting the same eligibility criteria described above. Data from both sites were linked to Centers for Medicare & Medicaid Services administrative data to supplement 30‐day hospital readmissions. Clinical notes from each cohort were extracted, and an NLP model was deployed, counting mentions of 7 social risk factors. Five machine learning models were run using clinical and NLP‐derived variables. Model discrimination and calibration were assessed, and receiver operating characteristic comparison analyses were performed. The 30‐day rehospitalization rates among the derivation (n=6165) and validation (n=4024) cohorts were 15.1% (n=934) and 10.2% (n=412), respectively. The derivation models demonstrated no statistical improvement in model performance with the addition of the selected NLP‐derived social risk factors.
Conclusions
Social risk factors extracted using NLP did not significantly improve 30‐day readmission prediction among hospitalized patients with acute myocardial infarction. Alternative methods are needed to capture social risk factors.
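The mention-counting step described in the Methods can be sketched as a simple keyword matcher over note text. The factor names and phrases below are hypothetical placeholders; the abstract does not list the 7 social risk factors or the lexicon used, and the study's NLP model may have worked differently.

```python
import re
from collections import Counter

# Hypothetical lexicon, for illustration only.
RISK_TERMS = {
    "housing_instability": ["homeless", "housing instability"],
    "social_isolation": ["lives alone", "socially isolated"],
    "financial_strain": ["cannot afford", "financial strain"],
}


def count_mentions(note, risk_terms=RISK_TERMS):
    """Count whole-phrase, case-insensitive mentions of each factor in one note."""
    text = note.lower()
    counts = Counter()
    for factor, phrases in risk_terms.items():
        for phrase in phrases:
            pattern = r"\b" + re.escape(phrase) + r"\b"
            counts[factor] += len(re.findall(pattern, text))
    return dict(counts)


note = (
    "Patient lives alone and cannot afford medications. "
    "Reports he lives alone since 2015."
)
features = count_mentions(note)
print(features)
```

Raw counts like these ignore negation ("does not live alone") and context, which is one plausible reason why simple mention counts might add little to a model.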
Background
Super-utilizers represent approximately 5% of the population in the United States (U.S.) and yet they are responsible for over 50% of healthcare expenditures. Using characteristics of hospital service areas (HSAs) to predict utilization of resource intensive healthcare (RIHC) may offer a novel and actionable tool for identifying super-utilizer segments in the population. Consumer expenditures may offer additional value in predicting RIHC beyond typical population characteristics alone.
Methods
Cross-sectional data from 2017 were extracted from 5 unique sources. The outcome was RIHC, which included emergency room (ER) visits, inpatient days, and hospital expenditures, all expressed as log per capita. Candidate predictors from 4 broad groups were used: demographics, adult and child health characteristics, community characteristics, and consumer expenditures. Candidate predictors were expressed as per capita or per capita percent and were aggregated from zip codes to HSAs using weighted means. Machine learning approaches (random forest, LASSO) selected important features from nearly 1,000 available candidate predictors and used them to generate 4 distinct models: non-regularized and LASSO regression, random forest, and gradient boosting. Candidate predictors from the best-performing models for each outcome were used as independent variables in multiple linear regression models. The relative contribution of variables from each candidate predictor group to regression model fit was calculated.
Results
The median ER visits per capita was 0.482 [IQR: 0.351–0.646], the median inpatient days per capita was 0.395 [IQR: 0.214–0.806], and the median hospital expenditures per capita was $2,302 [$1,544.70–$3,469.80]. Using 1,106 variables, the test-set coefficient of determination (R²) from the best-performing models ranged between 0.184 and 0.782. The adjusted R² values from multiple linear regression models ranged from 0.311 to 0.8293. Relative contribution of consumer expenditures to model fit ranged from 23.4% to 33.6%.
Discussion
Machine learning models predicted RIHC among HSAs using diverse population data, including novel consumer expenditures, and provide an innovative tool for predicting population-based healthcare utilization and expenditures. Geographic variation in utilization and spending was identified.
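The aggregation of zip-code values to HSAs by weighted means, mentioned in the Methods, can be sketched as follows. The abstract does not state what the weights were; zip-code population is assumed here, and the identifiers and numbers are made up.

```python
from collections import defaultdict


def aggregate_to_hsa(rows):
    """Weighted mean per HSA.

    rows: iterable of (hsa_id, weight, per_capita_value), one row per zip code.
    The weight is assumed to be the zip-code population.
    """
    weighted_sum = defaultdict(float)
    total_weight = defaultdict(float)
    for hsa_id, weight, value in rows:
        weighted_sum[hsa_id] += weight * value
        total_weight[hsa_id] += weight
    return {
        hsa_id: weighted_sum[hsa_id] / total_weight[hsa_id]
        for hsa_id in weighted_sum
        if total_weight[hsa_id] > 0  # skip HSAs with no weight
    }


# Illustrative zip-code rows: (HSA, population, ER visits per capita).
rows = [
    ("HSA_A", 1000, 0.40),
    ("HSA_A", 3000, 0.60),
    ("HSA_B", 500, 0.20),
]
hsa_values = aggregate_to_hsa(rows)
print(hsa_values)
```

Weighting by population keeps a small zip code from pulling the HSA-level value as strongly as a large one, which an unweighted mean would allow.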