Key Points
Question
Can machine learning deployed in electronic health records be used to improve readmission risk estimation for patients following acute myocardial infarction?
Findings
In this cohort study examining externally validated machine learning risk models for 30-day readmission of 10 187 patients following hospitalization for acute myocardial infarction, good discrimination performance was noted at the development site, but the best discrimination did not result in the best calibration. External validation yielded significant declines in discrimination and calibration.
Meaning
The findings of this study highlight that robust calibration assessments are a necessary complement to discrimination when machine learning models are used to predict post–acute myocardial infarction readmission; challenges with data availability across sites, even in the presence of a common data model, limit external validation performance.
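The distinction between discrimination and calibration drawn above can be illustrated with a minimal sketch. The outcome labels and predicted risks below are hypothetical and are not data from the study; the Brier score and calibration-in-the-large are used here only as two common calibration summaries, not necessarily the measures the authors reported. The second model is a monotone transformation of the first, so it ranks patients identically (same discrimination) while systematically overestimating risk (worse calibration).

```python
def auc(y_true, y_prob):
    """Discrimination: chance that a random event is ranked above a random non-event."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    # Ties between an event and a non-event count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted risk and observed outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def calibration_in_the_large(y_true, y_prob):
    """Mean predicted risk minus observed event rate; 0 means no overall bias."""
    return sum(y_prob) / len(y_prob) - sum(y_true) / len(y_true)


# Hypothetical 30-day readmission outcomes (1 = readmitted) and predicted risks.
y = [0, 0, 0, 0, 1, 0, 1, 0, 1, 1]
model_a = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.50, 0.55, 0.70, 0.90]
# Monotone transform: identical ranking of patients, but inflated risk estimates.
model_b = [p ** 0.25 for p in model_a]

for name, probs in [("model_a", model_a), ("model_b", model_b)]:
    print(name,
          "AUC=%.3f" % auc(y, probs),
          "Brier=%.3f" % brier_score(y, probs),
          "CITL=%+.3f" % calibration_in_the_large(y, probs))
```

Both models obtain the same AUC, yet only the first produces risk estimates close to the observed event rate, which is why a discrimination measure alone cannot identify the better-calibrated model.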
Background: Primary nephrotic syndromes are rare diseases, which makes it difficult to reach adequate sample sizes for observational patient-oriented research and clinical trial enrollment. A computable phenotype may be a powerful tool for identifying patients with these diseases for research across multiple institutions.
Methods: A comprehensive algorithm of inclusion and exclusion ICD-9 and ICD-10 codes to identify patients with primary nephrotic syndrome was developed. The algorithm was executed against the PCORnet® CDM at 3 institutions from January 1, 2009, to January 1, 2018. At each institution, a random selection of 50 cases and 50 non-cases (individuals not meeting case criteria who were seen within the same calendar year and were within five years of age of a case) was reviewed by a nephrologist, for a total of 150 cases and 150 non-cases reviewed. The classification accuracy (sensitivity, specificity, positive and negative predictive value, F1 score) of the computable phenotype was determined.
Results: The algorithm identified a total of 2,708 patients with nephrotic syndrome from 4,305,092 distinct patients in the CDM at all sites from 2009-2018. For all sites, the sensitivity, specificity, and area under the curve of the algorithm were 99% (95% CI: 97-99%), 79% (95% CI: 74-85%), and 0.9 (0.84-0.97), respectively. The most common causes of false positive classification were secondary FSGS (9/39) and lupus nephritis (9/39).
Conclusion: This computable phenotype showed good classification performance in identifying both children and adults with primary nephrotic syndrome using only ICD-9 and ICD-10 codes, which are available across institutions in the United States. This may facilitate future screening and enrollment for research studies and enable comparative effectiveness research. Further refinements to the algorithm, including the use of laboratory data or the addition of natural language processing, may help better distinguish primary from secondary causes of nephrotic syndrome.
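The classification accuracy measures named in the Methods can be computed from a 2x2 table of algorithm result against chart review. The following is a minimal sketch of the standard definitions; the function name and the counts are hypothetical and are not the study's confusion matrix.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard accuracy measures from a 2x2 table (algorithm result vs. chart review)."""
    sensitivity = tp / (tp + fn)   # true cases the algorithm flags
    specificity = tn / (tn + fp)   # true non-cases the algorithm leaves unflagged
    ppv = tp / (tp + fp)           # flagged patients who are true cases
    npv = tn / (tn + fn)           # unflagged patients who are true non-cases
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)  # harmonic mean of PPV and sensitivity
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=90, fp=20, fn=10, tn=80)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that positive and negative predictive values depend on the proportion of true cases among the records reviewed, so values computed from a balanced review sample need not match those in the full source population.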