Background:
Prior risk models in patients with heart failure (HF) have focused on hospitalizations for worsening HF (WHF) and have not evaluated differences in predictors by left ventricular ejection fraction (LVEF). We used natural language processing (NLP) and machine learning methods with access to longitudinal electronic health record (EHR) data to develop risk prediction models for WHF events across practice settings and by LVEF category.
Methods:
We identified all adults with HF and known LVEF on January 1st of each year from 2011-2019 in an integrated health care system. WHF events within 1 year were defined as any hospitalization, emergency department, or outpatient encounter with ≥1 symptom, ≥2 objective findings including ≥1 sign, and ≥1 change in HF-related therapy. Signs and symptoms were ascertained using rule-based NLP. We developed boosted decision tree-based ensemble models for any WHF event within each LVEF category: HF with reduced EF (HFrEF; LVEF ≤40%), HF with mildly reduced EF (HFmrEF; LVEF 41-49%), and HF with preserved EF (HFpEF; LVEF ≥50%). We evaluated model discrimination using area under the curve (AUC) and model calibration using Brier scores.
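The LVEF categories and the WHF event criteria above can be written as simple rules. The following is a minimal sketch, not the authors' implementation: the `Encounter` class and its field names are hypothetical, the counts are assumed to come from an upstream step (such as the rule-based NLP), and the handling of non-integer LVEF values between 40% and 41% is an assumption, since the abstract gives integer cut points only.

```python
from dataclasses import dataclass


def lvef_category(lvef_percent: float) -> str:
    """Map an LVEF percentage to the three categories named in the abstract."""
    if lvef_percent <= 40:
        return "HFrEF"   # reduced EF: LVEF <= 40%
    if lvef_percent < 50:
        # Stated range is 41-49%; values strictly between 40 and 41
        # land here by assumption.
        return "HFmrEF"
    return "HFpEF"       # preserved EF: LVEF >= 50%


@dataclass
class Encounter:
    """Hypothetical per-encounter counts; field names are illustrative only."""
    n_symptoms: int
    n_signs: int
    n_other_objective_findings: int  # objective findings that are not signs
    n_therapy_changes: int


def is_whf_event(e: Encounter) -> bool:
    """Apply the stated criteria: >=1 symptom, >=2 objective findings
    including >=1 sign, and >=1 change in HF-related therapy."""
    n_objective = e.n_signs + e.n_other_objective_findings
    return (
        e.n_symptoms >= 1
        and n_objective >= 2
        and e.n_signs >= 1
        and e.n_therapy_changes >= 1
    )
```

For example, an encounter with one symptom, one sign, one other objective finding, and one therapy change meets the definition, whereas the same encounter with no sign does not, even if it has two other objective findings.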
Results:
Among 359,298 patients from 2011-2019, 65,838 (18%) had HFrEF, 52,491 (15%) had HFmrEF, and 240,969 (67%) had HFpEF. Mean age was 75±12, 47% were women, and 37% were minorities including 10% Black, 11% Asian/Pacific Islander, and 12% of Hispanic ethnicity. WHF events occurred in 22% of patients with HFrEF, 17% with HFmrEF, and 16% with HFpEF. The models displayed an AUC of 0.75 and Brier score of 0.15 for HFrEF and an AUC of 0.77 and Brier scores of 0.12 for both HFmrEF and HFpEF. Clinical predictors were similar across LVEF categories (Table).
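As a reading aid for the two metrics reported above, the sketch below computes AUC and the Brier score from their standard definitions in plain Python. The toy labels and probabilities are invented for illustration and are not study data. The `reference_brier` helper is an added assumption: it gives the Brier score of a model that predicts the overall event rate for every patient, a common point of comparison that the abstract itself does not report.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome (0 or 1)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auc(y_true, y_prob):
    """Probability that a randomly chosen event is scored higher than a
    randomly chosen non-event, counting ties as one half."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in pos
        for b in neg
    )
    return wins / (len(pos) * len(neg))


def reference_brier(event_rate):
    """Brier score of a constant prediction equal to the event rate: p * (1 - p)."""
    return event_rate * (1 - event_rate)


# Toy example (invented values, not from the study).
toy_outcomes = [0, 0, 1, 1]
toy_probabilities = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc(toy_outcomes, toy_probabilities)
toy_brier = brier_score(toy_outcomes, toy_probabilities)
```

Using the rounded event rates reported above, the constant-prediction reference would be about 0.172 for HFrEF (22%), 0.141 for HFmrEF (17%), and 0.134 for HFpEF (16%). Lower Brier scores are better, so the reported values of 0.15 and 0.12 sit below these reference points; because the inputs are rounded, the comparison is approximate.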
Conclusions:
Longitudinal EHR data can be leveraged using NLP and machine learning for accurate risk estimation that reliably identifies clinical predictors across a range of LVEF. These findings may provide novel insight into the natural history of HF.