The heart evolved hundreds of millions of years ago. During mammalian evolution, the cardiovascular system developed complete separation between the pulmonary and systemic circulations, incorporated into a single pump with chambers dedicated to each circulation. A lower-pressure right heart chamber supplies deoxygenated blood to the lungs, while a higher-pressure left heart chamber supplies oxygenated blood to the rest of the body. Due to the complexity of the morphogenic cardiac looping and septation required to form these two chambers, congenital heart diseases often involve maldevelopment of the evolutionarily recent right heart chamber. Additionally, some diseases predominantly affect structures of the right heart, including arrhythmogenic right ventricular cardiomyopathy (ARVC) and pulmonary hypertension. To gain insight into right heart structure and function, we fine-tuned deep learning models to recognize the right atrium, the right ventricle, and the pulmonary artery, and then used those models to measure right heart structures in over 40,000 individuals from the UK Biobank with magnetic resonance imaging. We found associations between these measurements and clinical disease, including pulmonary hypertension and dilated cardiomyopathy. We then conducted genome-wide association studies, identifying 104 distinct loci associated with at least one right heart measurement. Several of these loci were found near genes previously linked with congenital heart disease, such as NKX2-5, TBX3, WNT9B, and GATA4. We also observed both commonalities and differences in association patterns at genetic loci linked with both right and left ventricular measurements. Finally, we found that a polygenic predictor of right ventricular end-systolic volume was associated with incident dilated cardiomyopathy (HR 1.28 per standard deviation; P = 2.4E-10), and remained a significant predictor of disease even after accounting for a left ventricular polygenic score.
Harnessing deep learning to perform large-scale cardiac phenotyping, our results yield insights into the genetic and clinical determinants of right heart structure and function.
Background: Electronic health records (EHRs) promise to enable broad-ranging discovery with power exceeding that of conventional research cohort studies. However, research using EHR datasets may be subject to selection bias, which can be compounded by missing data, limiting the generalizability of derived insights.
Methods: Mass General Brigham (MGB) is a large New England-based healthcare network comprising seven tertiary care and community hospitals with associated outpatient practices. Within an MGB-based EHR warehouse of >3.5 million individuals with at least one ambulatory care visit, we approximated a community-based cohort study by selectively sampling individuals longitudinally attending primary care practices between 2001 and 2018 (n=520,868), which we named the Community Care Cohort Project (C3PO). We also utilized pre-trained deep natural language processing (NLP) models to recover vital signs (i.e., height, weight, and blood pressure) from unstructured notes in the EHR. We assessed the validity of C3PO by deploying established risk models, including the Pooled Cohort Equations (PCE) and the Cohorts for Aging and Genomic Epidemiology Atrial Fibrillation (CHARGE-AF) score, and compared model performance in C3PO to that observed within typical EHR Convenience Samples, which included all individuals from the same parent EHR with sufficient data to calculate each score but without a requirement for longitudinal primary care. All analyses were facilitated by the JEDI Extractive Data Infrastructure pipeline, which we designed to efficiently aggregate EHR data within a unified framework conducive to regular updates.
Results: C3PO includes 520,868 individuals (mean age 48 years, 61% women, median follow-up 7.2 years, median primary care visits per individual 13). As estimated from reports, C3PO contains over 2.9 million electrocardiograms, 450,000 echocardiograms, 12,000 cardiac magnetic resonance images, and 75 million narrative notes. Using tabular data alone, 286,009 individuals (54.9%) had all vital signs available at baseline, which increased to 358,411 (68.8%) after NLP recovery (31% reduction in missingness). Among individuals with both NLP and tabular data available, NLP-extracted and tabular vital signs obtained on the same day were highly correlated (e.g., Pearson r range 0.95-0.99, p<0.01 for all). Both the PCE models (c-index range 0.724-0.770) and CHARGE-AF (c-index 0.782, 95% CI 0.777-0.787) demonstrated good discrimination. As compared to the Convenience Samples, AF and MI/stroke incidence rates in C3PO were lower and calibration error was smaller for both PCE (integrated calibration index range 0.012-0.030 vs. 0.028-0.046) and CHARGE-AF (0.028 vs. 0.036).
Conclusions: Intentional sampling of individuals receiving regular ambulatory care and use of NLP to recover missing data have the potential to reduce bias in EHR research and maximize generalizability of insights.
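The 31% reduction in missingness quoted in the Results can be checked from the counts given there. The short Python sketch below is only an arithmetic illustration, not part of the authors' pipeline; the function and variable names are hypothetical, and it assumes that "missingness" means the share of the 520,868 individuals lacking at least one baseline vital sign.

```python
# Counts taken from the Results text; all names here are illustrative.
TOTAL = 520_868              # individuals in C3PO
COMPLETE_TABULAR = 286_009   # all vital signs available from tabular data alone
COMPLETE_WITH_NLP = 358_411  # all vital signs available after NLP recovery


def missing_fraction(complete: int, total: int) -> float:
    """Share of individuals lacking at least one baseline vital sign."""
    return (total - complete) / total


def relative_reduction(before: float, after: float) -> float:
    """Relative drop in missingness going from `before` to `after`."""
    return (before - after) / before


missing_before = missing_fraction(COMPLETE_TABULAR, TOTAL)
missing_after = missing_fraction(COMPLETE_WITH_NLP, TOTAL)
reduction = relative_reduction(missing_before, missing_after)

print(f"missing before NLP: {missing_before:.1%}")
print(f"missing after NLP:  {missing_after:.1%}")
print(f"relative reduction: {reduction:.1%}")
```

Under that assumption, missingness falls from roughly 45% to roughly 31% of the cohort, a relative reduction that rounds to the reported 31%. Note that this is a relative reduction; the absolute gain in completeness is the difference between 68.8% and 54.9%.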