Objectives
To define pregnancy episodes and estimate gestational age within electronic health record (EHR) data from the National COVID Cohort Collaborative (N3C).
Materials and Methods
We developed a comprehensive approach, named Hierarchy and rule-based pregnancy episode Inference integrated with Pregnancy Progression Signatures (HIPPS), and applied it to EHR data in the N3C (1/1/2018-4/7/2022). HIPPS combines 1) an extension of a previously published pregnancy episode algorithm, 2) a novel algorithm that detects gestational age-specific signatures of a progressing pregnancy to provide additional support for each episode, and 3) pregnancy start date inference. Clinicians validated HIPPS on a subset of episodes. We then generated pregnancy cohorts based on gestational age precision and pregnancy outcomes to assess accuracy and to compare COVID-19 and other characteristics across cohorts.
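As a rough illustration of the start date inference step, the Python sketch below assumes that each episode has records documenting gestational age in weeks on a given date. Each record implies a start date (record date minus gestational age); the sketch takes the median implied date as the estimate and reports the spread of implied dates as its precision. `GestationalRecord` and `infer_start_date` are hypothetical names, and this median-and-spread rule is a simplification, not the published HIPPS algorithm.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from statistics import median


@dataclass
class GestationalRecord:
    """Hypothetical record: a visit date and the gestational age (weeks) documented then."""
    record_date: date
    gestational_weeks: float


def infer_start_date(records):
    """Estimate a pregnancy start date and its precision (in days).

    Simplified, assumed rule: every record implies a start date equal to the
    record date minus the documented gestational age. The estimate is the
    median implied date; the precision is max(implied) - min(implied).
    """
    if not records:
        raise ValueError("at least one gestational-age record is required")
    implied = [
        r.record_date - timedelta(days=round(r.gestational_weeks * 7))
        for r in records
    ]
    ordinals = sorted(d.toordinal() for d in implied)
    estimate = date.fromordinal(round(median(ordinals)))
    precision_days = ordinals[-1] - ordinals[0]
    return estimate, precision_days


# Example: three records that imply start dates 0-3 days apart.
records = [
    GestationalRecord(date(2021, 3, 1), 10),
    GestationalRecord(date(2021, 4, 12), 16),
    GestationalRecord(date(2021, 6, 10), 24),
]
start, precision = infer_start_date(records)
within_one_week = precision <= 7
```

Here the implied start dates are December 21, 21, and 24 of 2020, so the estimate is 2020-12-21 with a 3-day spread, which would fall in a "within one week" precision group under this assumed rule.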
Results
We identified 628,165 pregnant persons with 816,471 pregnancy episodes, of which 52.3% ended in live births, 24.4% had other outcomes (stillbirth, ectopic pregnancy, abortion), and 23.3% had unknown outcomes. Clinician validation showed 98.8% agreement with HIPPS-identified episodes. We estimated start dates with a precision of one week or better for 475,433 (58.2%) episodes. In total, 62,540 (7.7%) episodes had incident COVID-19 during pregnancy.
Discussion
HIPPS provides measures of support for pregnancy-related variables, such as gestational age and pregnancy outcomes, derived from N3C data. Precise gestational age estimates allow researchers to determine the timing of events during pregnancy with reasonable confidence.
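To show how a precise start date supports time-to-event analysis, the sketch below computes gestational age at an event such as a COVID-19 diagnosis. The function name `gestational_age_at_event`, the seven-day precision cutoff, and the completed-weeks convention are illustrative assumptions rather than the published HIPPS logic.

```python
from datetime import date


def gestational_age_at_event(start, end, event, precision_days, max_precision_days=7):
    """Return gestational age in completed weeks at an event, or None.

    Assumed rules: return None if the start date is too imprecise
    (precision_days > max_precision_days) or if the event falls outside
    the pregnancy episode [start, end].
    """
    if precision_days > max_precision_days:
        return None
    if not (start <= event <= end):
        return None
    return (event - start).days // 7


# Example: a diagnosis 101 days after an inferred start date with 3-day precision.
weeks = gestational_age_at_event(
    start=date(2020, 12, 21),
    end=date(2021, 9, 27),
    event=date(2021, 4, 1),
    precision_days=3,
)
```

With these inputs the event falls 101 days into the episode, i.e. 14 completed weeks. Returning `None` for imprecise episodes mirrors the idea of restricting time-to-event analyses to cohorts with adequate gestational age precision.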
Conclusion
We have developed a novel and robust approach for inferring pregnancy episodes and gestational age that addresses data inconsistency and missingness in EHR data.
Lay Summary
The National COVID Cohort Collaborative (N3C) gives researchers a unique opportunity to use electronic health record data from more than 12 million individuals in over seventy healthcare systems across the U.S. to study the impact of COVID-19 on pregnancy and women’s health. However, research with electronic health record data from different sources can be challenging because the same information is often recorded in different ways and formats. To address this challenge, we developed an approach called Hierarchy and rule-based pregnancy episode Inference integrated with Pregnancy Progression Signatures (HIPPS) that can 1) find the start and end of a pregnancy, 2) infer whether the pregnancy resulted in a live birth or pregnancy loss, and 3) determine the gestational age at the end of pregnancy. In a subset of the data, our approach agreed closely with clinicians' manual review of the electronic health records. Applying our approach to all the data in N3C, we identified 816K pregnancies from 628K individuals. Of these pregnancies, 62K involved COVID-19 during pregnancy. Our research demonstrates that HIPPS can enable COVID-19-related research in pregnancy with electronic health record data.