Longitudinal EHR data, commonly available in clinical settings, can be useful for predicting future risk of suicidal behavior. This modeling approach could serve as an early warning system to help clinicians identify high-risk patients for further screening. By analyzing the full phenotypic breadth of the EHR, computerized risk screening approaches may enhance prediction beyond what is feasible for individual clinicians.
The Partners HealthCare Biobank is a Partners HealthCare enterprise-wide initiative whose goal is to provide a foundation for the next generation of translational research studies of genotype, environment, gene-environment interaction, biomarker, and family history associations with disease phenotypes. Since launching in 2010, the Biobank has used in-person and electronic recruitment methods to enroll >30,000 subjects as of October 2015 at two academic medical centers in Partners HealthCare. Through a close collaboration with the Partners Human Research Committee, the Biobank has developed a comprehensive informed consent process that addresses key patient concerns, including privacy and the return of research results. Lessons learned include the need for careful consideration of ethical issues, attention to the educational content of electronic media, the importance of patient authentication in electronic informed consent, the need for highly secure IT infrastructure and management of communications, and the importance of flexible recruitment modalities and processes that depend on the clinical setting for recruitment.
Objective
To validate the use of electronic health records (EHRs) for diagnosing bipolar disorder (BD) and for classifying control subjects.
Methods
EHR data were obtained from a healthcare system of more than 4.2 million patients spanning more than 20 years. Chart review by experienced clinicians was used to identify text features and coded data consistent or inconsistent with a diagnosis of BD. Natural language processing (NLP) was used to train a diagnostic algorithm with 95% specificity for classifying BD. Filtered coded data were used to derive three additional classification rules for cases and one for controls. The positive predictive value (PPV) of EHR-based BD and subphenotype diagnoses was calculated against direct semi-structured interview diagnoses made by trained clinicians blinded to the EHR diagnosis in a sample of 190 patients.
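For readers unfamiliar with the two metrics named above, the following minimal Python sketch shows how PPV and specificity are conventionally computed from validation counts. It is not the authors' code: the function names and all counts are hypothetical and chosen only for illustration, and none of the numbers come from the study.

```python
def ppv(true_positives: int, false_positives: int) -> float:
    """Positive predictive value: the share of positive classifications
    that the reference standard (here, a direct interview) confirms."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("no positive classifications; PPV is undefined")
    return true_positives / flagged


def specificity(true_negatives: int, false_positives: int) -> float:
    """Specificity: the share of true non-cases correctly classified as non-cases."""
    non_cases = true_negatives + false_positives
    if non_cases == 0:
        raise ValueError("no non-cases; specificity is undefined")
    return true_negatives / non_cases


# Hypothetical counts for illustration only (NOT taken from the study).
example_ppv = ppv(true_positives=8, false_positives=2)
example_specificity = specificity(true_negatives=90, false_positives=10)

print(f"PPV = {example_ppv:.2f}")
print(f"Specificity = {example_specificity:.2f}")
```

PPV depends on how common the condition is in the validated sample, whereas specificity does not, which is one reason a study of this kind may report both.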
Results
The PPV of NLP-defined BD was 0.85. A coded classification based on strict filtering achieved a PPV of 0.79, but BD classifications based on less stringent criteria performed less well. None of the EHR-classified controls was given a diagnosis of BD on direct interview (PPV = 1.0). For most subphenotypes, PPVs exceeded 0.80. The EHR-based classifications were used to accrue 4500 BD cases and 5000 controls for genetic analyses.
Conclusions
Semi-automated mining of EHRs can be used to ascertain BD cases and controls with high specificity and predictive value compared to a gold-standard diagnostic interview. EHRs provide a powerful resource for high-throughput phenotyping for genetic and clinical research.