Enabling phenotypic big data with PheNorm

Yu, Sheng; Ma, Yumeng; Gronsbell, Jessica; Cai, Tianrun; Ananthakrishnan, Ashwin N.; Gainer, Vivian S.; Churchill, Susanne; Szolovits, Peter; Murphy, Shawn N.; Kohane, Isaac S.; Liao, Katherine P.; Cai, Tianxi

doi:10.1093/jamia/ocx111

Cited by 92 publications (111 citation statements)

References 48 publications

“…Finally, the claims‐based outcome is likely imperfect and may be particularly so for earlier onset or less common forms of dementia; however, we note that dementia is typically underdiagnosed (which would bias the present study toward the null hypothesis) and prior formal validation has found positive predictive value of dementia codes to be >75% in most health systems. Further work is needed on highly scalable computed phenotypes as many existing approaches consume a full medical record and thus are poorly suited to time‐to‐event analysis. The NLP approach reported here could contribute to that effort.…”

Section: Discussion (mentioning; confidence: 97%)

Stratifying risk for dementia onset using large‐scale electronic health record data: A retrospective cohort study

McCoy, Han, Pellegrini, et al. (2020), Alzheimer's & Dementia

Introduction: Preventing dementia, or modifying disease course, requires identification of presymptomatic or minimally symptomatic high‐risk individuals.

Methods: We used longitudinal electronic health records from two large academic medical centers and applied a validated natural language processing tool to estimate cognitive symptomatology. We used survival analysis to examine the association of cognitive symptoms with incident dementia diagnosis during up to 8 years of follow‐up.

Results: Among 267,855 hospitalized patients with 1,251,858 patient years of follow‐up data, 6516 (2.4%) received a new diagnosis of dementia. In competing risk regression, an increasing cognitive symptom score was associated with earlier dementia diagnosis (HR 1.63; 1.54–1.72). Similar results were observed in the second hospital system and in subgroup analysis of younger and older patients.

Discussion: A cognitive symptom measure identified in discharge notes facilitated stratification of risk for dementia up to 8 years before diagnosis.

Section: Discussion (mentioning; confidence: 97%)

“…[52,53] Further work is needed on highly scalable computed phenotypes as many existing approaches consume a full medical record and thus are poorly suited to time-to-event analysis.[54,55] The NLP approach reported here could contribute to that effort.…”

Section: Discussion (mentioning; confidence: 99%)

Section: Statistical Issues Related To Biobank Research (mentioning; confidence: 99%)

“…Some challenges include dealing with misspellings, tenses, alternative phrasing, negation, and defining a trained dictionary of words and phrases that may correspond to a particular phenotype. Algorithms are usually trained using expert annotations, but new methods have attempted to automate this step as well. Additional machine learning methods have also been used to define phenotypes (eg, imaging analytics from medical imaging datasets) using a broad spectrum of patient information…”

Section: Statistical Issues Related To Biobank Research (mentioning; confidence: 99%)
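
The excerpt above lists negation and alternative phrasing among the challenges of matching a dictionary of phenotype terms against clinical notes. The following is a minimal sketch of dictionary matching with a simplified negation rule, assuming a hypothetical hand-built synonym dictionary (`TERM_DICTIONARY`) and a hypothetical cue list (`NEGATION_CUES`) scanned within a few words before each term in the same sentence. It is a generic illustration in the spirit of NegEx-style rules, not the method of PheNorm or of any cited paper.

```python
import re

# Hypothetical dictionary: each concept maps to alternative phrasings of it.
TERM_DICTIONARY = {
    "memory_loss": ["memory loss", "memory impairment", "forgetful", "forgetfulness"],
    "confusion": ["confusion", "confused", "disoriented"],
}

# Simplified negation cues (an assumption; production tools use far larger lists).
NEGATION_CUES = ("no", "not", "denies", "without", "negative for")
NEGATION_WINDOW = 5  # words before a term, within the same sentence, scanned for a cue


def is_negated(text: str, start: int) -> bool:
    """Return True if a negation cue appears shortly before position `start`."""
    # Only look back to the start of the current sentence or clause.
    sentence_start = max(text.rfind(ch, 0, start) for ch in ".;\n") + 1
    window = " ".join(text[sentence_start:start].split()[-NEGATION_WINDOW:])
    return any(re.search(r"\b" + re.escape(cue) + r"\b", window) for cue in NEGATION_CUES)


def count_mentions(note: str) -> dict[str, int]:
    """Count non-negated mentions of each dictionary concept in one note."""
    text = note.lower()
    counts = {concept: 0 for concept in TERM_DICTIONARY}
    for concept, phrases in TERM_DICTIONARY.items():
        for phrase in phrases:
            # Word boundaries stop "forgetful" from matching inside "forgetfulness".
            for match in re.finditer(r"\b" + re.escape(phrase) + r"\b", text):
                if not is_negated(text, match.start()):
                    counts[concept] += 1
    return counts


note = "Patient denies memory loss. Family reports she is forgetful and confused at night."
print(count_mentions(note))  # {'memory_loss': 1, 'confusion': 1}
```

The negated "memory loss" is skipped, while "forgetful" and "confused" in the next sentence are counted. Misspellings and tense variants that are not listed in the dictionary are missed, which is the kind of gap the excerpt describes.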

“…Unstructured data have also been used to define phenotypes, particularly for diseases with unreliable ICD9 classifications such as some psychiatric diseases, using natural language processing methods. [52][53][54][55][56][57][58][59][60] Such methods can also be used to obtain patient measures such as smoking status. 52 Natural language processing methods mine free text such as narrative doctor's notes for words or phrases to develop a model combining structured and unstructured data to classify each patient as having or not having the phenotype of interest.…”

Section: 13 (mentioning; confidence: 99%)
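
The excerpt above describes models that combine structured and unstructured data to classify each patient as having or not having the phenotype, typically trained on expert annotations. The following is a minimal sketch under stated assumptions. It uses two hypothetical per-patient features: a count of phenotype-specific billing codes and a count of concept mentions extracted from notes. A plain logistic regression is fit to made-up annotated labels by batch gradient descent. This is a generic supervised baseline of the kind PheNorm is compared against, not the PheNorm algorithm itself.

```python
import math
from dataclasses import dataclass


@dataclass
class Patient:
    icd_count: int     # structured data: phenotype-specific billing codes (hypothetical feature)
    nlp_mentions: int  # unstructured data: concept mentions extracted from notes (hypothetical feature)


def features(p: Patient) -> list[float]:
    # log1p tames the heavy right skew of count data; the leading 1.0 is the intercept.
    return [1.0, math.log1p(p.icd_count), math.log1p(p.nlp_mentions)]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights: list[float], p: Patient) -> float:
    return sigmoid(sum(w * x for w, x in zip(weights, features(p))))


def train_logistic(patients: list[Patient], labels: list[int],
                   lr: float = 0.5, epochs: int = 2000) -> list[float]:
    """Fit logistic regression by batch gradient descent on expert labels (1 = has phenotype)."""
    weights = [0.0, 0.0, 0.0]
    n = len(patients)
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for p, y in zip(patients, labels):
            err = predict_proba(weights, p) - y  # gradient of the log-loss w.r.t. the linear score
            for j, x in enumerate(features(p)):
                grad[j] += err * x / n
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


# Toy, made-up training data: three annotated negatives and three annotated positives.
train = [Patient(0, 0), Patient(1, 0), Patient(0, 1),
         Patient(8, 5), Patient(12, 9), Patient(6, 7)]
labels = [0, 0, 0, 1, 1, 1]
weights = train_logistic(train, labels)
print(predict_proba(weights, Patient(8, 7)) > 0.5)  # True
print(predict_proba(weights, Patient(0, 0)) > 0.5)  # False
```

The hand-written loop only keeps the sketch dependency-free. In practice a library implementation such as scikit-learn's `LogisticRegression` and validation on held-out annotated charts would be the usual choice.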

The emerging landscape of health research based on biobanks linked to electronic health records: Existing resources, statistical challenges, and potential opportunities

Beesley, Salvatore, Fritsche, et al. (2019), Statistics in Medicine

Biobanks linked to electronic health records provide rich resources for health‐related research. With improvements in administrative and informatics infrastructure, the availability and utility of data from biobanks have dramatically increased. In this paper, we first aim to characterize the current landscape of available biobanks and to describe specific biobanks, including their place of origin, size, and data types. The development and accessibility of large‐scale biorepositories provide the opportunity to accelerate agnostic searches, expedite discoveries, and conduct hypothesis‐generating studies of disease‐treatment, disease‐exposure, and disease‐gene associations. Rather than designing and implementing a single study focused on a few targeted hypotheses, researchers can potentially use biobanks' existing resources to answer an expanded selection of exploratory questions as quickly as they can analyze them. However, there are many obvious and subtle challenges with the design and analysis of biobank‐based studies. Our second aim is to discuss statistical issues related to biobank research such as study design, sampling strategy, phenotype identification, and missing data. We focus our discussion on biobanks that are linked to electronic health records. Some of the analytic issues are illustrated using data from the Michigan Genomics Initiative and UK Biobank, two biobanks with two different recruitment mechanisms. We summarize the current body of literature for addressing these challenges and discuss some standing open problems. This work complements and extends recent reviews about biobank‐based research and serves as a resource catalog with analytical and practical guidance for statisticians, epidemiologists, and other medical researchers pursuing research using biobanks.

Use of Medical Imaging to Advance Mental Health Care: Contributions from Neuroimaging Informatics

Gollub, Benson (2021), Health Informatics

Enabling phenotypic big data with PheNorm

Abstract: The accuracy of the PheNorm algorithms is on par with algorithms trained with annotated samples. PheNorm fully automates the generation of accurate phenotyping algorithms and demonstrates the capacity for EHR-driven annotations to scale to the next level: phenotypic big data.
