2019
DOI: 10.1093/jamia/ocz066

High-throughput multimodal automated phenotyping (MAP) with application to PheWAS

Abstract: Objective: Electronic health records linked with biorepositories are a powerful platform for translational studies. A major bottleneck exists in the ability to phenotype patients accurately and efficiently. The objective of this study was to develop an automated high-throughput phenotyping method integrating International Classification of Diseases (ICD) codes and narrative data extracted using natural language processing (NLP). Materials and M…

Cited by 71 publications (37 citation statements)
References 35 publications (41 reference statements)
“…Another limitation was the use of ICD codes to assign diagnoses in the studied VA cystic kidney and liver cohort. Although ICD codes are commonly used for disease phenotyping based on electronic medical records, their accuracy is disease-dependent and often inferior to machine learning-based complex methods 15 . While it would be ideal to have diagnoses among these patients confirmed by genotyping, imaging, and family history data, such a depth of information will unlikely be fully retrievable from general medical records.…”
Section: Discussion | Type: mentioning
confidence: 99%
“…Phenotyping with PheMap has several advantages compared with previously reported approaches to high-throughput phenotyping. 8 , 9 , 43 Compared to ontological approaches to high-throughput phenotyping, 43 PheMap provides a way to quantify the importance of relationships between phenotypes and medical concepts through NLP and the TF-IDF statistic. In addition to diagnosis codes, PheMap incorporates other medical information into the phenotype score, including symptoms, medications, laboratory tests, and procedures, which has been shown to improve phenotyping.…”
Section: Discussion | Type: mentioning
confidence: 99%
“…Let , and . These two key features can be mapped automatically as in Liao et al [21] using existing knowledge sources including the PheWAS catalogue [12] and the Unified Medical Language System (UMLS). Additional candidate features – including counts of other ICD codes, NLP features, drug prescriptions, lab tests, and procedure codes – can be identified automatically without GLabels via existing methods such as the SAFE method.…”
Section: Methods | Type: mentioning
confidence: 99%
“…not requiring GLabels) computational phenotyping methods. [17, 18, 19, 20, 21, 22, 23, 24] The class of “weakly supervised” methods, which train supervised classifiers using noisy labels generated from key surrogate features in the data rather than expensive GLabels, has proven particularly powerful to this end. For instance, the “anchor and learn” approach trains a regularized logistic regression model on imperfect labels derived from ‘anchor’ features with high PPVs.…”
Section: Introduction | Type: mentioning
confidence: 99%