2018
DOI: 10.1093/jamia/ocy154

Automated and flexible identification of complex disease: building a model for systemic lupus erythematosus using noisy labeling

Abstract: Accurate and efficient identification of complex chronic conditions in the electronic health record (EHR) is an important but challenging task that has historically relied on tedious clinician review and oversimplification of the disease. Here we adapt methods that allow for automated “noisy labeling” of positive and negative controls to create a “silver standard” for machine learning to automate identification of systemic lupus erythematosus (SLE). Our final model, which includes both structured data as well …

Cited by 44 publications (27 citation statements)
References 9 publications
“…However, computational methods that allow more granular phenotype extraction from EHRs are advancing rapidly. For example, in a recent study, we applied an artificial intelligence algorithm to recognize patterns of lupus and assign probabilities of disease. Work is ongoing to scale such algorithms in repositories like the RISE Registry to understand the full spectrum of phenotypes across a population, to track outcomes, and to conduct discovery research.…”
Section: Generating Real‐world Evidence (mentioning)
confidence: 99%
“…To address the scarcity of labeled training data, Chen et al used active learning to intelligently select training samples for labeling, demonstrating that classifier performance could be preserved with fewer samples [13]. Another trend is the use of "silver standard training sets", a semi-supervised approach where training samples are labeled using an imperfect heuristic rather than by manual review [14][15][16][17][18][19]. The intuition is that noise-tolerant classifiers trained on imperfectly labeled data will abstract higher order properties of the phenotype beyond the original labeling heuristic (so-called noise-tolerant learning [20]).…”
Section: Background and Significance (mentioning)
confidence: 99%
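
As a rough illustration of the "silver standard" idea described in the statement above, the following Python sketch assigns labels with an imperfect heuristic instead of manual review. The `Patient` type, the `dx_code_count` field, and the threshold of 3 are hypothetical choices made for this example and are not taken from any of the cited papers. A noise-tolerant classifier would then be trained on the resulting labels, which is not shown here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Patient:
    patient_id: str
    # Hypothetical feature: number of encounters carrying the target diagnosis code.
    dx_code_count: int


def silver_label(p: Patient, pos_min: int = 3) -> Optional[int]:
    """Heuristic (noisy) label: 1 = likely case, 0 = likely control, None = ambiguous."""
    if p.dx_code_count >= pos_min:
        return 1      # many coded encounters: treat as a positive
    if p.dx_code_count == 0:
        return 0      # never coded: treat as a negative control
    return None       # in between: leave unlabeled rather than guess


patients = [Patient("a", 5), Patient("b", 0), Patient("c", 1), Patient("d", 3)]
silver = {p.patient_id: silver_label(p) for p in patients}

# Only confidently labeled records enter the silver-standard training set.
training_set = {pid: y for pid, y in silver.items() if y is not None}
print(training_set)  # {'a': 1, 'b': 0, 'd': 1}
```

Leaving ambiguous records unlabeled is one possible design choice; it trades training-set size for lower label noise.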
“…12,13 Additionally, while case and control phenotyping using EHR data has also relied on a small number of expert curated cohorts, recent studies have demonstrated that ML approaches can expand upon and identify such cohorts using automated feature selection and imperfect case definitions in a high-throughput manner. [14][15][16][17][18] Studies have also shown that case and control selection with diagnosis codes can significantly affect model performance, the hierarchical organization of structured medical data can be utilized for feature reduction and model performance improvement, and calibration is essential for understanding the clinical utility of a phenotyping model. [19][20][21][22] Stroke phenotyping algorithms have also used machine learning to enhance the classification performance of a diagnosis-code based AIS phenotyping algorithm.…”
Section: Introduction (mentioning)
confidence: 99%