Clinical Utility of the Automatic Phenotype Annotation in Unstructured Clinical Notes: ICU Use Cases

Zhang, Jingqing; Bolanos, Luis; Tanwar, Ashwani; Sokol, A.B.; Ive, Julia; Gupta, Vibhor; Guo, Yike

doi:10.48550/arxiv.2107.11665

Cited by 2 publications

(3 citation statements)

References 19 publications


“…The NR model performs significantly better than all of them achieving 69% recall and 59% F1 using exact metrics, while 79% recall and 71% F1 using generalized metrics. Precision is relatively lower as we focus on recall to extract more phenotypes, which is motivated by the preference that a model is sensitive to capture more phenotypic features of patients rather than missing ones for better accuracy in downstream clinical use cases [37]. Overall, the NR model shows huge gains which is useful in the absence of costly annotated data.…”

Section: Quantitative Analysis (mentioning)

confidence: 99%

“…Extracting phenotypes from clinical text has been shown crucial for many clinical use cases [37] such as ICU in-hospital mortality prediction, remaining length of stay prediction, decompensation prediction and identifying patients with rare diseases. There are several challenges in extracting phenotypes such as handling a wide variety of phenotypic contexts, ambiguities, long term dependencies between phenotypes, and so on.…”

Section: Introduction (mentioning)

confidence: 99%


Unsupervised Numerical Reasoning to Extract Phenotypes from Clinical Text by Leveraging External Knowledge

Tanwar, Zhang, Ive et al. 2022 (preprint, self-citation)

Extracting phenotypes from clinical text has been shown to be useful for a variety of clinical use cases such as identifying patients with rare diseases. However, reasoning with numerical values remains challenging for phenotyping in clinical text, for example, temperature 102F representing Fever. Current state-of-the-art phenotyping models are able to detect general phenotypes, but perform poorly when they detect phenotypes requiring numerical reasoning. We present a novel unsupervised methodology leveraging external knowledge and contextualized word embeddings from ClinicalBERT for numerical reasoning in a variety of phenotypic contexts. Comparing against unsupervised benchmarks, it shows a substantial performance improvement with absolute gains on generalized Recall and F1 scores up to 79% and 71%, respectively. In the supervised setting, it also surpasses the performance of alternative approaches with absolute gains on generalized Recall and F1 scores up to 70% and 44%, respectively.


“…The unstructured clinical notes, such as discharge summaries, nursing notes and radiology reports, are rich in phenotype information as the clinicians naturally describe phenotypic abnormalities of patients in the narratives of notes. Previous studies have demonstrated leveraging the phenotype information to improve the understanding of disease diagnosis, disease pathogenesis, patient outcomes and genomic diagnostics [24, 31-35], and subsequently, the automatic phenotype annotation from clinical notes has become an important task in clinical Natural Language Processing (NLP).…”

Section: Pre-trained Context-aware Phenotyping NLP Algorithm (mentioning)

confidence: 99%

A Scalable Workflow to Build Machine Learning Classifiers with Clinician-in-the-Loop to Identify Patients in Specific Diseases

Zhang, Sharma, Bolaños et al. 2022 (preprint, self-citation)

Background: Clinicians and researchers may rely on medical coding systems such as International Classification of Diseases (ICD) to identify patients with specific diseases from Electronic Health Records (EHRs). However, due to the lack of detail and specificity as well as a probability of miscoding, recent studies suggest the ICD codes often cannot characterise patients accurately for specific diseases in real clinical practice, and as a result, using them to find patients for studies or trials can result in high failure rates and missing out on uncoded patients. Manual inspection of all patients at scale is not feasible as it is highly costly and slow. Methodology: This paper proposes a scalable workflow which leverages both structured data and unstructured textual notes from EHRs with techniques including Natural Language Processing (for phenotyping), AutoML and Clinician-in-the-Loop mechanism to build machine learning classifiers to identify patients at scale with given diseases, especially those who might currently be miscoded or missed by ICD codes. Results: Case studies in the MIMIC-III dataset were conducted where the proposed workflow demonstrates a higher classification performance in terms of F1 scores compared to simply using ICD codes on gold testing subset to identify patients with Ovarian Cancer (0.901 vs 0.814), Lung Cancer (0.859 vs 0.828), Cancer Cachexia (0.862 vs 0.650), and Lupus Nephritis (0.959 vs 0.855). Also, the proposed workflow that leverages unstructured notes consistently outperforms the baseline that uses structured data only with an increase of F1 (Ovarian Cancer 0.901 vs 0.719, Lung Cancer 0.859 vs 0.787, Cancer Cachexia 0.862 vs 0.838 and Lupus Nephritis 0.959 vs 0.785). Experiments on the large testing set also demonstrate the proposed workflow can find more patients who are miscoded or missed by ICD codes. 
Moreover, interpretability studies are also conducted to clinically validate the top impact features behind the decision-making of the classifiers. Conclusions: The proposed workflow can more accurately identify patients with specific diseases than simply using ICD codes. We also find the phenotypic features extracted from unstructured textual notes are beneficial for better accuracy and interpretability of classifiers. Moreover, the proposed workflow is scalable to other diseases and use cases as Clinician-in-the-Loop and AutoML enable rapid configuration of new machine learning classifiers.
