Clinical concept extraction: A methodology review

Fu, Sunyang; Chen, David; He, Huan; Liu, Sijia; Moon, Sungrim; Peterson, Kevin J.; Shen, Feichen; Wang, Liwei; Wang, Yanshan; Wen, Andrew; Zhao, Yiqing; Sohn, Sunghwan; Liu, Hongfang

doi:10.1016/j.jbi.2020.103526

Cited by 111 publications

(96 citation statements)

References 138 publications

“…Rule-based approaches are common tools in scientific literature analysis and clinical text processing [41]. Our results suggest that engineering task-specific rules in addition to labels provided by ontologies provides strong performance for several NER tasks—in some cases approaching the performance of systems built using hand-labeled data.…”

Section: Discussion (mentioning)

confidence: 99%

Ontology-driven weak supervision for clinical entity classification in electronic health records

et al. 2021

In the electronic health record, using clinical notes to identify entities such as disorders and their temporality (e.g. the order of an event relative to a time index) can inform many important analyses. However, creating training data for clinical entity tasks is time consuming and sharing labeled data is challenging due to privacy concerns. The information needs of the COVID-19 pandemic highlight the need for agile methods of training machine learning models for clinical notes. We present Trove, a framework for weakly supervised entity classification using medical ontologies and expert-generated rules. Our approach, unlike hand-labeled notes, is easy to share and modify, while offering performance comparable to learning from manually labeled training data. In this work, we validate our framework on six benchmark tasks and demonstrate Trove’s ability to analyze the records of patients visiting the emergency department at Stanford Health Care for COVID-19 presenting symptoms and risk factors.

“…Since these labeling functions are not easily automated and require hand coding, we refer to them as task-specific labeling functions. These are analogous to the rule-based approaches used in 48% of recent medical concept recognition publications [41].…”

Section: Methods (mentioning)

confidence: 99%

Ontology-driven weak supervision for clinical entity classification in electronic health records

et al. 2021

“…Hence, there are natural extensions to our traditional methodology including the switch to well-known neural network architectures at the level of concept recognition to generate RadLex mappings [26,67]. Recently, DL methods are increasingly used for concept recognition tasks such as long short-term memory (LSTM) and variants of bidirectional recurrent neural networks (BiRNN) coupled with conditional random field (CRF) architectures [68,69]. DL models can also be used to create task-specific classifiers in an end-to-end manner (e.g., convolutional neural (CNN) [24], RNN [54] or LSTM networks [45,70]).…”

Section: Discussion (mentioning)

confidence: 99%

Comparative analysis of machine learning algorithms for computer-assisted reporting based on fully automated cross-lingual RadLex mappings

Maros, Cho, Junge, et al. 2021. Sci Rep

Computer-assisted reporting (CAR) tools were suggested to improve radiology report quality by context-sensitively recommending key imaging biomarkers. However, studies evaluating machine learning (ML) algorithms on cross-lingual ontological (RadLex) mappings for developing embedded CAR algorithms are lacking. Therefore, we compared ML algorithms developed on human expert-annotated features against those developed on fully automated cross-lingual (German to English) RadLex mappings using 206 CT reports of suspected stroke. Target label was whether the Alberta Stroke Programme Early CT Score (ASPECTS) should have been provided (yes/no:154/52). We focused on probabilistic outputs of ML-algorithms including tree-based methods, elastic net, support vector machines (SVMs) and fastText (linear classifier), which were evaluated in the same 5 × fivefold nested cross-validation framework. This allowed for model stacking and classifier rankings. Performance was evaluated using calibration metrics (AUC, brier score, log loss) and -plots. Contextual ML-based assistance recommending ASPECTS was feasible. SVMs showed the highest accuracies both on human-extracted- (87%) and RadLex features (findings:82.5%; impressions:85.4%). FastText achieved the highest accuracy (89.3%) and AUC (92%) on impressions. Boosted trees fitted on findings had the best calibration profile. Our approach provides guidance for choosing ML classifiers for CAR tools in fully automated and language-agnostic fashion using bag-of-RadLex terms on limited expert-labelled training data.

“…It is possible that additional performance gains may be achieved using more advanced models (e.g., deep learning) that have the advantage of partially automating the generation of features and feature interaction. [28][29][30][31] Finally, given that the VHA guidelines do not recommend screening ABI in asymptomatic patients, patients with undiagnosed or asymptomatic PAD would not be included as they are unlikely to undergo ABI testing.…”

Section: Discussion (mentioning)

confidence: 99%

Ankle and Toe Brachial Index Extraction from Clinical Reports For Peripheral Artery Disease Identification: Unlocking Clinical Data through Novel Methods

Friberg, Qazi, Boyle, et al. 2021. Preprint

Importance: Despite its high prevalence and poor outcomes, research on peripheral artery disease (PAD) remains limited due to the poor accuracy of billing codes for identifying PAD in health systems.

Objective: Design a natural language processing (NLP) system that can extract ankle brachial index (ABI) and toe brachial index (TBI) values and evaluate the performance of extracted ABI/TBI values to identify patients with PAD in the Veterans Health Administration (VHA).

Design, Setting, Participants: From a corpus of 392,244 ABI test reports at 94 VHA facilities during 2015-2017, we selected a random sample of 800 documents for NLP development. Using machine learning, we designed the NLP system to extract ABI and TBI values and laterality (right or left). Performance was optimized through sequential iterations of 10-fold cross validation and error analysis on 3 sets of 200 documents each, and tested on a final, independent set of 200 documents. Performance of NLP-extracted ABI and TBI values to identify PAD in a random sample of Veterans undergoing ABI testing was compared to structured chart review.

Exposure: ABI <0.9, or TBI <0.7 in either right or left limb used to define PAD at the patient-level.

Main Outcome: Precision (or positive predictive value), recall (or sensitivity), F-1 measure (overall measure of accuracy, defined as harmonic mean of precision and recall).

Results: The NLP system had an overall precision of 0.85, recall of 0.93 and F1-measure of 0.89. The F-1 measure was similar for both ABI and TBI (0.88 to 0.91). Recall was higher for ABI (0.95 to 0.97) while precision was higher for TBI (0.94 to 0.95). Among 261 patients with ABI testing (49% with PAD), the NLP system was able to extract ABI and TBI values in 238 (91.2%) patients. The NLP system had a positive predictive value of 92.3%, sensitivity of 89.3% and specificity of 92.3% to identify PAD.

Conclusion: We have successfully developed and validated an NLP system to extract ABI and TBI values which can be used to accurately identify PAD within the VHA. Our findings have broad implications for PAD research and quality improvement efforts in large health systems.
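
The abstract above defines the F-1 measure as the harmonic mean of precision and recall, and defines PAD at the patient level as ABI <0.9 or TBI <0.7 in either limb. A minimal Python sketch of both definitions follows. It is an illustration only, not the authors' NLP system: the function names, the constant names, and the use of `None` for a missing measurement are assumptions introduced here.

```python
from typing import Iterable, Optional

# Thresholds as stated in the abstract's "Exposure" definition.
ABI_THRESHOLD = 0.9
TBI_THRESHOLD = 0.7


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def meets_pad_definition(
    abi_values: Iterable[Optional[float]],
    tbi_values: Iterable[Optional[float]],
) -> bool:
    """Hypothetical helper: True if any limb has ABI < 0.9 or TBI < 0.7.

    Each iterable holds one value per limb (right, left); None marks a
    measurement that was not extracted (an assumption of this sketch).
    """
    abi_low = any(v is not None and v < ABI_THRESHOLD for v in abi_values)
    tbi_low = any(v is not None and v < TBI_THRESHOLD for v in tbi_values)
    return abi_low or tbi_low


if __name__ == "__main__":
    # Overall precision and recall reported in the abstract.
    print(round(f1_score(0.85, 0.93), 2))  # prints 0.89
```

With the reported overall precision of 0.85 and recall of 0.93, the harmonic mean is 2 × 0.85 × 0.93 / (0.85 + 0.93) ≈ 0.888, which rounds to the 0.89 stated in the Results.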
