Proceedings of the 20th Workshop on Biomedical Language Processing 2021
DOI: 10.18653/v1/2021.bionlp-1.5
Are we there yet? Exploring clinical domain knowledge of BERT models

Abstract: We explore whether state-of-the-art BERT models encode sufficient domain knowledge to correctly perform domain-specific inference. Although BERT implementations such as BioBERT are better at domain-based reasoning than those trained on general-domain corpora, there is still a wide margin compared to human performance on these tasks. To bridge this gap, we explore whether supplementing textual domain knowledge in the medical NLI task: a) by further language model pretraining on the medical domain corpora, b) by…

Cited by 12 publications (6 citation statements)
References 44 publications
“…Overall, although these models show benefits in their respective domains, they did not incorporate clinical knowledge to address challenges in clinical applications. 23 In practice, NR for clinical context has the following challenges. First, there can be accumulation of multiple numeric examples in a condensed context, such as "Physical examination: temperature 97.5, blood pressure 124/55, pulse 79, respirations 18, O2 saturation 99% on room air."…”
Section: Impact Statement
confidence: 99%
“…More generally, however, there is some evidence that the effectiveness of augmenting questions with textual knowledge is limited in the biomedical domain. For instance, Sushil et al [62] evaluated the effect of such augmentation strategies and failed to obtain any statistically significant improvements for MedNLI [55], a well-known benchmark for Natural Language Inference (NLI) in the biomedical domain. These findings were also corroborated by our own initial analysis.…”
Section: Related Work
confidence: 99%
“…When it comes to interpreting patient descriptions, however, the potential of such strategies is less clear. For instance, Sushil et al [62] used an information retrieval engine to find relevant sentences in biomedical corpora, which were then added to the premise of Natural Language Inference (NLI) instances. In experiments on MedNLI [55], they found no statistically significant improvements as a result. (Table 1: Example of a question from MedQA, along with the answer candidates.)…”
Section: Introduction
confidence: 99%
“…Other works [8,12,28] designed special modules for numerical reasoning in text which were then integrated with neural networks. Overall, these models have shown advancements in their respective domains for specialized problems, but they did not incorporate clinical knowledge with specific extensive reasoning for clinical applications [33].…”
Section: Related Work
confidence: 99%