Proceedings of the 28th International Conference on Computational Linguistics 2020
DOI: 10.18653/v1/2020.coling-main.57
Enhancing Clinical BERT Embedding using a Biomedical Knowledge Base

Abstract: Domain knowledge is important for building Natural Language Processing (NLP) systems for low-resource settings, such as in the clinical domain. In this paper, a novel joint training method is introduced for adding knowledge base information from the Unified Medical Language System (UMLS) into language model pre-training for some clinical domain corpus. We show that in three different downstream clinical NLP tasks, our pre-trained language model outperforms the corresponding model with no knowledge base informa…
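The paper's exact objective is not reproduced on this page, but the joint training idea described in the abstract (adding UMLS knowledge base information to language model pre-training) can be illustrated with a minimal sketch: a masked language modeling loss combined with an auxiliary UMLS concept-prediction loss. All names below (JointKBPretrainingHead, kb_loss_weight, num_umls_concepts) and the simple weighted-sum formulation are assumptions for illustration, not the authors' implementation.

# Hypothetical sketch of joint pre-training: MLM loss plus a UMLS concept-prediction loss.
# The naming and weighting scheme are assumptions, not the paper's exact formulation.
import torch
import torch.nn as nn

class JointKBPretrainingHead(nn.Module):
    def __init__(self, hidden_size, vocab_size, num_umls_concepts, kb_loss_weight=1.0):
        super().__init__()
        self.mlm_head = nn.Linear(hidden_size, vocab_size)        # predicts masked tokens
        self.kb_head = nn.Linear(hidden_size, num_umls_concepts)  # predicts UMLS concept labels
        self.kb_loss_weight = kb_loss_weight
        self.loss_fn = nn.CrossEntropyLoss(ignore_index=-100)     # -100 marks unlabeled positions

    def forward(self, hidden_states, mlm_labels, concept_labels):
        # hidden_states: (batch, seq_len, hidden_size) from a BERT-style encoder
        mlm_logits = self.mlm_head(hidden_states)
        kb_logits = self.kb_head(hidden_states)
        mlm_loss = self.loss_fn(mlm_logits.view(-1, mlm_logits.size(-1)), mlm_labels.view(-1))
        kb_loss = self.loss_fn(kb_logits.view(-1, kb_logits.size(-1)), concept_labels.view(-1))
        # Joint objective: both losses are optimized together during pre-training.
        return mlm_loss + self.kb_loss_weight * kb_loss

In practice the encoder would be a clinical BERT variant and the concept labels would come from UMLS entity annotations over the pre-training corpus; the weighted sum here simply stands in for whatever joint objective the paper actually uses.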

Cited by 38 publications (24 citation statements)
References 14 publications
“…Which LM model? Several published works have found ClinicalBERT to outperform the other considered biomedical LMs on biomedical NLP tasks (Alsentzer et al, 2019; Kearns et al, 2019; Hao et al, 2020). In our results, however, SciBERT achieves the most consistent performance, clearly outperforming ClinicalBERT on the Procedures → Disease and Test → Disease categories, while performing similar to ClinicalBERT on the remaining categories.…”
Section: Discussion (supporting)
confidence: 46%
“…They obtained the best results with a BERT model that was pre-trained on PubMed abstracts and MIMIC-III clinical notes. Another large-scale evaluation of biomedical LMs has been carried out by Lewis et al (2020) (Michalopoulos et al, 2020; Hao et al, 2020).…”
Section: Related Work and Background (mentioning)
confidence: 99%
“…OWL2Vec*, in collaboration with ZB MED - Information Centre for Life Sciences, also aims at being applied to identify clusters in an ontology and assign these clusters as topics (i.e., a set of ontology classes) to a corpus of documents to enhance the results of an information retrieval task (Ritchie et al, 2021). In addition, OWL2Vec*, as an ontology tailored word embedding model, could replace the original word embedding models to increase performance in some domain specific tasks such as biomedical text analysis (Hao et al, 2020). This is also a promising direction worth studying.…”
Section: Discussion and Outlook (mentioning)
confidence: 99%
“…Jointly optimizing the two objectives can implicitly integrate knowledge from external knowledge graphs into language models. Here we adopt the pre-trained Clinical KB-BERT (Hao et al, 2020) in our analysis. ClinicalBERT-EE-KB-MLM: In this method, we pre-train BERT with UMLS information with only the masked language model (MLM) objective.…”
Section: ClinicalBERT-EE-KGE (mentioning)
confidence: 99%