2019
DOI: 10.48550/arxiv.1904.05342
Preprint

ClinicalBERT: Modeling Clinical Notes and Predicting Hospital Readmission

Abstract: Clinical notes contain information about patients that goes beyond structured data like lab values and medications. However, clinical notes have been underused relative to structured data, because notes are high-dimensional and sparse. This work develops and evaluates representations of clinical notes using bidirectional transformers (ClinicalBERT). ClinicalBERT uncovers high-quality relationships between medical concepts as judged by humans. ClinicalBERT outperforms baselines on 30-day hospital readmission prediction.
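To make the set-up the abstract describes concrete, here is a minimal sketch of fine-tuning a BERT-style encoder as a binary 30-day readmission classifier with Hugging Face transformers. The checkpoint name, notes, and labels are placeholders, not the original ClinicalBERT weights or data:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Sketch of the set-up in the abstract: a BERT-style encoder fine-tuned as a
# binary 30-day readmission classifier. "bert-base-uncased" is a stand-in;
# the paper's ClinicalBERT weights were first pre-trained on clinical notes.
name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)

batch = tokenizer(
    ["discharge summary text ...", "icu progress note ..."],  # placeholder notes
    padding=True, truncation=True, max_length=512, return_tensors="pt",
)
labels = torch.tensor([1, 0])  # hypothetical labels: 1 = readmitted within 30 days
loss = model(**batch, labels=labels).loss
loss.backward()  # an optimizer step over real note batches would follow
```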

Cited by 217 publications (221 citation statements)
References 36 publications
“Similarly to [15,9], we leveraged approximately 2 million clinical notes extracted from the MIMIC-III [21] dataset, which is the largest publicly available EHR dataset and contains clinical narratives of over 40,000 patients admitted to intensive care units. We applied only minimal pre-processing steps: 1) removing all de-identification placeholders that were generated to protect PHI (protected health information); 2) replacing all characters other than alphanumerics and punctuation marks; 3) converting all alphabetical characters to lower case; and 4) stripping extra white space.…”
Section: Datasets
confidence: 99%
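A minimal sketch of those four pre-processing steps, assuming MIMIC-III's [** ... **] de-identification placeholder format; the punctuation set kept in step 2 is illustrative:

```python
import re

def preprocess_note(text: str) -> str:
    """Apply the four minimal pre-processing steps quoted above."""
    # 1) Remove de-identification placeholders, e.g. [**2101-4-15**]
    #    (assumes MIMIC-III's [** ... **] placeholder format).
    text = re.sub(r"\[\*\*.*?\*\*\]", " ", text)
    # 2) Replace characters other than alphanumerics and punctuation with a space.
    text = re.sub(r"[^A-Za-z0-9.,;:!?'\"()/\-\s]", " ", text)
    # 3) Convert all alphabetical characters to lower case.
    text = text.lower()
    # 4) Collapse runs of whitespace and strip leading/trailing spaces.
    text = re.sub(r"\s+", " ", text).strip()
    return text

print(preprocess_note("Pt admitted [**2101-4-15**]   with CHF;  BP 130/80."))
# -> "pt admitted with chf; bp 130/80."
```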
“In clinical NLP, models applied with transformer-based approaches also encounter this limitation [8]. For example, the discharge summaries in MIMIC-III, which are commonly used to predict hospital readmission [9] or mortality [10], contain 1,435 words on average, far exceeding the 512-token limit of BERT-like models.…”
Section: Introduction
confidence: 99%
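One standard workaround, sketched below, is to split a long note into 512-token subsequences with a fast Hugging Face tokenizer and score each piece separately; the checkpoint name is a stand-in:

```python
from transformers import AutoTokenizer

# Any fast BERT-style tokenizer illustrates the point; the name is a stand-in.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def chunk_note(text: str, max_len: int = 512, stride: int = 64):
    """Split a long clinical note into <= max_len-token pieces for a BERT encoder."""
    enc = tokenizer(
        text,
        max_length=max_len,
        truncation=True,
        stride=stride,                   # token overlap between consecutive chunks
        return_overflowing_tokens=True,  # return leftover tokens as extra chunks
    )
    return enc["input_ids"]              # one list of token ids per chunk

chunks = chunk_note("chief complaint: shortness of breath. " * 200)
print(len(chunks), [len(c) for c in chunks])  # several chunks, each <= 512 tokens
```

Per-chunk predictions then have to be pooled into one patient-level score; the ClinicalBERT readmission paper reports combining the maximum and mean of the subsequence probabilities, weighted by the number of subsequences.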
“The large amount of data generated in this process offers an opportunity for deep learning technology to improve healthcare, such as diagnosis prediction (Choi et al., 2016), medication recommendation (Shang et al., 2019), mortality prediction (Tang et al., 2020), and readmission prediction (Huang et al., 2019). However, compared to common academic datasets such as ImageNet (Deng et al., 2009) and WMT (Macháček and Bojar, 2014), real-world EHR data is longitudinal, heterogeneous, and multimodal, which poses significant challenges to leveraging the information it contains.…”
Section: Introduction
confidence: 99%
“Due to the importance of clinical notes, it is necessary to combine them with other data sources for the integrity of clinical features. Since common pre-trained language models such as BERT (Devlin et al., 2019) do not account for the specific complexity of clinical notes, in this paper we apply ClinicalBERT (Huang et al., 2019), which is pre-trained on clinical notes, to handle the notes data.…”
Section: Introduction
confidence: 99%
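As a sketch of the pattern this statement describes (encoding notes with a clinically pre-trained BERT so they can be fused with other data sources), the snippet below loads a public clinical checkpoint; note that emilyalsentzer/Bio_ClinicalBERT is a related clinical BERT (Alsentzer et al.), not necessarily the exact Huang et al. (2019) weights:

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Public clinical checkpoint used here for illustration; swap in the
# Huang et al. (2019) weights if reproducing that paper exactly.
name = "emilyalsentzer/Bio_ClinicalBERT"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

note = "patient admitted with shortness of breath, started on iv furosemide."
inputs = tokenizer(note, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    out = model(**inputs)
# Take the [CLS] vector as a fixed-size note representation that can be
# concatenated with structured features (labs, medications) downstream.
note_embedding = out.last_hidden_state[:, 0, :]  # shape: (1, 768)
```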
“Recent years have witnessed the great success of pre-trained language models (PLMs), such as BERT [7], in a broad range of natural language processing (NLP) tasks. Moreover, several domain-oriented PLMs have been proposed to adapt to specific domains [4,8,10]. For instance, BioBERT [13] and SciBERT [2] are pre-trained on large-scale domain-specific corpora for biomedical and scientific domain tasks, respectively.…”
Section: Introduction
confidence: 99%