Abstract: Transformer-based models, such as BERT, have dramatically improved performance on various natural language processing tasks. The clinical knowledge-enriched model, Clinical-BERT, has also achieved state-of-the-art results on clinical named entity recognition and natural language inference tasks. A core limitation of these transformers is their substantial memory consumption, due to the full self-attention mechanism. To overcome this, long-sequence transformer models, e.g. Longformer, …
“…This strategy allows us to make better use of the available data since no data is discarded. In future work, these limitations can also be addressed by using a sliding window approach [27] or using another transformer-based architecture, such as Longformer [28], that is able to process longer text sequences.…”
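The sliding-window idea mentioned in the quoted passage can be sketched in a few lines: a token sequence longer than the model's input limit is split into fixed-size, overlapping windows so that no text is discarded. The window and stride values below are illustrative, not values from the cited work.

```python
def sliding_windows(tokens, window=512, stride=384):
    """Split a token list into overlapping windows of at most `window`
    tokens, advancing by `stride` tokens each step, so that no part of
    the sequence is discarded when it exceeds the model's input limit."""
    if len(tokens) <= window:
        return [tokens]
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break
        start += stride
    return chunks

# Example: a 1000-token note with a 512-token model limit yields
# three overlapping chunks that together cover all 1000 tokens.
chunks = sliding_windows(list(range(1000)), window=512, stride=384)
```

Each chunk can then be encoded separately and the per-chunk predictions or embeddings aggregated (e.g. by averaging), which is one common way to apply a fixed-length model such as BERT to long clinical notes.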
Clinical prediction models are often based solely on the use of structured data in electronic health records, e.g. vital parameters and laboratory results, effectively ignoring potentially valuable information recorded in other modalities, such as free-text clinical notes. Here, we report on the development of a multimodal model that combines structured and unstructured data. In particular, we study how best to make use of a clinical language model in a multimodal setup for predicting 30-day all-cause mortality upon hospital admission in patients with COVID-19. We evaluate three strategies for incorporating a domain-specific clinical BERT model in multimodal prediction systems: (i) without fine-tuning, (ii) with unimodal fine-tuning, and (iii) with multimodal fine-tuning. The best-performing model leverages multimodal fine-tuning, in which the clinical BERT model is updated based also on the structured data. This multimodal mortality prediction model is shown to outperform unimodal models that are based on using either only structured data or only unstructured data. The experimental results indicate that clinical prediction models can be improved by including data in other modalities and that multimodal fine-tuning of a clinical language model is an effective strategy for incorporating information from clinical notes in multimodal prediction systems.
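The fusion step described in the abstract above — combining a note representation from a clinical BERT model with structured features — can be sketched as a simple late-fusion classifier. Everything here is illustrative: the embedding values, feature names, and weights are made up, and a real system would obtain the text vector from the (fine-tuned) language model rather than hard-coding it.

```python
import math

def predict_mortality(text_embedding, structured_features, weights, bias):
    """Late fusion: concatenate the note embedding with structured
    features (e.g. vital parameters, lab results) and apply a logistic
    output layer to estimate 30-day mortality risk."""
    fused = list(text_embedding) + list(structured_features)
    z = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid -> probability in (0, 1)

# Toy example with a 4-dim text embedding and 2 structured features
# (hypothetical values; in practice the embedding comes from clinical BERT)
p = predict_mortality(
    text_embedding=[0.2, -0.1, 0.4, 0.05],
    structured_features=[0.8, 1.2],  # e.g. normalized vitals/labs
    weights=[0.5, -0.3, 0.1, 0.2, 0.7, 0.4],
    bias=-1.0,
)
```

The "multimodal fine-tuning" strategy in the abstract corresponds to backpropagating the loss of such a fused classifier all the way into the language model, so that the text encoder is updated in the presence of the structured features rather than trained in isolation.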
“…In addition, because it is trained on general-domain text such as that from Wikipedia, it can hardly cover the clinical expertise contained in the discharge summary (Lee et al. 2020). Instead of BERT, we used the Clinical-Longformer model (Li et al. 2022), which is trained on biomedical text and can embed long texts of up to 4096 tokens, with a bi-LSTM layer as the text encoder.…”
Section: Encoding Part Of PAAT
“…• Our encoder, composed of Clinical-Longformer (Li et al. 2022) and a bidirectional long short-term memory (bi-LSTM) layer, can be applied to long texts exceeding the maximum allowable input length of Longformer-based models, i.e., 4096 tokens, by segmentally encoding the input text.…”
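Segmental encoding, as described in the quoted passage, can be sketched as: split the over-length token sequence into consecutive segments no longer than the encoder's limit, encode each segment, and hand the per-segment representations to a downstream sequence layer (the bi-LSTM in the cited work). The `encode` callable below is a stand-in for the real Clinical-Longformer encoder.

```python
def encode_segments(tokens, encode, max_len=4096):
    """Encode a token sequence of arbitrary length by splitting it into
    consecutive segments of at most `max_len` tokens and encoding each
    segment independently; the resulting per-segment vectors can then be
    fed to a sequence layer such as a bi-LSTM."""
    segments = [tokens[i:i + max_len] for i in range(0, len(tokens), max_len)]
    return [encode(seg) for seg in segments]

# Stand-in encoder: a real system would call Clinical-Longformer here.
toy_encode = lambda seg: [len(seg)]  # 1-dim "embedding": segment length
reps = encode_segments(list(range(10000)), toy_encode, max_len=4096)
# 10000 tokens -> 3 segments of 4096, 4096, and 1808 tokens
```

Unlike the overlapping sliding-window approach, the segments here are disjoint; continuity across segment boundaries is recovered by the sequence layer that consumes the segment representations.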
The International Classification of Diseases (ICD) is a global medical classification system that provides unique codes for the diagnoses and procedures recorded in a patient's clinical record. However, manual coding by human coders is expensive and error-prone, and automatic ICD coding has the potential to solve this problem. With the advancement of deep learning technologies, many deep learning-based methods for automatic ICD coding are being developed. In particular, a label attention mechanism is effective for multi-label classification tasks such as ICD coding, as it obtains label-specific representations from the input clinical records. However, because the existing label attention mechanism finds key tokens in the entire text at once, important information dispersed across individual paragraphs may be omitted from the attention map. To overcome this, we propose a novel neural network architecture composed of two encoders and two kinds of label attention layers. The input text is segmentally encoded by the first encoder and integrated by the second. Then, the conventional and partition-based label attention mechanisms extract important global and local feature representations, respectively, and our classifier integrates them to enhance ICD coding performance. We verified the proposed method on MIMIC-III, a benchmark dataset for ICD coding. Our results show that the partition-based mechanism improves ICD coding performance.
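The label attention mechanism discussed in the abstract above can be sketched as follows: each ICD label has its own query vector, attention weights over the token representations are computed per label, and the weighted sum yields a label-specific document representation. Dimensions and values below are illustrative, not from the cited work.

```python
import math

def label_attention(hidden, label_queries):
    """For each label query q, compute attention weights softmax(H q)
    over the token representations H and return the label-specific
    representation sum_i a_i * h_i."""
    reps = []
    for q in label_queries:
        scores = [sum(h_d * q_d for h_d, q_d in zip(h, q)) for h in hidden]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]  # numerically stable softmax
        total = sum(exps)
        weights = [e / total for e in exps]
        reps.append([sum(w * h[d] for w, h in zip(weights, hidden))
                     for d in range(len(hidden[0]))])
    return reps

# 3 tokens with 2-dim representations, 2 labels
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Q = [[1.0, 0.0], [0.0, 1.0]]
reps = label_attention(H, Q)  # one 2-dim vector per label
```

The partition-based variant described in the paper would apply this same operation within each paragraph partition separately and then integrate the resulting local representations with the global ones.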
“…(For the sake of brevity, we refer readers to the original papers for more details.) In the biomedical area, Li et al [19] proposed two models based on the BigBird [10] and Longformer [11] architectures. However, their work retained the default vocabulary from the source model and did not consider the importance of a tokenizer tailored to the specific domain.…”
For this research, we propose a biomedical language model trained on publicly available biomedical datasets from Kaggle, the PubMed abstract baseline 2019, and MIMIC-III.