Preprint, 2022
DOI: 10.48550/arxiv.2201.11838

Clinical-Longformer and Clinical-BigBird: Transformers for long clinical sequences

Abstract: Transformer-based models, such as BERT, have dramatically improved performance on various natural language processing tasks. The clinical-knowledge-enriched model, namely Clinical-BERT, also achieved state-of-the-art results when applied to clinical named entity recognition and natural language inference tasks. One of the core limitations of these transformers is their substantial memory consumption due to the full self-attention mechanism. To overcome this, long-sequence transformer models, e.g. Longformer…
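
As a concrete illustration of the 4,096-token context these long-sequence models target, here is a minimal sketch of encoding a long clinical note with the Hugging Face transformers library; the checkpoint identifier is an assumption and should be replaced with the actual released weights.

```python
# Minimal sketch: encoding a long clinical note with a Longformer-style clinical model.
# The checkpoint name below is an assumed Hub identifier, not confirmed by this page.
from transformers import AutoTokenizer, AutoModel

checkpoint = "yikuan8/Clinical-Longformer"  # assumed identifier; substitute the released weights
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModel.from_pretrained(checkpoint)

note = "Patient admitted with shortness of breath ..."  # a long discharge summary
inputs = tokenizer(
    note,
    max_length=4096,   # Longformer-style models accept up to 4,096 tokens
    truncation=True,
    return_tensors="pt",
)
outputs = model(**inputs)
# One contextual vector per token; pool them (e.g. the first-token position)
# for downstream tasks such as NER or natural language inference.
print(outputs.last_hidden_state.shape)
```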

Cited by 17 publications (24 citation statements)
References 31 publications

“…This strategy allows us to make better use of the available data since no data is discarded. In future work, these limitations can also be addressed by using a sliding window approach [27] or using another transformer-based architecture, such as Longformer [28], that is able to process longer text sequences.…”
Section: Discussion (mentioning)
Confidence: 99%
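
As a rough illustration of the sliding-window alternative mentioned in this statement, the sketch below splits an over-length document into overlapping 512-token windows with a standard BERT tokenizer and mean-pools the window vectors; the model choice and the pooling step are illustrative assumptions, not the cited paper's method.

```python
# Sketch of a sliding-window workaround for a 512-token model limit:
# tokenize the document into overlapping windows and encode each window.
from transformers import AutoTokenizer, AutoModel
import torch

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # illustrative choice
model = AutoModel.from_pretrained("bert-base-uncased")

long_text = "..."  # a document longer than 512 tokens

enc = tokenizer(
    long_text,
    max_length=512,
    stride=128,                      # 128-token overlap between windows
    truncation=True,
    padding=True,                    # pad the last window so tensors stack cleanly
    return_overflowing_tokens=True,  # emit every window, not just the first
    return_tensors="pt",
)
with torch.no_grad():
    out = model(input_ids=enc["input_ids"], attention_mask=enc["attention_mask"])

# Aggregate window representations, e.g. mean-pool the first-token vectors.
doc_vector = out.last_hidden_state[:, 0, :].mean(dim=0)
```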
“…In addition, because it is trained with general domain text such as that from Wikipedia, it is hard to cover clinical expertise contained in the discharge summary (Lee et al 2020). Instead of BERT, we used the Clinical-Longformer model (Li et al 2022), which is trained with biomedical text and can embed long text up to 4096 tokens, with a bi-LSTM layer as the text encoder.…”
Section: Encoding Part of Paat (mentioning)
Confidence: 99%
“…• Our encoder, composed of Clinical-Longformer (Li et al 2022) and a bidirectional long short-term memory (bi-LSTM) layer, can be applied to long texts exceeding the maximum allowable input length of Longformer-based models, i.e., 4096 tokens, by segmentally encoding the input text.…”
Section: Introduction (mentioning)
Confidence: 99%
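
The segment-then-recombine idea described in these two statements (encode 4,096-token segments with Clinical-Longformer, then combine them with a bi-LSTM) can be sketched roughly as follows; the names, sizes, and pooling choices are illustrative assumptions, not the cited work's exact architecture.

```python
# Rough sketch of segmental encoding for texts longer than 4,096 tokens:
# encode each segment with a Longformer-style model, then let a bi-LSTM
# combine the per-segment vectors. Checkpoint name and sizes are assumptions.
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

class SegmentalEncoder(nn.Module):
    def __init__(self, checkpoint="yikuan8/Clinical-Longformer", hidden=256):
        super().__init__()
        self.tokenizer = AutoTokenizer.from_pretrained(checkpoint)
        self.encoder = AutoModel.from_pretrained(checkpoint)
        self.bilstm = nn.LSTM(
            input_size=self.encoder.config.hidden_size,
            hidden_size=hidden,
            bidirectional=True,
            batch_first=True,
        )

    def forward(self, text, seg_len=4096):
        # Tokenize once, then slice the token ids into 4,096-token segments.
        ids = self.tokenizer(text, add_special_tokens=False)["input_ids"]
        segments = [ids[i:i + seg_len] for i in range(0, len(ids), seg_len)]
        seg_vecs = []
        for seg in segments:
            out = self.encoder(input_ids=torch.tensor([seg]))
            seg_vecs.append(out.last_hidden_state[:, 0, :])  # first-token vector
        # (1, num_segments, hidden_size) -> bi-LSTM over the segment sequence
        seq = torch.stack(seg_vecs, dim=1)
        lstm_out, _ = self.bilstm(seq)
        return lstm_out  # one contextual vector per segment
```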
“…(For the sake of brevity, we refer readers to the original papers for more details.) In the biomedical area, Li et al [19] have proposed two models based on BigBird [10] and LongFormer [11] architectures. However, their research did not consider the paramount importance of using a tailored tokenizer for a specific domain with the default vocabulary from the source model.…”
Section: B. Modeling Long Sequences (mentioning)
Confidence: 99%
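
To illustrate the tokenizer concern raised in this statement, the sketch below retrains a subword vocabulary on in-domain text using the transformers train_new_from_iterator method; the base checkpoint, vocabulary size, and the tiny corpus are illustrative assumptions, not the cited works' actual setup.

```python
# Sketch of tailoring the tokenizer vocabulary to a clinical corpus instead of
# reusing the source model's default vocabulary. The corpus here is a placeholder.
from transformers import AutoTokenizer

base = AutoTokenizer.from_pretrained("allenai/longformer-base-4096")  # illustrative base

clinical_corpus = [
    "Pt w/ CHF exacerbation, started on IV furosemide.",
    "Echocardiogram shows EF 35%, mild mitral regurgitation.",
    # ... the full in-domain training corpus goes here
]

# train_new_from_iterator keeps the tokenization algorithm but learns a new
# subword vocabulary from the clinical text.
clinical_tokenizer = base.train_new_from_iterator(clinical_corpus, vocab_size=30000)

print(clinical_tokenizer.tokenize("furosemide"))  # typically fewer fragments than the base vocab
```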