2021
DOI: 10.48550/arxiv.2104.08444
Preprint

Three-level Hierarchical Transformer Networks for Long-sequence and Multiple Clinical Documents Classification

Abstract: We present the Hierarchical Transformer Networks for modeling long-term dependencies across clinical notes for the purpose of patient-level prediction. The network is equipped with three levels of Transformer-based encoders to learn progressively from words to sentences, sentences to notes, and finally notes to patients. The first level from word to sentence directly applies a pre-trained BERT model, and the second and third levels both implement a stack of 2-layer encoders before the final patient representati…

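A minimal sketch of the three-level hierarchy the abstract describes (words to sentences, sentences to notes, notes to patient), assuming PyTorch and the Hugging Face transformers library. The pooling strategy (plain mean pooling), hidden sizes, and class count are illustrative assumptions, not the authors' exact configuration.

```python
# Sketch of the three-level hierarchical Transformer described above.
# Pooling and hyperparameters are placeholder assumptions.
import torch
import torch.nn as nn
from transformers import AutoModel

class HierarchicalTransformer(nn.Module):
    def __init__(self, bert_name="bert-base-uncased", d_model=768,
                 n_heads=8, n_classes=2):
        super().__init__()
        # Level 1: pre-trained BERT encodes the tokens of each sentence.
        self.sentence_encoder = AutoModel.from_pretrained(bert_name)
        # Levels 2 and 3: 2-layer Transformer encoder stacks, per the abstract.
        # nn.TransformerEncoder deep-copies the layer, so the two stacks below
        # get independent weights.
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                           batch_first=True)
        self.note_encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.patient_encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.classifier = nn.Linear(d_model, n_classes)

    def forward(self, input_ids, attention_mask):
        # input_ids: (n_notes, n_sents, n_tokens) for a single patient.
        n_notes, n_sents, n_tokens = input_ids.shape
        out = self.sentence_encoder(
            input_ids=input_ids.view(-1, n_tokens),
            attention_mask=attention_mask.view(-1, n_tokens))
        # Mean-pool token states into one vector per sentence (an assumption;
        # attention masking of the pooling is omitted for brevity).
        sents = out.last_hidden_state.mean(dim=1).view(n_notes, n_sents, -1)
        notes = self.note_encoder(sents).mean(dim=1)        # sentences -> notes
        patient = self.patient_encoder(
            notes.unsqueeze(0)).mean(dim=1)                 # notes -> patient
        return self.classifier(patient)                     # patient-level logits
```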
Cited by 2 publications (3 citation statements) | References 25 publications
“…Bidirectional Encoder Representations from Transformers (BERT) is a state-of-the-art large language model that was pre-trained on millions of English documents and has been shown to achieve high performance in facilitating many downstream NLP tasks (termed fine-tuning), including named entity recognition 16,17 and document classification 18,19. By taking labelled referrals as input, we developed an end-to-end model by fine-tuning the pre-trained BERT-base-uncased model 3 for text classification.…”
Section: BERT End-to-end Model
confidence: 99%
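As a rough illustration of the fine-tuning setup this statement describes: a minimal sketch using the Hugging Face transformers library, with a placeholder referral text and binary label standing in for the cited work's labelled referral data.

```python
# Minimal BERT-base-uncased fine-tuning sketch for text classification.
# The referral text and label below are placeholders.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

inputs = tokenizer("Patient referred for further assessment ...",
                   truncation=True, max_length=512, return_tensors="pt")
labels = torch.tensor([1])  # placeholder class label

# One training step; a real run loops over batches with an optimizer
# (e.g. AdamW) and a learning-rate schedule.
outputs = model(**inputs, labels=labels)
outputs.loss.backward()
```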
“…Gao et al. (2021) [11] present evidence that BERT-based models under-perform on clinical text classification tasks with long inputs, such as MIMIC-III [15], when compared to a CNN trained on word embeddings that can process the complete input sequence. Si and Roberts (2021) [24] present an alternative system that overcomes the long-document issue: transformer-based encoders learn progressively from words to sentences, sentences to notes, and notes to patients. This transformer-based hierarchical attention network system achieves SOTA results for in-hospital mortality prediction and phenotype prediction using MIMIC-III.…”
Section: Related Work
confidence: 99%
“…This transformer-based hierarchical attention network system achieves SOTA results for in-hospital mortality prediction and phenotype prediction using MIMIC-III. However, it requires considerable computational resources [24]. Chalkidis et al. (2020) [6] propose a similar hierarchical version using SCI-BERT to handle long documents for predicting medical codes from MIMIC-III.…”
Section: Related Work
confidence: 99%
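To make the long-document limitation behind these hierarchical systems concrete: BERT-style encoders accept at most 512 tokens, so longer documents are typically split into chunks, each chunk is encoded separately, and the chunk vectors are then aggregated by an upper-level encoder. A minimal sketch of the chunk-then-encode step, assuming Hugging Face transformers; the chunk length and [CLS]-pooling choice are illustrative assumptions, not the exact recipe of any cited system.

```python
# Chunk-then-aggregate pattern for documents beyond BERT's 512-token limit.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def encode_long_document(text, chunk_len=510):
    # 510 content tokens + [CLS] and [SEP] = BERT's 512-token maximum.
    ids = tokenizer(text, add_special_tokens=False)["input_ids"]
    chunk_vecs = []
    with torch.no_grad():
        for i in range(0, len(ids), chunk_len):
            chunk = ([tokenizer.cls_token_id] + ids[i:i + chunk_len]
                     + [tokenizer.sep_token_id])
            out = encoder(input_ids=torch.tensor([chunk]))
            chunk_vecs.append(out.last_hidden_state[:, 0])  # [CLS] vector
    # In a hierarchical system the stacked chunk vectors would feed an
    # upper-level encoder; here we simply return them.
    return torch.cat(chunk_vecs, dim=0)
```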