2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
DOI: 10.1109/asru46091.2019.9003958
Hierarchical Transformers for Long Document Classification

Abstract: BERT, which stands for Bidirectional Encoder Representations from Transformers, is a recently introduced language representation model based upon the transfer learning paradigm. We extend its fine-tuning procedure to address one of its major limitations: applicability to inputs longer than a few hundred words, such as transcripts of human call conversations. Our method is conceptually simple. We segment the input into smaller chunks and feed each of them into the base model. Then, we propagate each output through…
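To make the procedure in the abstract concrete, the sketch below splits a document into chunks, encodes each chunk with BERT, and runs a recurrent layer over the per-chunk [CLS] vectors before a final classification. This is a minimal illustration assuming PyTorch and the Hugging Face transformers library; the class name, chunk handling, and single-layer LSTM are assumptions, not the authors' released implementation.

```python
# Minimal sketch of the hierarchical idea in the abstract: split the
# document into chunks, encode each with BERT, and recur over the
# per-chunk [CLS] vectors to reach one decision for the document.
# Assumes PyTorch + Hugging Face `transformers`; names and sizes are
# illustrative, not the authors' released code.
import torch
import torch.nn as nn
from transformers import BertModel

class HierarchicalBertClassifier(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        hidden = self.bert.config.hidden_size  # 768 for bert-base
        # The abstract also mentions a transformer over chunk outputs
        # as an alternative to this recurrent layer.
        self.rnn = nn.LSTM(hidden, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, chunk_ids: torch.Tensor, chunk_mask: torch.Tensor):
        # chunk_ids / chunk_mask: (num_chunks, chunk_len) for ONE document,
        # obtained by splitting its token sequence into fixed-size chunks.
        out = self.bert(input_ids=chunk_ids, attention_mask=chunk_mask)
        cls_vecs = out.last_hidden_state[:, 0, :]      # (num_chunks, hidden)
        _, (h_n, _) = self.rnn(cls_vecs.unsqueeze(0))  # recur over chunks
        # Classification decision after the last chunk has been consumed.
        return self.classifier(h_n[-1])                # (1, num_classes) logits
```

Per the abstract, a softmax over these logits yields the final class probabilities, and the decision is read off after the last segment has been processed.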

Cited by 152 publications (117 citation statements)
References 18 publications
“…Our work extends the latter line of work by proposing a hierarchical Transformer based on the recent pre-trained BERT for this task. Moreover, we notice that our BERT-based hierarchical Transformer is similar to the model proposed in (Pappagari et al., 2019), but we want to point out that our model design in the input and output layers is specific to stance classification, which is different from their work. Rumor Verification: Due to the negative impact of various rumors spreading on social media, rumor verification has attracted increasing attention in recent years.…”
Section: Related Work
confidence: 99%
“…This novel text representation technique showed improvement on five different datasets. Pappagari et al. [33] proposed a modification to the BERT model for long document classification in a monolingual setting. They utilized a segmentation approach to divide the input text sequences into several subsequences.…”
Section: Related Work
confidence: 99%
“…We extracted the [CLS] vector representations from the last layer and combined them into a final document representation. This approach is inspired by the work of Pappagari et al. [33]. The main difference in our study is the way the subsequence representations are merged into a document representation.…”
Section: Using Sequences From Every Part of the Document
confidence: 99%
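As a rough illustration of the merging step described in that excerpt, the sketch below extracts the last-layer [CLS] vector of each subsequence and pools the vectors into a single document representation. It assumes PyTorch and Hugging Face transformers; mean pooling is one possible merge, chosen here for simplicity, and is not necessarily the strategy used in the citing study.

```python
# Illustrative sketch: encode each fixed-size chunk of a long document,
# keep the last-layer [CLS] vector per chunk, and mean-pool them into
# one document representation. Assumes Hugging Face `transformers`;
# the pooling choice and chunk size are assumptions for illustration.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def document_embedding(text: str, chunk_tokens: int = 510) -> torch.Tensor:
    ids = tokenizer.encode(text, add_special_tokens=False)
    cls_vectors = []
    for start in range(0, len(ids), chunk_tokens):
        chunk = ids[start:start + chunk_tokens]
        # Wrap each chunk with [CLS]/[SEP] so it is a well-formed BERT input.
        input_ids = torch.tensor([[tokenizer.cls_token_id] + chunk
                                  + [tokenizer.sep_token_id]])
        with torch.no_grad():
            out = model(input_ids=input_ids)
        cls_vectors.append(out.last_hidden_state[0, 0])  # chunk's [CLS]
    # Merge the per-chunk [CLS] vectors; here, a simple mean.
    return torch.stack(cls_vectors).mean(dim=0)  # shape: (hidden_size,)
```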
“…The neural network reads the text linearly while using an external memory to store relations, and can then use the global context to classify them. Other alternatives could be to still leverage the power of pre-trained transformer models by using solutions that pass entire long texts to the model instead (Pappagari et al., 2019). A deep learning approach is potentially not the most appropriate to represent complex temporal relations; for example, Li et al. (2020) recently reported good results using an ontology.…”
Section: Classification Approaches
confidence: 99%