2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
DOI: 10.1109/asru46091.2019.9003958
Hierarchical Transformers for Long Document Classification

Abstract: BERT, which stands for Bidirectional Encoder Representations from Transformers, is a recently introduced language representation model based upon the transfer learning paradigm. We extend its fine-tuning procedure to address one of its major limitations: applicability to inputs longer than a few hundred words, such as transcripts of human call conversations. Our method is conceptually simple. We segment the input into smaller chunks and feed each of them into the base model. Then, we propagate each output through…
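To make the procedure in the abstract concrete, the sketch below splits a document into chunks, encodes each chunk with BERT, and runs a recurrent layer over the per-chunk [CLS] vectors before a final classification. This is a minimal illustration assuming PyTorch and the Hugging Face transformers library; the class name, chunk handling, and single-layer LSTM are assumptions, not the authors' released implementation.

```python
# Minimal sketch of the hierarchical idea in the abstract: split the
# document into chunks, encode each with BERT, and recur over the
# per-chunk [CLS] vectors to reach one decision for the document.
# Assumes PyTorch + Hugging Face `transformers`; names and sizes are
# illustrative, not the authors' released code.
import torch
import torch.nn as nn
from transformers import BertModel

class HierarchicalBertClassifier(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        hidden = self.bert.config.hidden_size  # 768 for bert-base
        # The abstract also mentions a transformer over chunk outputs
        # as an alternative to this recurrent layer.
        self.rnn = nn.LSTM(hidden, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, chunk_ids: torch.Tensor, chunk_mask: torch.Tensor):
        # chunk_ids / chunk_mask: (num_chunks, chunk_len) for ONE document,
        # obtained by splitting its token sequence into fixed-size chunks.
        out = self.bert(input_ids=chunk_ids, attention_mask=chunk_mask)
        cls_vecs = out.last_hidden_state[:, 0, :]      # (num_chunks, hidden)
        _, (h_n, _) = self.rnn(cls_vecs.unsqueeze(0))  # recur over chunks
        # Classification decision after the last chunk has been consumed.
        return self.classifier(h_n[-1])                # (1, num_classes) logits
```

Per the abstract, a softmax over these logits yields the final class probabilities, and the decision is read off after the last segment has been processed.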

Cited by 152 publications (117 citation statements)
References 18 publications
“…Our work extends the latter line of work by proposing a hierarchical Transformer based on the recent pre-trained BERT for this task. Moreover, we notice that our BERT-based hierarchical Transformer is similar to the model proposed in (Pappagari et al., 2019), but we want to point out that our model design in the input and output layers is specific to stance classification, which is different from their work. Rumor Verification: Due to the negative impact of various rumors spreading on social media, rumor verification has attracted increasing attention in recent years.…”
Section: Related Work
confidence: 99%
“…This novel text representation technique showed improvement on five different datasets. Pappagari et al. [33] proposed a modification to the BERT model for long document classification in a monolingual setting. They utilized a segmentation approach to divide the input text sequences into several subsequences.…”
Section: Related Work
confidence: 99%
“…We extracted the [CLS] vector representations from the last layer and combined them into a final document representation. This approach is inspired by the work of Pappagari et al. [33]. The main difference in our study is the way the subsequence representations are merged into a document representation.…”
Section: Using Sequences From Every Part of the Document
confidence: 99%
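As a rough illustration of the merging step described in that excerpt, the sketch below extracts the last-layer [CLS] vector of each subsequence and pools the vectors into a single document representation. It assumes PyTorch and Hugging Face transformers; mean pooling is one possible merge, chosen here for simplicity, and is not necessarily the strategy used in the citing study.

```python
# Illustrative sketch: encode each fixed-size chunk of a long document,
# keep the last-layer [CLS] vector per chunk, and mean-pool them into
# one document representation. Assumes Hugging Face `transformers`;
# the pooling choice and chunk size are assumptions for illustration.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def document_embedding(text: str, chunk_tokens: int = 510) -> torch.Tensor:
    ids = tokenizer.encode(text, add_special_tokens=False)
    cls_vectors = []
    for start in range(0, len(ids), chunk_tokens):
        chunk = ids[start:start + chunk_tokens]
        # Wrap each chunk with [CLS]/[SEP] so it is a well-formed BERT input.
        input_ids = torch.tensor([[tokenizer.cls_token_id] + chunk
                                  + [tokenizer.sep_token_id]])
        with torch.no_grad():
            out = model(input_ids=input_ids)
        cls_vectors.append(out.last_hidden_state[0, 0])  # chunk's [CLS]
    # Merge the per-chunk [CLS] vectors; here, a simple mean.
    return torch.stack(cls_vectors).mean(dim=0)  # shape: (hidden_size,)
```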
“…The neural network reads the text linearly while using an external memory to store relations, and can then use the global context to classify them. Other alternatives could be to still leverage the power of pre-trained transformer models by using solutions that pass entire long texts to the model instead (Pappagari et al., 2019). A deep learning approach is potentially not the most appropriate to represent complex temporal relations; for example, Li et al. (2020) recently reported good results using an ontology.…”
Section: Classification Approaches
confidence: 99%