Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021
DOI: 10.18653/v1/2021.naacl-main.112
Efficient Attentions for Long Document Summarization

Abstract: The quadratic computational and memory complexities of large Transformers have limited their scalability for long document summarization. In this paper, we propose HEPOS, a novel efficient encoder-decoder attention with head-wise positional strides to effectively pinpoint salient information from the source. We further conduct a systematic study of existing efficient self-attentions. Combined with HEPOS, we are able to process ten times more tokens than existing models that use full attentions. For evaluation,…
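To make the head-wise positional stride idea concrete, here is a minimal sketch of strided encoder-decoder attention in PyTorch. The tensor layout, the per-head offset rule (head index modulo the stride), and the function name are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def hepos_cross_attention(q, k, v, stride):
    """Sketch of head-wise positional stride (HEPOS) encoder-decoder attention.

    q: (heads, tgt_len, d)   decoder queries
    k, v: (heads, src_len, d) encoder keys / values
    stride: each head only attends to source positions j with
            j % stride == head % stride, i.e. roughly 1/stride of the source.
    """
    heads, tgt_len, d = q.shape
    src_len = k.shape[1]
    outputs = []
    for h in range(heads):
        # source positions assigned to this head (assumed offset scheme)
        idx = torch.arange(h % stride, src_len, stride)
        k_h, v_h = k[h, idx], v[h, idx]          # (~src_len/stride, d)
        scores = q[h] @ k_h.T / d ** 0.5         # (tgt_len, ~src_len/stride)
        attn = F.softmax(scores, dim=-1)
        outputs.append(attn @ v_h)               # (tgt_len, d)
    return torch.stack(outputs)                  # (heads, tgt_len, d)

# usage sketch: 12 heads, 64 decoder steps, 10,240 source tokens, stride 4
# q = torch.randn(12, 64, 64); k = v = torch.randn(12, 10_240, 64)
# out = hepos_cross_attention(q, k, v, stride=4)   # (12, 64, 64)
```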

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
2
1
1
1

Citation Types

1
95
0

Year Published

2021
2021
2024
2024

Publication Types

Select...
5
4
1

Relationship

0
10

Authors

Journals

citations
Cited by 84 publications
(120 citation statements)
references
References 34 publications
1
95
0
Order By: Relevance
“…The companion heads serve a similar purpose but have a narrower focus on endorsed source tokens: frequently endorsed tokens are more likely to be copied over by companion heads. The method thus improves head diversity, similar to sparse Transformers (Correia et al., 2019; Huang et al., 2021). The hyperparameter τ controls the level of endorsement.…”
Section: Companion Heads
Citation type: mentioning; confidence: 94%
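One plausible reading of the endorsement threshold is sketched below: τ gates which source tokens a companion head may attend to. The mask construction and the semantics of the endorsement scores are assumptions made for illustration, not the cited paper's actual formulation.

```python
import torch
import torch.nn.functional as F

def companion_head_mask(endorsement, tau):
    """Toy sketch: limit a companion head to source tokens whose endorsement
    score reaches the threshold tau (the scoring semantics are assumed here).

    endorsement: (src_len,) float tensor, e.g. how often each token was endorsed.
    Returns an additive mask of shape (src_len,): 0 for kept tokens, -inf otherwise.
    """
    keep = endorsement >= tau
    return torch.where(keep,
                       torch.zeros_like(endorsement),
                       torch.full_like(endorsement, float("-inf")))

# usage sketch: scores has shape (tgt_len, src_len) for one companion head
# attn = F.softmax(scores + companion_head_mask(endorsement, tau), dim=-1)
```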
“…GovReport (Huang et al., 2021): A summarization dataset of reports addressing various national policy issues published by the Congressional Research Service and the U.S. Government Accountability Office, where each document is paired with an expert-written executive summary. The reports and their summaries are longer than their equivalents in other popular long-document summarization datasets; for example, GovReport's documents are approximately 1.5 and 2.5 times longer than the documents in arXiv and PubMed (Cohan et al., 2018), respectively.…”
Section: Datasets
Citation type: mentioning; confidence: 99%
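The length comparison can be spot-checked with any subword tokenizer. The snippet below is a sketch that assumes you have loaded the raw documents yourself; the placeholder lists and the choice of the BART tokenizer are illustrative, not part of the dataset release.

```python
from transformers import AutoTokenizer

# Placeholder corpora: load the raw GovReport / arXiv documents from whichever
# copy of the data you use; the strings below only illustrate the expected shape.
govreport_docs = ["...full government report text...", "..."]
arxiv_docs = ["...full arXiv paper text...", "..."]

tok = AutoTokenizer.from_pretrained("facebook/bart-large")  # any subword tokenizer works

def avg_token_len(docs):
    return sum(len(tok(d).input_ids) for d in docs) / len(docs)

print(f"GovReport / arXiv length ratio: "
      f"{avg_token_len(govreport_docs) / avg_token_len(arxiv_docs):.2f}")
```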
“…Voita et al. (2019) observed that heads are redundant, and Clark et al. (2019) found that a head in BERT rarely attends to several consecutive tokens. Based on these findings, Huang et al. (2021) apply a stride pattern in the encoder-decoder attention, reducing its cost by a factor of the stride size; this method is likely complementary to our work.…”
Section: Related Work
Citation type: mentioning; confidence: 99%
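The claimed saving follows from simple counting: full encoder-decoder attention scores every (target position, source position) pair per head, while a stride of s lets each head score only every s-th source position. The concrete lengths below are illustrative assumptions, not figures from the paper.

```python
# Rough cost of encoder-decoder attention: one score per (target position,
# source position) pair and per head. Lengths below are illustrative only.
src_len, tgt_len, heads, stride = 10_240, 512, 12, 4

full_cost = heads * tgt_len * src_len                # every head sees every source token
hepos_cost = heads * tgt_len * (src_len // stride)   # each head sees 1/stride of the source

print(full_cost, hepos_cost, full_cost / hepos_cost)  # ratio == stride (4.0 here)
```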