2021
DOI: 10.48550/arxiv.2104.02112
Preprint

Efficient Attentions for Long Document Summarization

Abstract: The quadratic computational and memory complexities of large Transformers have limited their scalability for long document summarization. In this paper, we propose HEPOS, a novel efficient encoder-decoder attention with head-wise positional strides to effectively pinpoint salient information from the source. We further conduct a systematic study of existing efficient self-attentions. Combined with HEPOS, we are able to process ten times more tokens than existing models that use full attentions. For evaluation,…
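
To make the head-wise positional stride idea concrete, the sketch below shows one way such encoder-decoder attention could be masked, assuming each decoder head attends only to source positions at a fixed stride with a head-specific offset. The function name, tensor shapes, and the dense-masking formulation are illustrative assumptions rather than the authors' implementation.

import torch

def hepos_cross_attention(q, k, v, stride):
    # Sketch of head-wise positional stride encoder-decoder attention.
    # Assumption: head h attends only to source positions p with
    # p % stride == h % stride. Shapes: q is (heads, tgt_len, d);
    # k and v are (heads, src_len, d). Dense masking is used here for
    # clarity; it does not realize the memory savings of a sparse kernel.
    heads, tgt_len, d = q.shape
    src_len = k.shape[1]

    positions = torch.arange(src_len)                  # (src_len,)
    offsets = torch.arange(heads) % stride             # (heads,)
    keep = (positions.unsqueeze(0) % stride) == offsets.unsqueeze(1)  # (heads, src_len)

    scores = torch.einsum('htd,hsd->hts', q, k) / d ** 0.5
    scores = scores.masked_fill(~keep.unsqueeze(1), float('-inf'))
    attn = torch.softmax(scores, dim=-1)
    return torch.einsum('hts,hsd->htd', attn, v)

# Toy usage with made-up sizes: 4 heads, 2 decoder steps, 16 source tokens.
q = torch.randn(4, 2, 8)
k = torch.randn(4, 16, 8)
v = torch.randn(4, 16, 8)
print(hepos_cross_attention(q, k, v, stride=4).shape)  # torch.Size([4, 2, 8])

Under this assumption, each head covers only a 1/stride fraction of the source keys and values, which is what makes cross-attention over much longer inputs feasible.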

Cited by 13 publications (23 citation statements)
References: 33 publications
“…Although large pre-trained Transformers with efficient attention mechanisms [20] have been proposed to abstractively summarize long documents, we argue that extractive summarization tends to be more faithful. Furthermore, because MemSum achieves state-of-the-art performance on various long document summarization tasks, MDP approaches will be promising design choices for further research.…”
Section: Related Work
confidence: 81%
“…• We show that the awareness of the extraction history allows our model to extract more compact summaries and behave more robustly to redundancies in documents than models without history awareness. • Our model outperforms both extractive and abstractive summarization models on PubMed, arXiv [19] and GovReport [20] datasets. • We provide an open source package for replicating our results, as well as usable extractive summarizers trained on each of the three datasets.…”
Section: Introduction
confidence: 84%