Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021
DOI: 10.18653/v1/2021.naacl-main.112
Efficient Attentions for Long Document Summarization

Abstract: The quadratic computational and memory complexities of large Transformers have limited their scalability for long document summarization. In this paper, we propose HEPOS, a novel efficient encoder-decoder attention with head-wise positional strides to effectively pinpoint salient information from the source. We further conduct a systematic study of existing efficient self-attentions. Combined with HEPOS, we are able to process ten times more tokens than existing models that use full attentions. For evaluation,…
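To make the head-wise positional stride idea concrete, here is a minimal sketch of strided encoder-decoder attention in PyTorch. The tensor layout, the per-head offset rule (head index modulo the stride), and the function name are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def hepos_cross_attention(q, k, v, stride):
    """Sketch of head-wise positional stride (HEPOS) encoder-decoder attention.

    q: (heads, tgt_len, d)   decoder queries
    k, v: (heads, src_len, d) encoder keys / values
    stride: each head only attends to source positions j with
            j % stride == head % stride, i.e. roughly 1/stride of the source.
    """
    heads, tgt_len, d = q.shape
    src_len = k.shape[1]
    outputs = []
    for h in range(heads):
        # source positions assigned to this head (assumed offset scheme)
        idx = torch.arange(h % stride, src_len, stride)
        k_h, v_h = k[h, idx], v[h, idx]          # (~src_len/stride, d)
        scores = q[h] @ k_h.T / d ** 0.5         # (tgt_len, ~src_len/stride)
        attn = F.softmax(scores, dim=-1)
        outputs.append(attn @ v_h)               # (tgt_len, d)
    return torch.stack(outputs)                  # (heads, tgt_len, d)

# usage sketch: 12 heads, 64 decoder steps, 10,240 source tokens, stride 4
# q = torch.randn(12, 64, 64); k = v = torch.randn(12, 10_240, 64)
# out = hepos_cross_attention(q, k, v, stride=4)   # (12, 64, 64)
```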

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
2
1
1
1

Citation Types

1
95
0

Year Published

2021
2021
2024
2024

Publication Types

Select...
5
4
1

Relationship

0
10

Authors

Journals

citations
Cited by 84 publications
(120 citation statements)
references
References 34 publications
1
95
0
Order By: Relevance
“…The companion heads serve a similar purpose but have a narrower focus on endorsed source tokens: frequently endorsed tokens are more likely to be copied over by companion heads. The method thus improves head diversity, similar to sparse Transformers (Correia et al., 2019; Huang et al., 2021). The hyperparameter τ controls the level of endorsement.…”
Section: Companion Heads
Citation type: mentioning; confidence: 94%
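One plausible reading of the endorsement threshold is sketched below: τ gates which source tokens a companion head may attend to. The mask construction and the semantics of the endorsement scores are assumptions made for illustration, not the cited paper's actual formulation.

```python
import torch
import torch.nn.functional as F

def companion_head_mask(endorsement, tau):
    """Toy sketch: limit a companion head to source tokens whose endorsement
    score reaches the threshold tau (the scoring semantics are assumed here).

    endorsement: (src_len,) float tensor, e.g. how often each token was endorsed.
    Returns an additive mask of shape (src_len,): 0 for kept tokens, -inf otherwise.
    """
    keep = endorsement >= tau
    return torch.where(keep,
                       torch.zeros_like(endorsement),
                       torch.full_like(endorsement, float("-inf")))

# usage sketch: scores has shape (tgt_len, src_len) for one companion head
# attn = F.softmax(scores + companion_head_mask(endorsement, tau), dim=-1)
```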
“…GovReport (Huang et al., 2021): A summarization dataset of reports addressing various national policy issues published by the Congressional Research Service and the U.S. Government Accountability Office, where each document is paired with an expert-written executive summary. The reports and their summaries are longer than their equivalents in other popular long-document summarization datasets; for example, GovReport's documents are approximately 1.5 and 2.5 times longer than the documents in arXiv and PubMed (Cohan et al., 2018), respectively.…”
Section: Datasets
Citation type: mentioning; confidence: 99%
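The length comparison can be spot-checked with any subword tokenizer. The snippet below is a sketch that assumes you have loaded the raw documents yourself; the placeholder lists and the choice of the BART tokenizer are illustrative, not part of the dataset release.

```python
from transformers import AutoTokenizer

# Placeholder corpora: load the raw GovReport / arXiv documents from whichever
# copy of the data you use; the strings below only illustrate the expected shape.
govreport_docs = ["...full government report text...", "..."]
arxiv_docs = ["...full arXiv paper text...", "..."]

tok = AutoTokenizer.from_pretrained("facebook/bart-large")  # any subword tokenizer works

def avg_token_len(docs):
    return sum(len(tok(d).input_ids) for d in docs) / len(docs)

print(f"GovReport / arXiv length ratio: "
      f"{avg_token_len(govreport_docs) / avg_token_len(arxiv_docs):.2f}")
```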
“…Voita et al. (2019) observed that heads are redundant, and Clark et al. (2019) found that a head in BERT rarely attends to several consecutive tokens. Based on these findings, Huang et al. (2021) apply a stride pattern in the encoder-decoder attention, reducing its cost by a factor of the stride size; this method is likely complementary to our work.…”
Section: Related Work
Citation type: mentioning; confidence: 99%
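The claimed saving follows from simple counting: full encoder-decoder attention scores every (target position, source position) pair per head, while a stride of s lets each head score only every s-th source position. The concrete lengths below are illustrative assumptions, not figures from the paper.

```python
# Rough cost of encoder-decoder attention: one score per (target position,
# source position) pair and per head. Lengths below are illustrative only.
src_len, tgt_len, heads, stride = 10_240, 512, 12, 4

full_cost = heads * tgt_len * src_len                # every head sees every source token
hepos_cost = heads * tgt_len * (src_len // stride)   # each head sees 1/stride of the source

print(full_cost, hepos_cost, full_cost / hepos_cost)  # ratio == stride (4.0 here)
```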