2021
DOI: 10.48550/arxiv.2104.02112
Preprint

Efficient Attentions for Long Document Summarization

Abstract: The quadratic computational and memory complexities of large Transformers have limited their scalability for long document summarization. In this paper, we propose HEPOS, a novel efficient encoder-decoder attention with head-wise positional strides to effectively pinpoint salient information from the source. We further conduct a systematic study of existing efficient self-attentions. Combined with HEPOS, we are able to process ten times more tokens than existing models that use full attentions. For evaluation,…
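
To make the head-wise positional stride idea concrete, the sketch below shows one way such encoder-decoder attention could be masked, assuming each decoder head attends only to source positions at a fixed stride with a head-specific offset. The function name, tensor shapes, and the dense-masking formulation are illustrative assumptions rather than the authors' implementation.

import torch

def hepos_cross_attention(q, k, v, stride):
    # Sketch of head-wise positional stride encoder-decoder attention.
    # Assumption: head h attends only to source positions p with
    # p % stride == h % stride. Shapes: q is (heads, tgt_len, d);
    # k and v are (heads, src_len, d). Dense masking is used here for
    # clarity; it does not realize the memory savings of a sparse kernel.
    heads, tgt_len, d = q.shape
    src_len = k.shape[1]

    positions = torch.arange(src_len)                  # (src_len,)
    offsets = torch.arange(heads) % stride             # (heads,)
    keep = (positions.unsqueeze(0) % stride) == offsets.unsqueeze(1)  # (heads, src_len)

    scores = torch.einsum('htd,hsd->hts', q, k) / d ** 0.5
    scores = scores.masked_fill(~keep.unsqueeze(1), float('-inf'))
    attn = torch.softmax(scores, dim=-1)
    return torch.einsum('hts,hsd->htd', attn, v)

# Toy usage with made-up sizes: 4 heads, 2 decoder steps, 16 source tokens.
q = torch.randn(4, 2, 8)
k = torch.randn(4, 16, 8)
v = torch.randn(4, 16, 8)
print(hepos_cross_attention(q, k, v, stride=4).shape)  # torch.Size([4, 2, 8])

Under this assumption, each head covers only a 1/stride fraction of the source keys and values, which is what makes cross-attention over much longer inputs feasible.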

Cited by 13 publications (23 citation statements)
References: 33 publications
“…Although large pre-trained Transformers with efficient attention mechanisms [20] have been proposed to abstractively summarize long documents, we argue that extractive summarization tends to be more faithful. Furthermore, because MemSum achieves state-of-the-art performance on various long document summarization tasks, MDP approaches will be promising design choices for further research.…”
Section: Related Work
confidence: 81%
“…• We show that the awareness of the extraction history allows our model to extract more compact summaries and behave more robustly to redundancies in documents than models without history awareness. • Our model outperforms both extractive and abstractive summarization models on PubMed, arXiv [19] and GovReport [20] datasets. • We provide an open source package for replicating our results, as well as usable extractive summarizers trained on each of the three datasets.…”
Section: Introduction
confidence: 84%