Document Summarization Based on Data Reconstruction

He, Zhanying; Chen, Chun; Bu, Jiajun; Wang, Can; Zhang, Lijun; Cai, Deng; He, Xiaofei

doi:10.1609/aaai.v26i1.8202

Cited by 49 publications

(6 citation statements)

References 32 publications

“…Summarization systems can be generally categorized into two paradigms: extractive and abstractive. Extractive systems extract certain sentences and clauses from input, for example, based on salient features (Zhou and Rush, 2019) or feature construction (He et al, 2012). Abstraction systems generate new utterances as the summary, e.g., by sequence-to-sequence models trained in a supervised way (Zhang et al, 2020; Liu et al, 2021b).…”

Section: Related Work (mentioning)

confidence: 99%

Learning Non-Autoregressive Models from Search for Unsupervised Sentence Summarization

Liu, Huang, Mou

2022

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Text summarization aims to generate a short summary for an input text. In this work, we propose a Non-Autoregressive Unsupervised Summarization (NAUS) approach, which does not require parallel data for training. Our NAUS first performs edit-based search towards a heuristically defined score, and generates a summary as pseudo-groundtruth. Then, we train an encoder-only non-autoregressive Transformer based on the search result. We also propose a dynamic programming approach for length-control decoding, which is important for the summarization task. Experiments on two datasets show that NAUS achieves state-of-the-art performance for unsupervised summarization, yet largely improving inference efficiency. Further, our algorithm is able to perform explicit length-transfer summary generation.
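The abstract only says that NAUS uses dynamic programming for length-control decoding. As a minimal sketch under simplifying assumptions (each output position independently emits either a word or a blank, and any merging of repeated tokens a non-autoregressive decoder might apply is ignored), the DP below chooses exactly `length` emitting positions so that the total log-probability is maximal. `length_controlled_decode`, `word_logp`, and `blank_logp` are hypothetical names; this is not the NAUS decoder itself.

```python
import math
from typing import List, Tuple


def length_controlled_decode(
    word_logp: List[float], blank_logp: List[float], length: int
) -> Tuple[float, List[int]]:
    """Choose exactly `length` positions that emit a word; the rest emit blank.

    dp[t][l] is the best total log-probability over the first t positions
    with exactly l words emitted.
    """
    n = len(word_logp)
    if not 0 <= length <= n:
        raise ValueError("length must be between 0 and the number of positions")
    neg = -math.inf
    dp = [[neg] * (length + 1) for _ in range(n + 1)]
    took = [[False] * (length + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for t in range(1, n + 1):
        for l in range(min(t, length) + 1):
            skip = dp[t - 1][l] + blank_logp[t - 1]  # position t-1 emits blank
            take = dp[t - 1][l - 1] + word_logp[t - 1] if l > 0 else neg
            if take > skip:
                dp[t][l], took[t][l] = take, True
            else:
                dp[t][l] = skip
    # Walk back through the table to recover the emitting positions.
    positions: List[int] = []
    l = length
    for t in range(n, 0, -1):
        if took[t][l]:
            positions.append(t - 1)
            l -= 1
    return dp[n][length], positions[::-1]
```

For example, with word probabilities (0.9, 0.2, 0.8, 0.6) and blank probabilities (0.1, 0.8, 0.2, 0.4), asking for two words selects positions 0 and 2. The table has O(n · length) cells, so the length constraint is enforced exactly rather than by truncating a greedy output.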

“…A couple of works proposed extractive methods for unsupervised summarization, which generally assign salient scores to sentences in a document and select the top-ranked ones to form the summary. Typical methods are based on word frequency (Nenkova and Vanderwende 2005), topic modeling (Harabagiu and Lacatusu 2005), cluster centroid (Radev et al 2004; Rossiello et al 2017), sentence graph (Erkan and Radev 2004; Zheng and Lapata 2019), Integer Linear Programming (ILP) optimization (McDonald 2007; Gillick et al 2009), and sparse coding (He et al 2012; Liu et al 2015). Recently, abstractive approaches have been proposed due to the success of deep neural models, where the autoencoder framework has been applied (Miao and Blunsom 2016; Fevry and Phang 2018; Chu and Liu 2019; Liu et al 2019b).…”

Section: Related Work: Unsupervised Text Summarization (mentioning)

confidence: 99%

Unsupervised Summarization for Chat Logs with Topic-Oriented Ranking and Context-Aware Auto-Encoders

Zou, Jun, Zhao, et al.

2021

AAAI

Automatic chat summarization can help people quickly grasp important information from numerous chat messages. Unlike conventional documents, chat logs usually have fragmented and evolving topics. In addition, these logs contain a quantity of elliptical and interrogative sentences, which make the chat summarization highly context dependent. In this work, we propose a novel unsupervised framework called RankAE to perform chat summarization without employing manually labeled data. RankAE consists of a topic-oriented ranking strategy that selects topic utterances according to centrality and diversity simultaneously, as well as a denoising auto-encoder that is carefully designed to generate succinct but context-informative summaries based on the selected utterances. To evaluate the proposed method, we collect a large-scale dataset of chat logs from a customer service environment and build an annotated set only for model evaluation. Experimental results show that RankAE significantly outperforms other unsupervised methods and is able to generate high-quality summaries in terms of relevance and topic coverage.
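The abstract says RankAE selects topic utterances by centrality and diversity at the same time, but it does not give the scoring rule. The following is a minimal sketch under an assumption: an MMR-style (maximal marginal relevance) trade-off, where centrality is the mean cosine similarity of an utterance to all the others (close in spirit to the centroid and sentence-graph methods listed in the statement above) and diversity is a penalty for similarity to utterances already picked. `central_diverse_select` and its `lam` weight are hypothetical names, not RankAE's.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0


def central_diverse_select(
    vectors: List[Sequence[float]], k: int, lam: float = 0.5
) -> List[int]:
    """Hypothetical MMR-style selection: weighted centrality minus redundancy."""
    n = len(vectors)
    sim = [[cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    # Centrality: mean similarity of each utterance to all the others.
    centrality = [
        sum(sim[i][j] for j in range(n) if j != i) / max(n - 1, 1) for i in range(n)
    ]
    chosen: List[int] = []
    while len(chosen) < min(k, n):
        def score(i: int) -> float:
            # Redundancy: closest match among utterances already selected.
            redundancy = max((sim[i][j] for j in chosen), default=0.0)
            return lam * centrality[i] - (1 - lam) * redundancy
        best = max((i for i in range(n) if i not in chosen), key=score)
        chosen.append(best)
    return chosen
```

With three identical utterances on one topic and one on another, `central_diverse_select([[1, 0], [1, 0], [1, 0], [0, 1]], 2)` returns `[0, 3]`: the second pick goes to the other topic. Raising `lam` to 0.9 favors centrality and returns `[0, 1]`, a duplicate, which shows why the diversity term matters.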

“…The methods based on data reconstruction, for example DSDR (He et al 2012) reconstructs each sentence by a non-negative linear combination of summary sentences and then uses sparse coding to select summary sentences that minimize the document reconstruction error. SpOpt (Yao, Wan, and Xiao 2015) adds a sentence dissimilarity term to the objective to maximize diversity.…”

Section: Related Work (mentioning)

confidence: 99%
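The data-reconstruction idea attributed to DSDR in the statement above can be illustrated with a small sketch. It assumes sentences are already term vectors and replaces DSDR's non-negative, sparse-coding formulation with a simpler unregularized linear one: each sentence is approximated by its orthogonal projection onto the span of the chosen summary sentences, and sentences are added greedily, each time picking the one that cuts the total squared reconstruction error the most. `greedy_reconstruction_summary` is a hypothetical name; this is not He et al.'s algorithm.

```python
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def greedy_reconstruction_summary(
    vectors: Sequence[Sequence[float]], k: int, eps: float = 1e-12
) -> Tuple[List[int], float]:
    """Greedily pick up to k sentences whose span best reconstructs every sentence.

    Simplification (assumption): unregularized linear reconstruction, i.e. each
    sentence is replaced by its orthogonal projection onto the span of the
    selected sentences. Returns (selected indices, remaining squared error).
    """
    residuals = [list(map(float, v)) for v in vectors]
    selected: List[int] = []
    for _ in range(min(k, len(residuals))):
        best_idx, best_gain, best_dir = -1, 0.0, None
        for c, r_c in enumerate(residuals):
            norm = dot(r_c, r_c) ** 0.5
            if c in selected or norm < eps:
                continue  # already chosen, or already reconstructable
            u = [x / norm for x in r_c]
            # Error reduction if u joins the basis: squared projections onto u.
            gain = sum(dot(r_i, u) ** 2 for r_i in residuals)
            if gain > best_gain:
                best_idx, best_gain, best_dir = c, gain, u
        if best_dir is None:
            break  # every sentence is already reconstructed exactly
        selected.append(best_idx)
        # Gram-Schmidt step: strip the new direction from every residual.
        residuals = [
            [x - dot(r_i, best_dir) * y for x, y in zip(r_i, best_dir)]
            for r_i in residuals
        ]
    error = sum(dot(r, r) for r in residuals)
    return selected, error
```

On the toy vectors `[[2, 1, 0, 0], [2, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1]]` with `k=2`, the sketch selects `[0, 3]` with a remaining error of about 0.2. The exact duplicate of sentence 0 contributes nothing once its twin is chosen, so the second pick covers the unrelated terms instead. This redundancy avoidance is what reconstruction-based selection is meant to provide.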

“…In machine learning, and fields such as natural language processing (NLP) and information retrieval (IR), various approaches have been used to solve this problem. Query-based MDS can be in either supervised where labels are available and a training phase occurs, for example (Lin and Bilmes 2011, 2012) or unsupervised where there are no target labels to train on as in (He et al 2012; Yao, Wan, and Xiao 2015; Feigenblat et al 2017). In query-based extractive video summarization, recent methods include snippet selection using sequential and hierarchical Determinantal Point Processes (DPP) (Sharghi, Gong, and Shah 2016; Sharghi, Laurel, and Gong 2017).…”

Section: Introduction (mentioning)

confidence: 99%

Submodular Span, with Applications to Conditional Data Summarization

Kumari, Bilmes

2021

AAAI

As an extension to the matroid span problem, we propose the submodular span problem that involves finding a large set of elements with small gain relative to a given query set. We then propose a two-stage Submodular Span Summarization (S3) framework to achieve a form of conditional or query-focused data summarization. The first stage encourages the summary to be relevant to a given query set, and the second stage encourages the final summary to be diverse, thus achieving two important necessities for a good query-focused summary. Unlike previous methods, our framework uses only a single submodular function defined over both data and query. We analyze theoretical properties in the context of both matroids and polymatroids that elucidate when our methods should work well. We find that a scalable approximation algorithm to the polymatroid submodular span problem has good theoretical and empirical properties. We provide empirical and qualitative results on three real-world tasks: conditional multi-document summarization on the DUC 2005-2007 datasets, conditional video summarization on the UT-Egocentric dataset, and conditional image corpus summarization on the ImageNet dataset. We use deep neural networks, specifically a BERT model for text, AlexNet for video frames, and Bi-directional Generative Adversarial Networks (BiGAN) for ImageNet images to help instantiate the submodular functions. The result is a minimally supervised form of conditional summarization that matches or improves over the previous state-of-the-art.
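The span problem in the abstract above asks for a large set whose gain relative to a query set stays small. The paper's own approximation algorithm is not described here, so the following is only an illustrative heuristic under stated assumptions. The submodular function is taken to be facility location over a precomputed similarity matrix, and elements are added greedily, cheapest conditional gain first, while f(Q ∪ A) - f(Q) stays within a budget. It covers only the query-relevance idea, not the diversity stage. `facility_location`, `low_gain_span`, and `budget` are hypothetical names; this is not the S3 framework.

```python
from typing import List, Sequence


def facility_location(sim: Sequence[Sequence[float]], subset: Sequence[int]) -> float:
    """f(S): each item is credited with its best similarity to a member of S."""
    if not subset:
        return 0.0
    return sum(max(row[j] for j in subset) for row in sim)


def low_gain_span(
    sim: Sequence[Sequence[float]], query: List[int], budget: float
) -> List[int]:
    """Greedily grow A while f(Q + A) - f(Q) stays within `budget` (hypothetical)."""
    base = facility_location(sim, query)
    span: List[int] = []
    candidates = [v for v in range(len(sim)) if v not in query]
    while candidates:
        # Candidate whose addition raises f the least, given query and span so far.
        best = min(candidates, key=lambda v: facility_location(sim, query + span + [v]))
        gain = facility_location(sim, query + span + [best]) - base
        if gain > budget:
            break  # the cheapest candidate already exceeds the budget; f is monotone
        span.append(best)
        candidates.remove(best)
    return span
```

With a query item 0, a near-duplicate item 1, a loosely related item 2, and an unrelated item 3, a small budget admits only the near-duplicate, and larger budgets admit progressively less related items. Elements that add little beyond the query are, in this heuristic, the ones treated as relevant to it.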
