Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics 2019
DOI: 10.18653/v1/p19-1101
A Simple Theoretical Model of Importance for Summarization

Abstract: Research on summarization has mainly been driven by empirical approaches, crafting systems to perform well on standard datasets with the notion of information Importance remaining latent. We argue that establishing theoretical models of Importance will advance our understanding of the task and help to further improve summarization systems. To this end, we propose simple but rigorous definitions of several concepts that were previously used only intuitively in summarization: Redundancy, Relevance, and Informativeness […]

Cited by 69 publications (70 citation statements) · References 60 publications
“…Our study extends the hypothesis to various corpora as well as systems. With a specific focus on the importance aspect, recent work (Peyrard, 2019a) divided it into three subcategories: redundancy, relevance, and informativeness, and provided a quantity to measure each. Compared to this, ours provides a broader-scale sub-aspect analysis across various corpora and systems.…”
Section: Related Work
confidence: 99%
“…This criterion casts summarization as finding a set of summary sentences that closely matches the document distribution. When selecting sentences to constitute the summary, this optimization objective penalizes redundancy while maximizing relevance [17]. Because finding the subset of sentences from a collection that minimizes the KL divergence is NP-hard, a greedy algorithm is often used in practice.…”
Section: Bayesian Approaches in Summarization
confidence: 99%
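The greedy procedure mentioned in this citation can be sketched as follows. This is a minimal illustration, not the cited system's implementation: the unigram distributions, the smoothing constant, and the function names (`kl_divergence`, `greedy_kl_summary`) are all assumptions made for the sketch. It repeatedly adds the sentence whose inclusion yields the lowest KL divergence between the summary's word distribution and the document's.

```python
import math
from collections import Counter

def word_dist(sentences):
    """Unigram word distribution over a list of sentences."""
    counts = Counter(w for s in sentences for w in s.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) over the support of p; eps smooths words unseen in q."""
    return sum(pw * math.log(pw / (q.get(w, eps) + eps))
               for w, pw in p.items() if pw > 0)

def greedy_kl_summary(sentences, budget):
    """Greedily pick up to `budget` sentences whose joint word
    distribution minimizes KL(summary || document)."""
    doc = word_dist(sentences)
    summary, remaining = [], list(sentences)
    while remaining and len(summary) < budget:
        best = min(remaining,
                   key=lambda s: kl_divergence(word_dist(summary + [s]), doc))
        summary.append(best)
        remaining.remove(best)
    return summary
```

Because each step re-scores candidates against the summary built so far, a sentence repeating already-selected words barely improves the divergence, which is how this objective penalizes redundancy while favoring relevant content.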
“…Despite its fundamental role, background knowledge has received little attention from the summarization community. Existing approaches largely focus on the relevance aspect, which enforces similarity between the generated summaries and the source documents (Peyrard, 2019). Figure 1: A summary (S) results from the combination of the background knowledge (K) and the source document (D).…”
Section: Introduction
confidence: 99%
“…Figure 1: A summary (S) results from the combination of the background knowledge (K) and the source document (D). Following Peyrard (2019), S is similar to D (Relevance, measured by a small KL(S||D)) but also brings new information relative to the background knowledge (Informativeness, measured by a large KL(S||K)). We can infer the unobserved K from the choices unexplained by the Relevance criterion.…”
Section: Introduction
confidence: 99%
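The two divergence terms this citation describes can be computed directly from unigram distributions. The sketch below is illustrative only: the toy texts standing in for S, D, and K are invented, and the smoothing constant is an assumption, not part of the cited formulation.

```python
import math
from collections import Counter

def word_dist(text):
    """Unigram distribution over a text's whitespace tokens."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def kl(p, q, eps=1e-9):
    """KL(p || q); words unseen in q are smoothed with eps."""
    return sum(pw * math.log(pw / q.get(w, eps)) for w, pw in p.items())

# Hypothetical toy texts standing in for summary S, source document D,
# and background knowledge K.
S = "solar panels cut household energy bills"
D = "solar panels cut household energy bills and reduce grid demand"
K = "household energy bills are common knowledge"

relevance_term = kl(word_dist(S), word_dist(D))        # small: S matches D
informativeness_term = kl(word_dist(S), word_dist(K))  # large: S adds info beyond K
```

On these toy texts the pattern the figure describes holds: KL(S||D) stays small because every summary word appears in the document, while KL(S||K) is large because "solar", "panels", and "cut" are absent from the background text.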