Don’t Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization

Narayan, Shashi; Cohen, Shay B.; Lapata, Mirella

doi:10.18653/v1/d18-1206

Cited by 825 publications

(868 citation statements)

References 28 publications


“…XSum contains 226,711 news articles accompanied with a one-sentence summary, answering the question "What is this article about?". We used the splits of Narayan et al (2018a) for training, validation, and testing (204,045/11,332/11,334) and followed the pre-processing introduced in their work. Input documents were truncated to 512 tokens.…”

Section: Summarization Datasets (mentioning)

confidence: 99%

Text Summarization with Pretrained Encoders

Liu, Lapata

2019

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conferen


Bidirectional Encoder Representations from Transformers (BERT; Devlin et al. 2019) represents the latest incarnation of pretrained language models which have recently advanced a wide range of natural language processing tasks. In this paper, we showcase how BERT can be usefully applied in text summarization and propose a general framework for both extractive and abstractive models. We introduce a novel document-level encoder based on BERT which is able to express the semantics of a document and obtain representations for its sentences. Our extractive model is built on top of this encoder by stacking several inter-sentence Transformer layers. For abstractive summarization, we propose a new fine-tuning schedule which adopts different optimizers for the encoder and the decoder as a means of alleviating the mismatch between the two (the former is pretrained while the latter is not). We also demonstrate that a two-staged fine-tuning approach can further boost the quality of the generated summaries. Experiments on three datasets show that our model achieves state-of-the-art results across the board in both extractive and abstractive settings.


“…Lin and Hovy (1997) studied the position hypothesis, especially in the news article writing (Hong and Nenkova, 2014; Narayan et al, 2018a) but not in other domains such as conversations (Kedzie et al, 2018). Narayan et al (2018a) collected a new corpus to address the bias by compressing multiple contents of source document in the single target summary. In the bias analysis of systems, Lin and Bilmes (2012, 2011) studied the sub-aspect hypothesis of summarization systems.…”

Section: Related Work (mentioning)

confidence: 99%

“…• Summarization of personal post and news articles except for XSum (Narayan et al, 2018a) are biased to the position aspect, while academic papers are well balanced among the three aspects (see Figure 1 (a)). Summarizing long documents (e.g.…”

Section: Introduction (mentioning)

confidence: 99%

Earlier Isn’t Always Better: Sub-aspect Analysis on Corpus and System Biases in Summarization

Jung, Kang, Mentch et al.

2019

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conferen


Despite the recent developments on neural summarization systems, the underlying logic behind the improvements from the systems and its corpus-dependency remains largely unexplored. Position of sentences in the original text, for example, is a well known bias for news summarization. Following in the spirit of the claim that summarization is a combination of sub-functions, we define three sub-aspects of summarization: position, importance, and diversity and conduct an extensive analysis of the biases of each sub-aspect with respect to the domain of nine different summarization corpora (e.g., news, academic papers, meeting minutes, movie script, books, posts). We find that while position exhibits substantial bias in news articles, this is not the case, for example, with academic papers and meeting minutes. Furthermore, our empirical study shows that different types of summarization systems (e.g., neural-based) are composed of different degrees of the sub-aspects. Our study provides useful lessons regarding consideration of underlying sub-aspects when collecting a new summarization dataset or developing a new system.


“…Several datasets have been used to aid the development of text summarization models. These datasets are predominantly from the news domain and have several drawbacks such as limited training data (Document Understanding Conference), shorter summaries (Gigaword (Napoles et al, 2012), XSum (Narayan et al, 2018), and Newsroom (Grusky et al, 2018)), and near-extractive summaries (CNN / Daily Mail dataset (Hermann et al, 2015)) […] news reporting, summary-worthy content is nonuniformly distributed within each article. ArXiv and PubMed datasets (Cohan et al, 2018), which are collected from scientific repositories, are limited in size and have longer yet extractive summaries.…”

Section: Related Work (mentioning)

confidence: 99%

BIGPATENT: A Large-Scale Dataset for Abstractive and Coherent Summarization

Sharma, Wang

2019

Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics


Most existing text summarization datasets are compiled from the news domain, where summaries have a flattened discourse structure. In such datasets, summary-worthy content often appears in the beginning of input articles. Moreover, large segments from input articles are present verbatim in their respective summaries. These issues impede the learning and evaluation of systems that can understand an article's global content structure as well as produce abstractive summaries with high compression ratio. In this work, we present a novel dataset, BIGPATENT, consisting of 1.3 million records of U.S. patent documents along with human written abstractive summaries. Compared to existing summarization datasets, BIGPATENT has the following properties: i) summaries contain a richer discourse structure with more recurring entities, ii) salient content is evenly distributed in the input, and iii) lesser and shorter extractive fragments are present in the summaries. Finally, we train and evaluate baselines and popular learning models on BIGPATENT to shed light on new challenges and motivate future directions for summarization research. The BIGPATENT dataset is available to download online at evasharma.github.io/bigpatent.
