Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018
DOI: 10.18653/v1/n18-1158

Ranking Sentences for Extractive Summarization with Reinforcement Learning

Abstract: Single document summarization is the task of producing a shorter version of a document while preserving its principal information content. In this paper we conceptualize extractive summarization as a sentence ranking task and propose a novel training algorithm which globally optimizes the ROUGE evaluation metric through a reinforcement learning objective. We use our algorithm to train a neural summarization model on the CNN and DailyMail datasets and demonstrate experimentally that it outperforms state-of-the…
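
The training objective described in the abstract can be illustrated with a small REINFORCE-style sketch. The code below is not the authors' implementation: it uses a bag-of-embeddings sentence scorer instead of their hierarchical document encoder and a toy unigram-overlap F1 in place of ROUGE, but it shows the core idea of sampling an extract from a sentence-ranking policy and scaling the policy's log-probability gradient by a summary-level reward.

```python
# Hedged sketch: REINFORCE-style update for an extractive sentence-ranking policy.
# The reward is a toy unigram-overlap F1 standing in for ROUGE.
import torch
import torch.nn as nn

def overlap_f1(summary_tokens, reference_tokens):
    """Toy stand-in for ROUGE: unigram overlap F1 between two token lists."""
    s, r = set(summary_tokens), set(reference_tokens)
    if not s or not r:
        return 0.0
    common = len(s & r)
    if common == 0:
        return 0.0
    prec, rec = common / len(s), common / len(r)
    return 2 * prec * rec / (prec + rec)

class SentenceScorer(nn.Module):
    """Scores each document sentence; the scores define the extraction policy."""
    def __init__(self, vocab_size, dim=64):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, dim)  # mean-pooled word embeddings
        self.out = nn.Linear(dim, 1)

    def forward(self, sentences):
        # sentences: list of 1-D LongTensors of token ids, one per sentence
        reps = torch.stack([self.embed(s.unsqueeze(0)).squeeze(0) for s in sentences])
        return torch.sigmoid(self.out(reps)).squeeze(-1)  # P(select sentence_i)

def reinforce_step(model, optimizer, sentences, tokens, reference, budget=3):
    """One policy-gradient step: sample an extract, score it, reinforce the sample."""
    probs = model(sentences)
    dist = torch.distributions.Bernoulli(probs)
    actions = dist.sample()  # 0/1 selection per sentence
    picked = [t for t, a in zip(tokens, actions.tolist()) if a > 0][:budget]
    reward = overlap_f1([w for sent in picked for w in sent], reference)
    loss = -(dist.log_prob(actions).sum() * reward)  # REINFORCE: -log-prob * reward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return reward

# Tiny usage example with made-up data (all names above are illustrative only).
model = SentenceScorer(vocab_size=50)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
sents = [torch.tensor([1, 2, 3]), torch.tensor([4, 5, 6])]
toks = [["the", "cat", "sat"], ["dogs", "bark", "loudly"]]
reinforce_step(model, opt, sents, toks, reference=["the", "cat", "sat"])
```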

Cited by 469 publications (470 citation statements); references 46 publications.
“…The fourth block reports results with fine-tuned BERT models: BERTSUMEXT and its two variants (one without interval embeddings, and one with the large version of BERT), BERTSUMABS, and BERTSUMEXTABS. BERT-based models outperform the LEAD-3 baseline which is not a strawman; on the CNN/DailyMail corpus it is indeed superior to several extractive (Nallapati et al., 2017; Narayan et al., 2018b) and abstractive models (See et al., 2017). BERT models collectively outperform all previously proposed extractive and abstractive systems, only falling behind the ORACLE upper bound.…”
Section: Automatic Evaluation (mentioning)
confidence: 82%
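
For reference, the LEAD-3 baseline discussed in the excerpt above simply returns the first three sentences of an article as the summary. A minimal sketch follows; the regex sentence splitter is an assumption for illustration, whereas published results rely on proper tokenization.

```python
import re

def lead_n(document: str, n: int = 3) -> str:
    """LEAD-n baseline: return the first n sentences of the article as the summary."""
    sentences = re.split(r'(?<=[.!?])\s+', document.strip())
    return " ".join(sentences[:n])

print(lead_n("First sentence. Second one! Third here? Fourth is dropped."))
```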
“…On top of the seq2seq framework, many other variants have been studied using convolutional networks (Cheng and Lapata, 2016; Allamanis et al., 2016), pointer networks (See et al., 2017), scheduled sampling (Bengio et al., 2015), and reinforcement learning (Paulus et al., 2017). In extractive systems, different types of encoders (Cheng and Lapata, 2016; Nallapati et al., 2017; Kedzie et al., 2018) and optimization techniques (Narayan et al., 2018b) have been developed. Our goal is to explore which types of systems learn which sub-aspect of summarization.…”
Section: Related Work (mentioning)
confidence: 99%
“…Extractive: [14] use hierarchical Recurrent Neural Networks (RNNs) to obtain sentence representations and classify the importance of sentences. [15] rank extracted sentences for summary generation through reinforcement learning, and [16] extract salient sentences and propose a new policy gradient method to rewrite these sentences (i.e., compress and paraphrase them) to generate a concise overall summary. [17] propose a framework composed of a hierarchical document encoder based on CNNs and an attention-based extractor with attention over external information.…”
Section: Related Work (mentioning)
confidence: 99%
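
The hierarchical extractive architecture sketched in this last excerpt (sentence representations from a word-level encoder, document context from a sentence-level encoder, and a per-sentence importance score) can be illustrated as follows. This is an illustrative sketch rather than the exact model of any cited paper; the GRU choice and layer sizes are assumptions.

```python
# Illustrative sketch of a hierarchical extractive scorer: a word-level GRU builds
# sentence representations, a sentence-level GRU contextualizes them within the
# document, and a linear layer outputs an importance score per sentence.
import torch
import torch.nn as nn

class HierarchicalExtractor(nn.Module):
    def __init__(self, vocab_size, emb_dim=64, hid_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.word_rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.sent_rnn = nn.GRU(hid_dim, hid_dim, batch_first=True)
        self.score = nn.Linear(hid_dim, 1)

    def forward(self, sentences):
        # sentences: list of 1-D LongTensors of word ids, one tensor per sentence
        sent_vecs = []
        for sent in sentences:
            _, h = self.word_rnn(self.embed(sent).unsqueeze(0))  # final hidden state
            sent_vecs.append(h.squeeze(0).squeeze(0))
        doc = torch.stack(sent_vecs).unsqueeze(0)   # (1, num_sents, hid_dim)
        ctx, _ = self.sent_rnn(doc)                 # contextualized sentence states
        return torch.sigmoid(self.score(ctx)).squeeze(0).squeeze(-1)

# Usage with made-up token ids: returns one importance score per sentence.
model = HierarchicalExtractor(vocab_size=100)
doc = [torch.tensor([1, 5, 7]), torch.tensor([2, 8]), torch.tensor([3, 4, 9, 6])]
print(model(doc))
```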