Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)
DOI: 10.3115/v1/d14-1075

Fear the REAPER: A System for Automatic Multi-Document Summarization with Reinforcement Learning

Abstract: This paper explores alternate algorithms, reward functions and feature sets for performing multi-document summarization using reinforcement learning with a high focus on reproducibility. We show that ROUGE results can be improved using a unigram and bigram similarity metric when training a learner to select sentences for summarization. Learners are trained to summarize document clusters based on various algorithms and reward functions and then evaluated using ROUGE. Our experiments show a statistically signifi…

Cited by 30 publications (36 citation statements)
References 22 publications
“…The use of reinforcement learning (RL) in extractive summarization was first explored by Ryang and Abekawa (2012), who proposed to use the TD(λ) algorithm to learn a value function for sentence selection. Rioux et al (2014) improved this framework by replacing the learning agent with another TD(λ) algorithm. However, the performance of their methods was limited by the use of shallow function approximators, which required performing a fresh round of reinforcement learning for every new document to be summarized.…”
Section: Related Work (mentioning)
confidence: 99%
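The TD(λ) approach described in the statement above learns a value function over summary states with eligibility traces. A minimal sketch of one linear TD(λ) update follows; the feature vectors, step size, and function names here are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def td_lambda_update(theta, trace, phi_s, phi_next, reward,
                     alpha=0.001, gamma=1.0, lam=0.9):
    """One TD(lambda) update of a linear value function V(s) = theta . phi(s).

    phi_s, phi_next : feature vectors of the current and next summary state
    trace           : eligibility trace, same shape as theta
    Returns the updated (theta, trace) pair.
    """
    # TD error: reward plus discounted value of next state, minus current value
    delta = reward + gamma * phi_next.dot(theta) - phi_s.dot(theta)
    # Decay the trace and accumulate the current state's features
    trace = gamma * lam * trace + phi_s
    # Move the weights along the trace in proportion to the TD error
    theta = theta + alpha * delta * trace
    return theta, trace
```

In the sentence-selection setting, each action adds a sentence to the partial summary and the reward reflects the resulting summary's quality score.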
“…(8)). α = 10⁻³: learning rate for preference learning. φ(y, x): vectorised representation of summary y for document cluster x (see Eq. (8)); we use the same vector representation as Rioux et al (2014)…”
Section: Parameter Description (mentioning)
confidence: 99%
“…[0,10]). For the vector representation φ, we use the same 200-dimensional bag-of-bigram representation as Rioux et al (2014)…”
Section: Parameter Description (mentioning)
confidence: 99%
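The bag-of-bigram representation φ(y, x) referenced above can be sketched as a count vector over a fixed bigram vocabulary. The tokenizer, vocabulary, and function name below are hypothetical simplifications (the cited work uses a 200-dimensional vocabulary drawn from the document cluster).

```python
from collections import Counter
import numpy as np

def bigram_features(summary_sentences, vocab):
    """Count occurrences of each vocabulary bigram in a candidate summary.

    summary_sentences : list of sentence strings forming the summary
    vocab             : ordered list of (word, word) bigram tuples
    Returns a float vector with one count per vocabulary bigram.
    """
    counts = Counter()
    for sent in summary_sentences:
        tokens = sent.lower().split()          # naive whitespace tokenizer
        counts.update(zip(tokens, tokens[1:])) # adjacent-token bigrams
    return np.array([counts[b] for b in vocab], dtype=float)
```

Usage: with vocab = [("the", "cat"), ("cat", "sat")], the summary ["The cat sat"] maps to the vector [1.0, 1.0].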
“…Many works have viewed the summarization problem as a supervised classification problem in which several features are used to predict the inclusion of document sentences in the summary. Variations of supervised models have been utilized for summary generation, such as: maximum entropy (Osborne, 2002), HMM (Conroy et al, 2011), CRF (Galley, 2006; Shen et al, 2007; Chali and Hasan, 2012), SVM (Xie and Liu, 2010), logistic regression (Louis et al, 2010) and reinforcement learning (Rioux et al, 2014). Problems with supervised models in the context of summarization include the need for a large amount of annotated data and domain dependency.…”
Section: Reference Article (mentioning)
confidence: 99%