Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/d18-1445
APRIL: Interactively Learning to Summarise by Combining Active Preference Learning and Reinforcement Learning

Abstract: We propose a method to perform automatic document summarisation without using reference summaries. Instead, our method interactively learns from users' preferences. The merit of preference-based interactive summarisation is that preferences are easier for users to provide than reference summaries. Existing preference-based interactive learning methods suffer from high sample complexity, i.e. they need to interact with the oracle for many rounds in order to converge. In this work, we propose a new objective function…
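To make the interactive setting described in the abstract concrete, here is a minimal sketch of a preference-based query loop. It is an illustration of the general idea only, not APRIL's actual objective or active-learning strategy; the names `candidates` and `user_prefers` are hypothetical stand-ins.

```python
# Minimal sketch of preference-based interactive learning: repeatedly show
# the user two candidate summaries, record which one they prefer, and
# update per-candidate scores. All components (candidate pool, query
# strategy, user oracle) are hypothetical stand-ins, not APRIL's.
import random

def interactive_preference_loop(candidates, user_prefers, rounds=10):
    """candidates: list of summary strings.
    user_prefers(a, b): True if the user prefers summary a over b.
    Returns a dict of learned scores per candidate."""
    scores = {c: 0.0 for c in candidates}
    for _ in range(rounds):
        # Random pair selection; an *active* learner would instead pick
        # the pair whose answer is most informative, to reduce the number
        # of interaction rounds (the sample complexity).
        a, b = random.sample(candidates, 2)
        winner, loser = (a, b) if user_prefers(a, b) else (b, a)
        scores[winner] += 1.0
        scores[loser] -= 1.0
    return scores

# Toy usage with a deterministic oracle that prefers longer summaries.
cands = ["short summary",
         "a somewhat longer summary",
         "the longest candidate summary of all"]
learned = interactive_preference_loop(cands, lambda a, b: len(a) > len(b),
                                      rounds=20)
print(max(learned, key=learned.get))
```

The key point the abstract makes is that each round of this loop asks the user only for a pairwise comparison, which is cheaper to provide than a full reference summary.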

Cited by 29 publications (47 citation statements). References 18 publications.
“…This work extends our earlier work (Gao et al. 2018) in three aspects. (i) We present a new user study on the reliability and usability of the preference-based interaction (§5).…”
Section: Introduction (supporting)
confidence: 84%
“…Note that in all previous work we are aware of (P.V.S. and Meyer 2017; Kreutzer et al. 2017; Gao et al. 2018), the evaluation was based on simulations with a perfect user oracle. Therefore, we expect that our results with real user interaction better reflect the true results.…”
Section: APRIL vs. SPPI (mentioning)
confidence: 99%
“…However, besides the fact that the results were obtained from domains other than the telecommunication domain presented in this work, in both works the authors neither applied a pre-qualification test nor provided information about crowdsourcing task details, which can also have a rather large influencing effect. Likewise, Gao et al. (2018), Falke et al. (2017), and Fan et al. (2018) used crowdsourcing as the source of human evaluation to rate their automatic summarization systems. Nevertheless, they did not question the robustness of crowdsourcing for this task, nor did they compare the crowd with expert data.…”
Section: Crowdsourcing for Summarization Evaluation (mentioning)
confidence: 99%
“…In particular, the human scores for some documents might be on average higher than for other documents, which easily confuses the regression. Preference learning (PL) is robust to these issues, by learning the relative ordering induced by the human scores (Gao et al., 2018). PL can be formulated as a binary classification task (Maystre, 2018), where the input is a pair of data points {(S_i, D_i, h_i), (S_j, D_j, h_j)} and the output is a binary flag indicating whether S_i is better than S_j, i.e., whether h_i > h_j.…”
Section: Inferring K with Human Judgments (mentioning)
confidence: 99%
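The pairwise formulation quoted above maps directly onto standard tooling. Below is a minimal sketch, not the cited authors' implementation: it assumes hypothetical fixed-length feature vectors standing in for the (summary, document) representations, and trains scikit-learn's logistic regression on feature differences, labelling a pair positive when h_i > h_j.

```python
# Sketch of preference learning as binary classification over pairs:
# each training instance is the feature difference f_i - f_j, labelled 1
# iff the human score h_i > h_j. Features and scores are toy placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

def make_pairwise_dataset(features, scores):
    """Turn pointwise (feature vector, human score) data into pairwise
    binary-classification data."""
    X, y = [], []
    n = len(scores)
    for i in range(n):
        for j in range(n):
            if i != j and scores[i] != scores[j]:
                X.append(features[i] - features[j])
                y.append(1 if scores[i] > scores[j] else 0)
    return np.array(X), np.array(y)

# Toy data: 4 summaries, 3-dimensional features, human scores in [0, 1].
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 3))
human_scores = np.array([0.2, 0.9, 0.5, 0.7])

X, y = make_pairwise_dataset(feats, human_scores)
# fit_intercept=False keeps the classifier antisymmetric: negating the
# feature difference (i.e. swapping the pair) flips the prediction.
clf = LogisticRegression(fit_intercept=False).fit(X, y)

# Query: does the model prefer summary 1 over summary 0?
print(clf.predict((feats[1] - feats[0]).reshape(1, -1)))
```

The classifier learns only the relative ordering of the scores, never their absolute values, which is exactly the robustness to per-document score offsets that the quoted passage describes.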