Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2018)
DOI: 10.18653/v1/n18-1152

Estimating Summary Quality with Pairwise Preferences

Abstract: Automatic evaluation systems in the field of automatic summarization have been relying on the availability of gold standard summaries for over ten years. Gold standard summaries are expensive to obtain and often require the availability of domain experts to achieve high quality. In this paper, we propose an alternative evaluation approach based on pairwise preferences of sentences. In comparison to gold standard summaries, they are simpler and cheaper to obtain. In our experiments, we show that humans are able…
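The abstract proposes scoring summaries from pairwise preferences over sentences instead of gold standard summaries. A minimal sketch of that idea, assuming a simple win-rate aggregation (the function names, the `prefer` oracle, and the mean-importance scoring are illustrative placeholders, not the paper's exact estimator):

```python
def sentence_importance(sentence, pool, prefer):
    """Win rate of `sentence` against every other sentence in `pool`.

    `prefer(a, b)` returns True if a human annotator prefers sentence
    `a` over sentence `b`; it stands in for whatever preference
    elicitation interface is actually used.
    """
    others = [s for s in pool if s is not sentence]
    if not others:
        return 0.0
    wins = sum(prefer(sentence, other) for other in others)
    return wins / len(others)


def summary_quality(summary, pool, prefer):
    """Estimate summary quality as the mean preference-based
    importance of its sentences -- an illustrative aggregation,
    not the paper's exact model."""
    if not summary:
        return 0.0
    return sum(sentence_importance(s, pool, prefer) for s in summary) / len(summary)
```

In this sketch, a summary scores highly when it is built from sentences that annotators consistently preferred over the rest of the source pool; no reference summary is needed.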

Cited by 21 publications (16 citation statements) · References 29 publications
“…As for the intrinsic evaluation function U*, recent work has suggested that human preferences over summaries have high correlations to ROUGE scores (Zopf, 2018). Therefore, we define:…”
Section: APRIL: Decomposing SPPI into Active Preference Learning and RL (mentioning)
confidence: 99%
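The equation that followed “we define” did not survive extraction. One plausible instantiation, consistent with the quoted sentence's appeal to ROUGE but entirely our assumption rather than the citing paper's verbatim definition:

```latex
% Hypothetical form of the intrinsic evaluation function; the
% equal-weight ROUGE combination is our assumption.
U^{*}(y) = \tfrac{1}{2}\left(\mathrm{ROUGE}_{1}(y, Y^{*}) + \mathrm{ROUGE}_{2}(y, Y^{*})\right)
```

where y is a candidate summary and Y* the set of reference summaries.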
“…The use of preference-based feedback in NLP attracts increasing research interest. Zopf (2018) […] Fig. 1: SPPI (a) directly uses the collected preferences to "teach" its summary generator, while APRIL (b) learns a reward function as the proxy of the user/oracle, and uses the learnt reward to "teach" the RL-based summariser.…”
Section: Introduction (mentioning)
confidence: 99%
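The quoted Fig. 1 caption contrasts SPPI, which feeds preferences directly to the summary generator, with APRIL, which first distils them into a reward function and then runs RL against it. A minimal sketch of APRIL's reward-learning stage, assuming a linear Bradley-Terry model over hypothetical summary features (`pairs` and `features` are placeholder inputs, not APRIL's actual representation):

```python
import numpy as np

def learn_reward(pairs, features, lr=0.1, epochs=200):
    """Fit a linear Bradley-Terry reward from preference pairs.

    `pairs` is a list of (winner_id, loser_id) index pairs and
    `features[i]` is the feature vector of summary i. Illustrates
    the two-stage idea (preferences -> reward -> RL), not APRIL's
    exact model.
    """
    w = np.zeros(features.shape[1])
    for _ in range(epochs):
        for win, lose in pairs:
            diff = features[win] - features[lose]
            p = 1.0 / (1.0 + np.exp(-w @ diff))  # P(winner preferred)
            w += lr * (1.0 - p) * diff           # ascend the log-likelihood
    return w

# The learnt reward w @ phi(summary) then stands in for the human
# annotator inside the RL loop that trains the summariser.
```

SPPI, by contrast, would skip the `learn_reward` stage and update the generator from each collected preference directly.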
“…First, research has shown that ROUGE is inconsistent with human evaluation of summary quality (Liu and Liu, 2008; Zopf, 2018; Kryscinski et al., 2019; Maynez et al., 2020). We evaluate ROUGE using PolyTope from the perspective of both instance-level and system-level performance.…”
Section: Analysis of Evaluation Methods (mentioning)
confidence: 99%
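The instance-level versus system-level distinction in that quote is the standard way such inconsistency claims are measured. A sketch of that analysis, assuming score matrices as inputs (the function name and data layout are illustrative, not the cited papers' exact protocol):

```python
import numpy as np
from scipy.stats import kendalltau

def rouge_human_agreement(rouge, human):
    """Kendall's tau between ROUGE and human scores at two granularities.

    `rouge` and `human` are (n_systems, n_documents) score matrices;
    both names are hypothetical placeholders.
    """
    # Instance level: correlate system rankings per document, then average.
    instance = np.mean([kendalltau(rouge[:, d], human[:, d])[0]
                        for d in range(rouge.shape[1])])
    # System level: correlate the per-system mean scores.
    system = kendalltau(rouge.mean(axis=1), human.mean(axis=1))[0]
    return instance, system
```

A metric can look reliable at the system level while ranking individual summaries poorly, which is why both granularities are reported.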
“…In fact, while yielding rich conclusions, the above analytical work has also exposed deficiencies in automatic toolkits. The quality of automatic evaluation is often criticized by the research community (Novikova et al., 2017; Zopf, 2018) as neither capturing the overall quality of generated texts (Liu and Liu, 2008) nor correlating with human judgements (Kryscinski et al., 2019).…”
Section: Related Work (mentioning)
confidence: 99%