SUM-QE: a BERT-based Summary Quality Estimation Model

Xenouleas, Stratos; Malakasiotis, Prodromos; Apidianaki, Marianna; Androutsopoulos, Ion

doi:10.18653/v1/d19-1618

Cited by 36 publications

(38 citation statements)

References 19 publications


“…• SIMetrix (Louis and Nenkova, 2009) • SumQE (Xenouleas et al, 2019) Among these metrics, 6 have original implementations in Java, 6 in Python, 1 in Perl, and 1 with no known official implementation (Pyramid Score).…”

Section: The Metric Interface (mentioning)

confidence: 99%

SacreROUGE: An Open-Source Library for Using and Developing Summarization Evaluation Metrics

Deutsch, Roth

2020

Proceedings of Second Workshop for NLP Open Source Software (NLP-OSS)


We present SacreROUGE, an open-source library for using and developing summarization evaluation metrics. SacreROUGE removes many obstacles that researchers face when using or developing metrics: (1) the library provides Python wrappers around the official implementations of existing evaluation metrics so they share a common, easy-to-use interface; (2) it provides functionality to evaluate how well any metric implemented in the library correlates to human-annotated judgments, so no additional code needs to be written for a new evaluation metric; and (3) it includes scripts for loading datasets that contain human judgments so they can easily be used for evaluation. This work describes the design of the library, including the core Metric interface, the command-line API for evaluating summarization models and metrics, and the scripts to load and reformat publicly available datasets. The development of SacreROUGE is ongoing and open to contributions from the community.
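Points (1) and (2) of the abstract describe a shared metric interface plus a generic routine that correlates any metric with human judgments. The sketch below illustrates that idea under stated assumptions: the class `Metric`, the method `score`, the toy `UnigramRecall` metric, the function `correlate_with_humans`, and the choice of Pearson correlation are all hypothetical and are not SacreROUGE's actual API. `UnigramRecall` is a simplified stand-in, not an official ROUGE implementation.

```python
from abc import ABC, abstractmethod
from math import sqrt
from typing import Dict, List, Sequence


class Metric(ABC):
    """Hypothetical common interface: every metric scores a summary against references."""

    @abstractmethod
    def score(self, summary: str, references: Sequence[str]) -> float:
        ...


class UnigramRecall(Metric):
    """Toy metric (not official ROUGE): fraction of reference tokens found in the summary."""

    def score(self, summary: str, references: Sequence[str]) -> float:
        summary_tokens = set(summary.lower().split())
        recalls = []
        for ref in references:
            ref_tokens = ref.lower().split()
            if ref_tokens:
                hits = sum(1 for tok in ref_tokens if tok in summary_tokens)
                recalls.append(hits / len(ref_tokens))
        # Average over references; 0.0 if there is nothing to compare against.
        return sum(recalls) / len(recalls) if recalls else 0.0


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def correlate_with_humans(metric: Metric, instances: List[Dict]) -> float:
    """Works for any Metric subclass, so a new metric needs no extra evaluation code."""
    metric_scores = [metric.score(inst["summary"], inst["references"]) for inst in instances]
    human_scores = [inst["human_score"] for inst in instances]
    return pearson(metric_scores, human_scores)


if __name__ == "__main__":
    ref = ["the cat sat on the mat"]
    data = [
        {"summary": "the cat sat on the mat", "references": ref, "human_score": 5.0},
        {"summary": "the cat sat", "references": ref, "human_score": 3.0},
        {"summary": "a dog ran", "references": ref, "human_score": 1.0},
    ]
    print(correlate_with_humans(UnigramRecall(), data))
```

Because `correlate_with_humans` depends only on the abstract `score` method, wrapping a new metric behind the same interface is enough to evaluate it; the real library also handles references, datasets, and correlation levels in ways not shown here.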


“…Some work discussed how to evaluate the quality of generated text in the reference-free setting (Louis and Nenkova, 2013;Peyrard et al, 2017;Peyrard and Gurevych, 2018;Shimanaka et al, 2018;Xenouleas et al, 2019;Sun and Nenkova, 2019;Böhm et al, 2019;Chen et al, 2018;Gao et al, 2020). Louis and Nenkova (2013), Peyrard et al (2017) and Peyrard and Gurevych (2018) leveraged regression models to fit human judgement.…”

Section: Reference-free Metrics (mentioning)

confidence: 99%

“…RUSE (Shimanaka et al, 2018) use sentence embeddings generated by three different models and aggregate them using a MLP regressor. Xenouleas et al (2019) proposed a method that also uses a regression model to predict the scores, while the predictions are based on hidden representations generated using BERT (Devlin et al, 2019) as the encoder. However, these methods require ratings assigned by human annotators as training data which are also costly to obtain.…”

Section: Reference-free Metrics (mentioning)

confidence: 99%

“…We further conduct experiments to show the benefit of using our evaluator. A commonly used BERT-based evaluator is to add a linear regressor to the BERT representations (Xenouleas et al, 2019). We implement an evaluator (called BERT+Linear) that also uses a linear regressor to map the BERT embeddings of summaries into a score.…”

Section: Ablation Study For Evaluator Selection (mentioning)

confidence: 99%
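The statements above describe the core idea attributed to Xenouleas et al. (2019): a regression model maps an encoder representation of a summary to a human quality score. The sketch below shows only the linear regression head, fit by gradient descent on mean squared error. It assumes the encoder is frozen and uses small toy vectors in place of real BERT embeddings; fine-tuning BERT itself is not shown, and `predict` and `train_linear_head` are hypothetical names, not the SUM-QE code.

```python
import random
from typing import List, Sequence, Tuple


def predict(w: Sequence[float], b: float, embedding: Sequence[float]) -> float:
    """Linear regression head: score = w . embedding + b."""
    return sum(wj * ej for wj, ej in zip(w, embedding)) + b


def train_linear_head(
    embeddings: Sequence[Sequence[float]],
    human_scores: Sequence[float],
    lr: float = 0.1,
    epochs: int = 500,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Fit the head by full-batch gradient descent on mean squared error.

    Assumption: `embeddings` stand in for fixed summary vectors produced by a
    frozen encoder (e.g. a BERT [CLS] vector); they are toy values here.
    """
    rng = random.Random(seed)
    dim = len(embeddings[0])
    w = [rng.uniform(-0.01, 0.01) for _ in range(dim)]  # small random init
    b = 0.0
    n = len(embeddings)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for e, y in zip(embeddings, human_scores):
            err = predict(w, b, e) - y  # residual for this summary
            for j in range(dim):
                grad_w[j] += 2.0 * err * e[j] / n  # d(MSE)/dw_j
            grad_b += 2.0 * err / n  # d(MSE)/db
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


if __name__ == "__main__":
    # Toy "embeddings" whose human scores follow 2*x1 + x2 + 1.
    emb = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    scores = [1.0, 3.0, 2.0, 4.0]
    w, b = train_linear_head(emb, scores)
    print(w, b, predict(w, b, [1.0, 1.0]))
```

A closed-form least-squares fit would also work for a frozen encoder; gradient descent is used here because it is how such a head is typically trained jointly with the encoder when the encoder is fine-tuned.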


Unsupervised Reference-Free Summary Quality Evaluation via Contrastive Learning

Wu, Ma, Wu, et al.

2020

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)


Evaluation of a document summarization system has been a critical factor to impact the success of the summarization task. Previous approaches, such as ROUGE, mainly consider the informativeness of the assessed summary and require human-generated references for each test summary. In this work, we propose to evaluate the summary qualities without reference summaries by unsupervised contrastive learning. Specifically, we design a new metric which covers both linguistic qualities and semantic informativeness based on BERT. To learn the metric, for each summary, we construct different types of negative samples with respect to different aspects of the summary qualities, and train our model with a ranking loss. Experiments on Newsroom and CNN/Daily Mail demonstrate that our new evaluation method outperforms other metrics even without reference summaries. Furthermore, we show that our method is general and transferable across datasets.
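The abstract above describes training a scorer with a ranking loss against negative samples built to target different quality aspects. The sketch below illustrates that general recipe with two assumed perturbations (sentence shuffling for linguistic quality, sentence deletion for informativeness) and a standard margin ranking (hinge) loss. These perturbation choices, the margin value, and all function names are illustrative assumptions, not necessarily what Wu et al. used; `scorer` stands for any hypothetical model that maps a summary to a number.

```python
import random
from typing import Callable, List, Sequence

Scorer = Callable[[Sequence[str]], float]


def shuffle_sentences(sentences: List[str], rng: random.Random) -> List[str]:
    """Assumed linguistic-quality negative: same content, broken sentence order."""
    shuffled = sentences[:]
    if len(set(sentences)) < 2:
        return shuffled  # nothing can be reordered
    while shuffled == sentences:  # make sure the order actually changes
        rng.shuffle(shuffled)
    return shuffled


def delete_sentence(sentences: List[str], rng: random.Random) -> List[str]:
    """Assumed informativeness negative: drop one sentence (needs a non-empty list)."""
    idx = rng.randrange(len(sentences))
    return sentences[:idx] + sentences[idx + 1:]


def margin_ranking_loss(pos_score: float, neg_score: float, margin: float = 1.0) -> float:
    """Zero once the original summary outscores the negative by at least `margin`."""
    return max(0.0, margin - (pos_score - neg_score))


def contrastive_loss(
    sentences: List[str], scorer: Scorer, rng: random.Random, margin: float = 1.0
) -> float:
    """Average ranking loss of one summary against its constructed negatives."""
    negatives = [shuffle_sentences(sentences, rng), delete_sentence(sentences, rng)]
    pos = scorer(sentences)
    losses = [margin_ranking_loss(pos, scorer(neg), margin) for neg in negatives]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    summary = ["The storm hit the coast.", "Thousands lost power.", "Repairs began Monday."]
    # Toy scorer that only counts sentences: it ranks deletions correctly
    # but cannot tell a shuffled summary from the original.
    print(contrastive_loss(summary, lambda s: float(len(s)), random.Random(0)))
```

No reference summary appears anywhere in the loss: supervision comes only from the assumption that the original summary should outrank its corrupted versions, which is what makes the approach unsupervised and reference-free.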


“…One possible route to a better automatic method for summary quality estimation is to train a model on document summaries annotated with human quality scores Nenkova, 2009, 2013;Xenouleas et al, 2019). Such a model could be used to evaluate summaries without further human involvement.…”

Section: Introduction (mentioning)

confidence: 99%

Fill in the BLANC: Human-free quality estimation of document summaries

Vasilyev, Dharnidharka, Bohannon

2020

Proceedings of the First Workshop on Evaluation and Comparison of NLP Systems


We present BLANC, a new approach to the automatic estimation of document summary quality. Our goal is to measure the functional performance of a summary with an objective, reproducible, and fully automated method. Our approach achieves this by measuring the performance boost gained by a pretrained language model with access to a document summary while carrying out its language understanding task on the document's text. We present evidence that BLANC scores have as good correlation with human evaluations as do the ROUGE family of summary quality measurements. And unlike ROUGE, the BLANC method does not require human-written reference summaries, allowing for fully human-free summary quality estimation.
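The BLANC abstract measures the "performance boost" a language model gets from seeing the summary while filling in masked words of the document. The sketch below shows that comparison under simplifying assumptions: a toy bigram lookup (`toy_bigram_predictor`, hypothetical) replaces the pretrained masked language model, tokens are whitespace-split, every `gap`-th token is masked with a shifting offset, and the "no summary" baseline uses a filler of `.` tokens of the same length. The score is the accuracy with the summary minus the accuracy with the filler; the real BLANC implementation differs in tokenization, masking rules, and model.

```python
from typing import Callable, Dict, List, Sequence

MASK = "[MASK]"
# A predictor takes (context tokens, masked sentence tokens) and returns one token per position.
Predictor = Callable[[Sequence[str], Sequence[str]], List[str]]


def toy_bigram_predictor(context: Sequence[str], masked: Sequence[str]) -> List[str]:
    """Hypothetical stand-in for a masked LM: fill each [MASK] with the word that
    first followed the preceding word in the context, or "" if unknown."""
    follows: Dict[str, str] = {}
    for prev, nxt in zip(context, context[1:]):
        follows.setdefault(prev, nxt)
    out = []
    for i, tok in enumerate(masked):
        if tok != MASK:
            out.append(tok)
        else:
            prev = masked[i - 1] if i > 0 else ""
            out.append(follows.get(prev, ""))
    return out


def blanc_help_like(
    doc_sentences: Sequence[str], summary: str, predict: Predictor, gap: int = 2
) -> float:
    """Accuracy boost from seeing the summary instead of a same-length filler."""
    summary_tokens = summary.split()
    filler = ["."] * len(summary_tokens)  # assumed content-free baseline
    helped = hurt = total = 0
    for sentence in doc_sentences:
        tokens = sentence.split()
        for offset in range(gap):  # shift the mask so every token is masked once
            positions = [i for i in range(len(tokens)) if i % gap == offset]
            if not positions:
                continue
            masked = [MASK if i % gap == offset else t for i, t in enumerate(tokens)]
            with_summary = predict(summary_tokens, masked)
            with_filler = predict(filler, masked)
            for i in positions:
                ok_summary = with_summary[i] == tokens[i]
                ok_filler = with_filler[i] == tokens[i]
                helped += int(ok_summary and not ok_filler)  # fixed only by the summary
                hurt += int(ok_filler and not ok_summary)  # broken by the summary
                total += 1
    return (helped - hurt) / total if total else 0.0


if __name__ == "__main__":
    doc = ["the cat sat on the mat"]
    print(blanc_help_like(doc, "the cat sat on the mat", toy_bigram_predictor))
    print(blanc_help_like(doc, "dogs bark loudly", toy_bigram_predictor))
```

Counting tokens that only the summary fixes minus tokens that only the filler fixes, divided by all masked tokens, equals the difference of the two accuracies; no human-written reference summary enters the computation.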
