Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019
DOI: 10.18653/v1/d19-1053

MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance

Abstract: A robust evaluation metric has a profound impact on the development of text generation systems. A desirable metric compares system output against references based on their semantics rather than surface forms. In this paper we investigate strategies to encode system and reference texts to devise a metric that shows a high correlation with human judgment of text quality. We validate our new metric, namely MoverScore, on a number of text generation tasks including summarization, machine translation, image captioning, and data-to-text generation.
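For readers who want to apply the metric in practice, the sketch below shows how scoring might look with the authors' released Python package. The module and function names (`moverscore_v2`, `get_idf_dict`, `word_mover_score`) and their arguments are taken from the public repository and should be treated as assumptions; they may differ between versions.

```python
# Hedged sketch: scoring system outputs against references with MoverScore.
# Assumes the authors' released package (pip install moverscore) exposes
# get_idf_dict and word_mover_score as in their repository.
from moverscore_v2 import get_idf_dict, word_mover_score

references = ["The cat sat on the mat.", "A storm hit the coast overnight."]
hypotheses = ["A cat was sitting on the mat.", "Overnight, a storm struck the coastline."]

# IDF weights are computed over each side of the corpus and used to weight tokens.
idf_ref = get_idf_dict(references)
idf_hyp = get_idf_dict(hypotheses)

# One score per reference/hypothesis pair; higher means semantically closer.
scores = word_mover_score(references, hypotheses, idf_ref, idf_hyp,
                          stop_words=[], n_gram=1, remove_subwords=True)
print(scores)
```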

Cited by 316 publications (310 citation statements); References 46 publications.

Citation statements (ordered by relevance):
“…For reference, we report the standard summarization baselines described in the previous section. The summaries are evaluated with two automatic evaluation metrics: ROUGE-2 recall with stopwords removed (R-2) (Lin, 2004) and a recent BERT-based evaluation metric (MOVER) (Zhao et al., 2019). The results, reported in Table 4, are encouraging since the systems based on the learned priors outperform the uniform prior.…”
Section: Extracting Summaries: Example (mentioning)
confidence: 92%
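The excerpt above pairs ROUGE-2 recall (with stopwords removed) with MoverScore. Below is a minimal sketch of that kind of setup, assuming the `rouge-score` and `nltk` packages and a hypothetical `strip_stopwords` helper; the cited work's exact preprocessing is not specified here, so treat this as an approximation.

```python
# Hedged sketch of the quoted evaluation setup: ROUGE-2 recall computed after
# removing English stopwords. Uses the rouge-score and nltk packages.
import nltk
from nltk.corpus import stopwords
from rouge_score import rouge_scorer

nltk.download("stopwords", quiet=True)
STOP = set(stopwords.words("english"))

def strip_stopwords(text: str) -> str:
    # Lowercase and drop stopwords before n-gram matching.
    return " ".join(w for w in text.lower().split() if w not in STOP)

scorer = rouge_scorer.RougeScorer(["rouge2"], use_stemmer=True)

def rouge2_recall(reference: str, summary: str) -> float:
    # rouge-score expects (target, prediction); we pass preprocessed strings.
    result = scorer.score(strip_stopwords(reference), strip_stopwords(summary))
    return result["rouge2"].recall

print(rouge2_recall("the cat sat on the mat near the door",
                    "a cat was sitting on the mat"))
```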
“…MoverScore employs a contextualized embedding model and a variant of the Earth Mover Distance (Rubner et al., 2000) to measure the similarity between sentence pairs (Zhao et al., 2019). Given two sentences, MoverScore aligns similar words from each sentence and computes the flow traveling between these words.…”
Section: Baseline Models (mentioning)
confidence: 99%
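A minimal sketch of the mechanism described in this excerpt: embed each token with a contextualized encoder, then measure how much mass must travel to align one sentence's tokens with the other's. To stay short it uses uniform token weights and a relaxed, one-directional approximation of the transport cost rather than the exact Earth Mover Distance, and none of MoverScore's IDF weighting, so it illustrates the idea rather than reproducing the metric.

```python
# Hedged sketch: contextualized token embeddings + a relaxed transport cost.
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModel.from_pretrained("distilbert-base-uncased")

def embed(sentence: str) -> np.ndarray:
    # Contextualized embedding for every sub-word token, special tokens dropped.
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]
    return hidden[1:-1].numpy()  # strip [CLS]/[SEP]

def relaxed_mover_distance(hyp: str, ref: str) -> float:
    h, r = embed(hyp), embed(ref)
    h = h / np.linalg.norm(h, axis=1, keepdims=True)
    r = r / np.linalg.norm(r, axis=1, keepdims=True)
    cost = 1.0 - h @ r.T  # cosine distance between every hyp/ref token pair
    # Uniform token weights; each hypothesis token sends its mass to the
    # cheapest reference token, a lower bound on the true transport cost.
    return float(cost.min(axis=1).mean())

print(relaxed_mover_distance("A storm struck the coast overnight.",
                             "Overnight, a storm hit the coastline."))
```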
“…Lin proposed a framework based on global encoding, which used a convolutional gating unit to control the information flow from the encoder to the decoder according to the global information of the input context. Wei [Wei et al. 2019] proposed a regularization approach to the sequence-to-sequence model for the Chinese social media summarization task, which could improve semantic consistency. Based on a double attention pointer network, Li [Li et al. 2020] proposed an encoder-decoder model that achieved higher summarization performance on the CNN/Daily Mail dataset and the LCSTS dataset.…”
Section: Related Work (mentioning)
confidence: 99%
“…Some evaluation methods (ROUGE [Lin and Hovy 2003], BLEU [Papineni et al. 2002], MoverScore [Zhao et al. 2019]) are adopted in text summarization. The evaluation of large-scale summarization models is costly and cumbersome.…”
Section: Setting (mentioning)
confidence: 99%