2019
DOI: 10.48550/arxiv.1909.02622
Preprint

MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance


Cited by 33 publications (51 citation statements)
References 38 publications
“…This suggests that the sketching step helps generate a more fluent summary even with lower unigram matching. Furthermore, recognizing the limitation of ROUGE scores in their ability to fully capture the resemblance between the generated summary and the reference, in Table 2, we follow (Fabbri et al, 2020) […] metrics, including ROUGE-Word Embedding (Ng and Abrecht, 2015), BERTScore (Zhang et al, 2019b), MoverScore (Zhao et al, 2019), Sentence Mover's Similarity (SMS) (Clark et al, 2019), BLEU (Papineni et al, 2002), and CIDEr (Vedantam et al, 2015). As shown in Table 2, CODS consistently outperforms PEGASUS and BART.…”
Section: Results
Citation type: mentioning
confidence: 94%
“…Baselines include BLEURT (described in Section 2.3), along with BERTScore, a non-learned neural metric that uses a matching algorithm on top of neural word embeddings, similar to n-gram matching approaches. MoverScore [42] is similar to BERTScore, but uses an optimal transport algorithm. BLEU, ROUGE, METEOR and chrF++ are widely used n-gram-based methods, working at the word, subword or character level.…”
Section: Results
Citation type: mentioning
confidence: 99%
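
The contrast drawn in the statement above, between greedy embedding matching and optimal transport, can be illustrated with a toy sketch. This is not the published MoverScore or BERTScore implementation. It assumes hand-made 2-d vectors in place of contextual embeddings, uniform token weights, and equal token counts. Under those assumptions an optimal transport plan reduces to a one-to-one assignment, which the sketch finds by brute force. The function names are hypothetical.

```python
from itertools import permutations
from math import sqrt


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def greedy_matching_cost(candidate, reference):
    """Each candidate token independently picks its nearest reference token.

    Several candidate tokens may reuse the same reference token.
    """
    total = sum(min(cosine_distance(c, r) for r in reference) for c in candidate)
    return total / len(candidate)


def uniform_transport_cost(candidate, reference):
    """Optimal transport cost under uniform weights and equal token counts.

    Assumption: with these restrictions some optimal plan is a permutation,
    so a brute-force search over permutations is enough for tiny inputs.
    """
    if len(candidate) != len(reference):
        raise ValueError("this toy sketch requires equal token counts")
    n = len(candidate)
    best = min(
        sum(cosine_distance(candidate[i], reference[p[i]]) for i in range(n))
        for p in permutations(range(n))
    )
    return best / n


# Toy "embeddings": the candidate repeats one token, the reference has two.
candidate = [(1.0, 0.0), (1.0, 0.0)]
reference = [(1.0, 0.0), (0.0, 1.0)]

print(greedy_matching_cost(candidate, reference))    # 0.0
print(uniform_transport_cost(candidate, reference))  # 0.5
```

In this example greedy matching reports zero cost because both candidate tokens match the same reference token, while the transport formulation must also cover the second reference token and therefore reports a higher cost.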
“…We use automatic metrics BLEU (Papineni et al, 2002), METEOR (Denkowski and Lavie, 2014) and a neural-based metric MoverScore (Zhao et al, 2019). As automatic scores remain tricky for correctly evaluating the text quality, we conduct human evaluation.…”
Section: Evaluation Metrics
Citation type: mentioning
confidence: 99%