2022
DOI: 10.48550/arxiv.2206.11249
Preprint
GEMv2: Multilingual NLG Benchmarking in a Single Line of Code

Cited by 1 publication (1 citation statement)
References 0 publications
“…On the other hand, generation benchmarks prompt an LM to produce a free-form response to a given prompt [5,21,26,73,80,144], and it is often unclear how to assess the quality of the output. Previous studies have measured the lexical or semantic similarity between the predicted free-form response and the reference answer to quantify the quality of the output [33,34,76,89,95,139,141]. However, the critical drawback is that it fails to identify false negatives, where the output is satisfactory but different from the reference answer [18,29,40,105].…”
Section: Related Work (mentioning)
confidence: 99%
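The drawback the citing paper describes — lexical-similarity metrics penalizing satisfactory outputs that merely differ in wording from the reference — can be illustrated with a minimal sketch. The token-overlap F1 below is a common SQuAD-style lexical metric, not the specific metric used by any of the cited works; the example strings are hypothetical.

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1, a common lexical similarity metric (SQuAD-style)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # Multiset intersection counts each shared token at most min(count) times.
    common = Counter(pred_tokens) & Counter(ref_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

reference = "The capital of France is Paris"
# A correct paraphrase with little lexical overlap scores poorly,
# i.e. a "false negative" under this kind of metric:
good_but_different = "Paris serves as the French capital city"
print(round(token_f1(good_but_different, reference), 2))
```

Even though the paraphrase is a fully satisfactory answer, its F1 against the reference lands well below 0.5, which is the failure mode the quoted passage points out.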