2020
DOI: 10.48550/arXiv.2004.02990
Preprint

Evaluating the Evaluation of Diversity in Natural Language Generation

Abstract: Despite growing interest in natural language generation (NLG) models that produce diverse outputs, there is currently no principled method for evaluating the diversity of an NLG system. In this work, we propose a framework for evaluating diversity metrics. The framework measures the correlation between a proposed diversity metric and a diversity parameter, a single parameter that controls some aspect of diversity in generated text. For example, a diversity parameter might be a binary variable used to instruct …
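As a rough sketch of the framework the abstract outlines, the snippet below scores a candidate diversity metric by how strongly it correlates with a diversity parameter. The metric (distinct-n), the parameter (sampling temperature), and the toy response sets are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of the evaluation idea from the abstract: a candidate
# diversity metric is judged by how well it correlates with a diversity
# parameter that is known to control diversity. The metric (distinct-n),
# the parameter (sampling temperature), and the toy response sets below
# are illustrative assumptions, not the paper's exact setup.
from scipy.stats import spearmanr


def distinct_n(responses, n=2):
    """Fraction of unique n-grams across a set of responses."""
    ngrams = []
    for text in responses:
        tokens = text.split()
        ngrams.extend(zip(*(tokens[i:] for i in range(n))))
    return len(set(ngrams)) / max(len(ngrams), 1)


# Toy response sets generated at increasing temperatures (the diversity
# parameter): higher temperature should yield more diverse text.
temperatures = [0.2, 0.5, 0.8, 1.1]
response_sets = [
    ["the cat sat", "the cat sat", "the cat sat"],
    ["the cat sat", "a cat sat down", "the cat sat"],
    ["the cat sat", "a dog ran by", "birds flew away"],
    ["quiet rivers bend", "a dog ran by", "storms echo loudly"],
]

scores = [distinct_n(rs) for rs in response_sets]
rho, _ = spearmanr(temperatures, scores)
print(f"Spearman correlation between metric and parameter: {rho:.2f}")
```

Under this framing, a metric whose score rises consistently with the parameter earns a high correlation, while a metric insensitive to the diversity change scores near zero.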

Cited by 8 publications (9 citation statements)
References 20 publications (36 reference statements)
“…We conclude that the low-diversity problem is mainly manifested in two aspects: form and content (Tevet & Berant, 2020; Fu et al., 2020; Holtzman et al., 2020). As shown in Table 1, the low form diversity can be reflected in repeating some words, using similar lexicon and syntax, and more.…”
Section: Introduction
Mentioning (confidence: 70%)
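To make the "form diversity" symptom named in this excerpt concrete, here is a toy measure of word repetition; rep_rate is a hypothetical helper written for illustration, not a metric from the cited papers.

```python
# Toy illustration of one low-form-diversity symptom named above: word
# repetition. rep_rate is a hypothetical measure (not from the cited
# papers): the fraction of tokens that repeat an earlier token in the text.
def rep_rate(text: str) -> float:
    tokens = text.split()
    seen, repeats = set(), 0
    for tok in tokens:
        if tok in seen:
            repeats += 1
        seen.add(tok)
    return repeats / max(len(tokens), 1)


print(rep_rate("the cat sat and the cat sat"))  # ~0.43: 3 of 7 tokens repeat
```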
“…It could be argued that a desirable captioning system is one that generates both plausible and diverse text. However, it has been documented that the diversity of generated text can be at odds with the performance of captioning systems (Dušek et al., 2020; Tevet & Berant, 2020). In previous sections, we made the case for plausibility.…”
Section: Quantifying Diversity of Generated Multilingual Reports
Mentioning (confidence: 99%)
“…A single annotator completes each step to minimize cognitive load; rather than read and characterise a partial set of existing responses, an annotator must only reason about the set of responses they will write. Annotators are free to mix both surface and semantic diversity (Tevet and Berant 2020). We perform manual quality control by checking a sample of work from each annotator and conversation tree during each round of annotation.…”
Section: Conversation Turns
Mentioning (confidence: 99%)