Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
DOI: 10.18653/v1/2020.emnlp-main.736

UNION: An Unreferenced Metric for Evaluating Open-ended Story Generation

Abstract: Despite the success of existing referenced metrics (e.g., BLEU and MoverScore), they correlate poorly with human judgments for open-ended text generation, including story or dialog generation, because of the notorious one-to-many issue: there are many plausible outputs for the same input, which may differ substantially in literals or semantics from the limited number of given references. To alleviate this issue, we propose UNION, a learnable UNreferenced metrIc for evaluating Open-eNded story generation, which meas…
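To make the one-to-many issue concrete, here is a minimal sketch (the example sentences are invented, not from the paper) showing how a plausible continuation that shares little surface overlap with a single reference receives a near-zero BLEU score, computed with NLTK:

```python
# Sketch of the one-to-many issue: a plausible continuation scores poorly
# under a referenced metric (BLEU) simply because it is worded differently
# from the one available reference. Example sentences are invented.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "the knight rode home and told the king about the dragon".split()
plausible = "he returned to the castle and reported the beast to his lord".split()

smooth = SmoothingFunction().method1  # avoid hard-zero scores on short texts
score = sentence_bleu([reference], plausible, smoothing_function=smooth)
print(f"BLEU: {score:.4f}")  # near zero despite the output being plausible
```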

Cited by 26 publications (35 citation statements)
References 29 publications
“…Lai and Tetreault (2018) designed SENTAVG, which obtains sentence vectors from an LSTM, averages these vectors to represent the whole text, and then passes the result through a hidden layer. Recently, Guan and Huang (2020) proposed a more accurate automatic evaluation metric called UNION. This metric achieved better performance by using BERT (Devlin et al., 2019) as a more effective classification model and by drawing on a broader set of negative samples constructed from different heuristics.…”
Section: Related Work
confidence: 99%
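As a rough illustration of the classification setup this statement describes, the sketch below scores a story with a BERT sequence classifier. Note that `bert-base-uncased` with a fresh classification head is an untrained stand-in, not the authors' released checkpoint; UNION fine-tunes BERT on human-written stories (positives) versus heuristically perturbed stories (negatives) before scoring:

```python
# Hedged sketch of an UNION-style unreferenced scorer: a BERT classifier
# assigns a plausibility probability to a story, with no reference needed.
# NOTE: bert-base-uncased is a stand-in; UNION fine-tunes BERT on positive
# (human) vs. negative (perturbed) stories before the score is meaningful.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # implausible vs. plausible
)
model.eval()

def union_style_score(story: str) -> float:
    """Return the classifier's probability that the story is human-like."""
    inputs = tokenizer(story, return_tensors="pt",
                       truncation=True, max_length=512)
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()

print(union_style_score("Tom went to the store. He bought milk and went home."))
```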
“…Recently proposed top-k (Fan et al., 2018) and top-p (Holtzman et al., 2020) sampling techniques partially mitigate but do not completely solve this issue. Guan and Huang (2020) proposed to replicate this problem when constructing negative (implausible) texts by repeating n-grams at consecutive positions. These heuristically constructed outputs only mirror local repetition issues, whereas state-of-the-art generative models produce more complex and subtle repetitions throughout the whole text.…”
Section: Output Story
confidence: 99%
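To make the repetition heuristic concrete, here is a minimal sketch of constructing a negative sample by duplicating an n-gram at consecutive positions; the function name and parameters are illustrative, not taken from Guan and Huang (2020):

```python
# Hedged sketch of the local-repetition heuristic for negative samples:
# pick a random n-gram and repeat it immediately after itself, yielding
# the kind of implausible text a plausibility classifier learns to penalize.
import random

def repeat_ngram_negative(tokens, n=3, times=2, seed=0):
    """Duplicate a randomly chosen n-gram right after its position."""
    rng = random.Random(seed)
    if len(tokens) < n:
        return list(tokens)  # too short to perturb
    start = rng.randrange(len(tokens) - n + 1)
    ngram = tokens[start:start + n]
    return tokens[:start + n] + ngram * times + tokens[start + n:]

story = "the knight rode home and told the king about the dragon".split()
print(" ".join(repeat_ngram_negative(story)))  # story with one n-gram repeated in place
```

As the citing authors note, such perturbations capture only local repetition, not the subtler long-range repetition produced by strong generative models.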