Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
DOI: 10.18653/v1/2020.emnlp-main.736

UNION: An Unreferenced Metric for Evaluating Open-ended Story Generation

Abstract: Despite the success of existing referenced metrics (e.g., BLEU and MoverScore), they correlate poorly with human judgments for open-ended text generation, including story or dialog generation, because of the notorious one-to-many issue: there are many plausible outputs for the same input, which may differ substantially in literals or semantics from the limited number of given references. To alleviate this issue, we propose UNION, a learnable UNreferenced metrIc for evaluating Open-eNded story generation, which meas…
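To make the one-to-many issue concrete, here is a minimal sketch (the example sentences are invented, not from the paper) showing how a plausible continuation that shares little surface overlap with a single reference receives a near-zero BLEU score, computed with NLTK:

```python
# Sketch of the one-to-many issue: a plausible continuation scores poorly
# under a referenced metric (BLEU) simply because it is worded differently
# from the one available reference. Example sentences are invented.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "the knight rode home and told the king about the dragon".split()
plausible = "he returned to the castle and reported the beast to his lord".split()

smooth = SmoothingFunction().method1  # avoid hard-zero scores on short texts
score = sentence_bleu([reference], plausible, smoothing_function=smooth)
print(f"BLEU: {score:.4f}")  # near zero despite the output being plausible
```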

Cited by 26 publications (35 citation statements)
References 29 publications
“…Lai and Tetreault (2018) designed SENTAVG, which obtains sentence vectors from an LSTM, averages these vectors to represent the whole text, and then passes the result through a hidden layer. Recently, Guan and Huang (2020) proposed a more accurate automatic evaluation metric called UNION. This metric achieved better performance by using BERT (Devlin et al., 2019) as a more effective classification model and by drawing on a broader set of negative samples constructed from different heuristics.…”
Section: Related Work
confidence: 99%
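As a rough illustration of the classification setup this statement describes, the sketch below scores a story with a BERT sequence classifier. Note that `bert-base-uncased` with a fresh classification head is an untrained stand-in, not the authors' released checkpoint; UNION fine-tunes BERT on human-written stories (positives) versus heuristically perturbed stories (negatives) before scoring:

```python
# Hedged sketch of an UNION-style unreferenced scorer: a BERT classifier
# assigns a plausibility probability to a story, with no reference needed.
# NOTE: bert-base-uncased is a stand-in; UNION fine-tunes BERT on positive
# (human) vs. negative (perturbed) stories before the score is meaningful.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # implausible vs. plausible
)
model.eval()

def union_style_score(story: str) -> float:
    """Return the classifier's probability that the story is human-like."""
    inputs = tokenizer(story, return_tensors="pt",
                       truncation=True, max_length=512)
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()

print(union_style_score("Tom went to the store. He bought milk and went home."))
```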
“…Recently proposed top-k (Fan et al., 2018) and top-p (Holtzman et al., 2020) sampling techniques partially mitigate but do not completely solve this issue. Guan and Huang (2020) proposed to replicate this problem when constructing negative (implausible) texts by repeating n-grams at consecutive positions. These heuristically constructed outputs only mirror local repetition issues, whereas state-of-the-art generative models produce more complex and subtle repetitions throughout the whole text.…”
Section: Output Story
confidence: 99%
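To make the repetition heuristic concrete, here is a minimal sketch of constructing a negative sample by duplicating an n-gram at consecutive positions; the function name and parameters are illustrative, not taken from Guan and Huang (2020):

```python
# Hedged sketch of the local-repetition heuristic for negative samples:
# pick a random n-gram and repeat it immediately after itself, yielding
# the kind of implausible text a plausibility classifier learns to penalize.
import random

def repeat_ngram_negative(tokens, n=3, times=2, seed=0):
    """Duplicate a randomly chosen n-gram right after its position."""
    rng = random.Random(seed)
    if len(tokens) < n:
        return list(tokens)  # too short to perturb
    start = rng.randrange(len(tokens) - n + 1)
    ngram = tokens[start:start + n]
    return tokens[:start + n] + ngram * times + tokens[start + n:]

story = "the knight rode home and told the king about the dragon".split()
print(" ".join(repeat_ngram_negative(story)))  # story with one n-gram repeated in place
```

As the citing authors note, such perturbations capture only local repetition, not the subtler long-range repetition produced by strong generative models.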