Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021
DOI: 10.18653/v1/2021.naacl-main.120

Generating Negative Samples by Manipulating Golden Responses for Unsupervised Learning of a Response Evaluation Model

Abstract: Evaluating the quality of responses generated by open-domain conversation systems is a challenging task. This is partly because there can be multiple appropriate responses to a given dialogue history. Reference-based metrics that rely on comparisons to a set of known correct responses often fail to account for this variety, and consequently correlate poorly with human judgment. To address this problem, researchers have investigated the possibility of assessing response quality without using a set of known correct responses.

Cited by 7 publications (4 citation statements); References: 18 publications
“…A problem with the metrics learned with self-supervised learning is that the random negative-sampling strategy is likely to produce false-negative or over-simplistic candidates, thus introducing unwanted biases to the ADMs. One idea is to introduce adversarial irrelevant responses to increase the ADMs' discrimination capability (Sai et al. 2020; Gupta, Tsvetkov, and Bigham 2021; Park et al. 2021). In this way, the evaluation model will greatly benefit from a dataset of multiple relevant and adversarial irrelevant responses from diverse dialogue contexts.…”
Section: Related Work, Dialogue Evaluation Metrics
Citation type: mentioning; confidence: 99%
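The random negative-sampling strategy this statement criticizes is easy to reproduce. Below is a minimal sketch, assuming a corpus of (context, golden response) pairs; the function name and labeling scheme are illustrative, not taken from any of the cited papers. It also shows where false negatives come from: a response drawn from another dialogue may still fit the current context, yet it is labeled 0.

```python
import random

def random_negatives(dialogues, seed=0):
    """Build (context, response, label) triples by random negative sampling.

    dialogues: list of (context, golden_response) pairs.
    Each golden response is labeled 1; a response drawn uniformly from the
    rest of the corpus is labeled 0 -- even if it happens to fit the
    context, which is exactly the false-negative problem noted above.
    """
    rng = random.Random(seed)
    responses = [resp for _, resp in dialogues]
    samples = []
    for context, golden in dialogues:
        negative = golden
        while negative == golden:  # avoid drawing the golden response itself
            negative = rng.choice(responses)
        samples.append((context, golden, 1))    # positive
        samples.append((context, negative, 0))  # random "negative"
    return samples

pairs = [("How are you?", "I'm fine, thanks."),
         ("What do you do for fun?", "I like hiking.")]
print(random_negatives(pairs))
```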
“…Negative Samples Generation Module: Ideal negative samples should be context-coherent, but persona-inconsistent. Motivated by (Park et al. 2021), we first detect the key persona word(s) in a golden consistent response with a consistency score, then revise the selected word(s) to construct negative samples, as illustrated in Figure 3.…”
Section: Learning To Avoid Misidentification
Citation type: mentioning; confidence: 99%
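The detect-then-revise recipe in this statement can be sketched as follows. This is a hedged illustration, not the citing paper's implementation: consistency_score stands in for whatever persona-consistency model (e.g., an NLI-style scorer) the paper uses, and the replacement table is a placeholder for its actual revision step.

```python
def key_persona_word(persona, response, consistency_score):
    """Index of the token whose masking most lowers persona consistency.

    consistency_score is a hypothetical callable mapping
    (persona, response_text) to a score; higher means more consistent.
    """
    tokens = response.split()
    base = consistency_score(persona, response)
    drops = []
    for i in range(len(tokens)):
        masked = " ".join(tokens[:i] + ["[MASK]"] + tokens[i + 1:])
        drops.append(base - consistency_score(persona, masked))
    return max(range(len(tokens)), key=lambda i: drops[i])

def make_negative(persona, response, consistency_score, replacements):
    """Revise the key persona word to build a context-coherent but
    persona-inconsistent negative sample."""
    tokens = response.split()
    i = key_persona_word(persona, response, consistency_score)
    tokens[i] = replacements.get(tokens[i], tokens[i])
    return " ".join(tokens)

# Toy usage with a dummy scorer that rewards word overlap with the persona.
persona = "I love hiking"
score = lambda p, r: len(set(p.lower().split()) & set(r.lower().split())) / len(r.split())
print(make_negative(persona, "We really love hiking", score, {"love": "hate"}))
# -> "We really hate hiking"
```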
“…Replacement of Tokens: Inspired by (Park et al., 2021), we manipulate tokens of the gold inference using the prediction of a masked language model. More specifically, we compute the probability of each token in the gold inference A when the whole context X and A are given and when only A is given.…”
Section: Selection Of Negative Samples
Citation type: mentioning; confidence: 99%
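The probability comparison described here can be sketched with an off-the-shelf masked language model. The sketch below assumes bert-base-uncased via HuggingFace Transformers; the ranking rule (probability ratio with vs. without context) and the replacement step (context-free MLM argmax) are plausible stand-ins for the exact procedure of Park et al. (2021), which this page does not spell out.

```python
import torch
from transformers import BertForMaskedLM, BertTokenizerFast

tok = BertTokenizerFast.from_pretrained("bert-base-uncased")
mlm = BertForMaskedLM.from_pretrained("bert-base-uncased").eval()

@torch.no_grad()
def masked_probs(text_a, text_b=None):
    """P(token | everything else), one mask at a time, for each token of
    the last segment (text_b if given, else text_a)."""
    enc = tok(text_a, text_b, return_tensors="pt")
    ids, segs = enc["input_ids"][0], enc["token_type_ids"][0]
    target = 1 if text_b is not None else 0
    positions = [i for i in range(len(ids))
                 if segs[i] == target and ids[i].item() not in tok.all_special_ids]
    probs = []
    for i in positions:
        masked = ids.clone()
        masked[i] = tok.mask_token_id
        logits = mlm(input_ids=masked.unsqueeze(0),
                     token_type_ids=segs.unsqueeze(0)).logits[0, i]
        probs.append(torch.softmax(logits, -1)[ids[i]].item())
    return positions, probs

@torch.no_grad()
def manipulate(context, gold, k=1):
    """Replace the k gold tokens that depend most on the context with
    context-free MLM predictions, yielding a fluent but mismatched negative."""
    _, p_ctx = masked_probs(context, gold)   # P(a_i | X, A without a_i)
    pos, p_solo = masked_probs(gold)         # P(a_i | A without a_i)
    # Rank tokens by how strongly the context boosts them (assumed rule).
    order = sorted(range(len(pos)),
                   key=lambda j: p_ctx[j] / (p_solo[j] + 1e-12), reverse=True)
    ids = tok(gold, return_tensors="pt")["input_ids"][0]
    for j in order[:k]:
        masked = ids.clone()
        masked[pos[j]] = tok.mask_token_id
        logits = mlm(input_ids=masked.unsqueeze(0)).logits[0, pos[j]]
        ids[pos[j]] = logits.argmax()        # prediction from A alone
    return tok.decode(ids, skip_special_tokens=True)

print(manipulate("Where are you headed tonight?", "I am going to the cinema."))
```

Tokens that are likely given both X and A but unlikely given A alone are the ones carrying context information, so swapping them for context-free predictions keeps the response fluent while breaking its tie to the dialogue history.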