Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021
DOI: 10.18653/v1/2021.naacl-main.120

Generating Negative Samples by Manipulating Golden Responses for Unsupervised Learning of a Response Evaluation Model

Abstract: Evaluating the quality of responses generated by open-domain conversation systems is a challenging task. This is partly because there can be multiple appropriate responses to a given dialogue history. Reference-based metrics that rely on comparisons to a set of known correct responses often fail to account for this variety, and consequently correlate poorly with human judgment. To address this problem, researchers have investigated the possibility of assessing response quality without using a set of known correct responses.

Cited by 7 publications (4 citation statements); References: 18 publications
“…A problem with the metrics learned with self-supervised learning is that the random negative-sampling strategy is likely to produce false-negative or over-simplistic candidates, thus introducing unwanted biases to the ADMs. One idea is to introduce adversarial irrelevant responses to increase the ADMs' discrimination capability (Sai et al. 2020; Gupta, Tsvetkov, and Bigham 2021; Park et al. 2021). In this way, the evaluation model will greatly benefit from a dataset of multiple relevant and adversarial irrelevant responses from diverse dialogue contexts.…”
Section: Related Work, Dialogue Evaluation Metrics
Citation type: mentioning; confidence: 99%
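The random negative-sampling strategy this statement criticizes is easy to reproduce. Below is a minimal sketch, assuming a corpus of (context, golden response) pairs; the function name and labeling scheme are illustrative, not taken from any of the cited papers. It also shows where false negatives come from: a response drawn from another dialogue may still fit the current context, yet it is labeled 0.

```python
import random

def random_negatives(dialogues, seed=0):
    """Build (context, response, label) triples by random negative sampling.

    dialogues: list of (context, golden_response) pairs.
    Each golden response is labeled 1; a response drawn uniformly from the
    rest of the corpus is labeled 0 -- even if it happens to fit the
    context, which is exactly the false-negative problem noted above.
    """
    rng = random.Random(seed)
    responses = [resp for _, resp in dialogues]
    samples = []
    for context, golden in dialogues:
        negative = golden
        while negative == golden:  # avoid drawing the golden response itself
            negative = rng.choice(responses)
        samples.append((context, golden, 1))    # positive
        samples.append((context, negative, 0))  # random "negative"
    return samples

pairs = [("How are you?", "I'm fine, thanks."),
         ("What do you do for fun?", "I like hiking.")]
print(random_negatives(pairs))
```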
“…Negative Samples Generation Module: Ideal negative samples should be context-coherent, but persona-inconsistent. Motivated by (Park et al. 2021), we first detect the key persona word(s) in a golden consistent response with a consistency score, then revise the selected word(s) to construct negative samples, as illustrated in Figure 3.…”
Section: Learning To Avoid Misidentification
Citation type: mentioning; confidence: 99%
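The detect-then-revise recipe in this statement can be sketched as follows. This is a hedged illustration, not the citing paper's implementation: consistency_score stands in for whatever persona-consistency model (e.g., an NLI-style scorer) the paper uses, and the replacement table is a placeholder for its actual revision step.

```python
def key_persona_word(persona, response, consistency_score):
    """Index of the token whose masking most lowers persona consistency.

    consistency_score is a hypothetical callable mapping
    (persona, response_text) to a score; higher means more consistent.
    """
    tokens = response.split()
    base = consistency_score(persona, response)
    drops = []
    for i in range(len(tokens)):
        masked = " ".join(tokens[:i] + ["[MASK]"] + tokens[i + 1:])
        drops.append(base - consistency_score(persona, masked))
    return max(range(len(tokens)), key=lambda i: drops[i])

def make_negative(persona, response, consistency_score, replacements):
    """Revise the key persona word to build a context-coherent but
    persona-inconsistent negative sample."""
    tokens = response.split()
    i = key_persona_word(persona, response, consistency_score)
    tokens[i] = replacements.get(tokens[i], tokens[i])
    return " ".join(tokens)

# Toy usage with a dummy scorer that rewards word overlap with the persona.
persona = "I love hiking"
score = lambda p, r: len(set(p.lower().split()) & set(r.lower().split())) / len(r.split())
print(make_negative(persona, "We really love hiking", score, {"love": "hate"}))
# -> "We really hate hiking"
```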
“…Replacement of Tokens: Inspired by (Park et al., 2021), we manipulate tokens of the gold inference using the prediction of a masked language model. More specifically, we compute the probability of each token in the gold inference A when the whole context X and A are given and when only A is given.…”
Section: Selection Of Negative Samples
Citation type: mentioning; confidence: 99%
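The probability comparison described here can be sketched with an off-the-shelf masked language model. The sketch below assumes bert-base-uncased via HuggingFace Transformers; the ranking rule (probability ratio with vs. without context) and the replacement step (context-free MLM argmax) are plausible stand-ins for the exact procedure of Park et al. (2021), which this page does not spell out.

```python
import torch
from transformers import BertForMaskedLM, BertTokenizerFast

tok = BertTokenizerFast.from_pretrained("bert-base-uncased")
mlm = BertForMaskedLM.from_pretrained("bert-base-uncased").eval()

@torch.no_grad()
def masked_probs(text_a, text_b=None):
    """P(token | everything else), one mask at a time, for each token of
    the last segment (text_b if given, else text_a)."""
    enc = tok(text_a, text_b, return_tensors="pt")
    ids, segs = enc["input_ids"][0], enc["token_type_ids"][0]
    target = 1 if text_b is not None else 0
    positions = [i for i in range(len(ids))
                 if segs[i] == target and ids[i].item() not in tok.all_special_ids]
    probs = []
    for i in positions:
        masked = ids.clone()
        masked[i] = tok.mask_token_id
        logits = mlm(input_ids=masked.unsqueeze(0),
                     token_type_ids=segs.unsqueeze(0)).logits[0, i]
        probs.append(torch.softmax(logits, -1)[ids[i]].item())
    return positions, probs

@torch.no_grad()
def manipulate(context, gold, k=1):
    """Replace the k gold tokens that depend most on the context with
    context-free MLM predictions, yielding a fluent but mismatched negative."""
    _, p_ctx = masked_probs(context, gold)   # P(a_i | X, A without a_i)
    pos, p_solo = masked_probs(gold)         # P(a_i | A without a_i)
    # Rank tokens by how strongly the context boosts them (assumed rule).
    order = sorted(range(len(pos)),
                   key=lambda j: p_ctx[j] / (p_solo[j] + 1e-12), reverse=True)
    ids = tok(gold, return_tensors="pt")["input_ids"][0]
    for j in order[:k]:
        masked = ids.clone()
        masked[pos[j]] = tok.mask_token_id
        logits = mlm(input_ids=masked.unsqueeze(0)).logits[0, pos[j]]
        ids[pos[j]] = logits.argmax()        # prediction from A alone
    return tok.decode(ids, skip_special_tokens=True)

print(manipulate("Where are you headed tonight?", "I am going to the cinema."))
```

Tokens that are likely given both X and A but unlikely given A alone are the ones carrying context information, so swapping them for context-free predictions keeps the response fluent while breaking its tie to the dialogue history.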