Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume 2021
DOI: 10.18653/v1/2021.eacl-main.44
Improving Factual Consistency Between a Response and Persona Facts

Abstract: Neural models for response generation produce responses that are semantically plausible but not necessarily factually consistent with the facts describing the speaker's persona. These models are trained with fully supervised learning, where the objective function barely captures factual consistency. We propose to fine-tune these models by reinforcement learning with an efficient reward function that explicitly captures the consistency between a response and persona facts as well as semantic plausibility. Our autom…
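
The abstract describes fine-tuning the generator with reinforcement learning, using a reward that combines persona-fact consistency with semantic plausibility. A minimal REINFORCE-style sketch of such a loop is shown below; the tiny policy and the consistency/fluency scorers are illustrative stand-ins under assumed interfaces, not the authors' implementation.

```python
# Minimal sketch (not the paper's code): REINFORCE fine-tuning where the reward
# mixes a persona-consistency score with a plausibility/fluency score.
# `consistency_score` and `fluency_score` are hypothetical placeholders.
import torch
import torch.nn as nn

vocab_size, hidden = 100, 32
policy = nn.Sequential(nn.Embedding(vocab_size, hidden),
                       nn.Flatten(),
                       nn.Linear(hidden, vocab_size))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def consistency_score(response_ids, persona_ids):
    # Placeholder: in practice an entailment/NLI model would score the pair.
    overlap = set(response_ids.tolist()) & set(persona_ids.tolist())
    return len(overlap) / len(response_ids)

def fluency_score(response_ids):
    # Placeholder for a language-model plausibility signal.
    return 1.0

persona = torch.randint(0, vocab_size, (8,))   # toy persona-fact token ids
inp = torch.randint(0, vocab_size, (1, 1))     # toy dialogue-context token

log_probs, response = [], []
for _ in range(5):                              # sample a short response
    dist = torch.distributions.Categorical(logits=policy(inp))
    tok = dist.sample()
    log_probs.append(dist.log_prob(tok))
    response.append(tok)
    inp = tok.unsqueeze(0)

response_ids = torch.cat(response)
reward = consistency_score(response_ids, persona) + fluency_score(response_ids)
loss = -reward * torch.stack(log_probs).sum()   # REINFORCE objective
optimizer.zero_grad()
loss.backward()
optimizer.step()
```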

Cited by 17 publications (13 citation statements) · References 19 publications (33 reference statements)
“…As pointed out by Ranzato et al. [142], word-level maximum likelihood training leads to the problem of exposure bias. Some research [3,73,84,102,120,135,163] adopts reinforcement learning to address the hallucination problem, using different rewards to optimize the model. The reward is the crucial bottleneck of reinforcement learning, and the way the reward score is computed is closely tied to exploring automatic metrics for evaluating the generated results.…”
Section: Training (mentioning, confidence: 99%)
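
The excerpt stresses that the reward, and how it is computed from automatic metrics, is the bottleneck of RL training. Below is a hedged sketch of one common choice, using the entailment probability from an off-the-shelf NLI model as the consistency reward; the checkpoint name and label handling are assumptions, not necessarily the setup of the cited works.

```python
# Sketch: entailment probability of a response given a persona fact as an RL reward.
# "roberta-large-mnli" is an assumed public NLI checkpoint.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

nli_name = "roberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(nli_name)
nli = AutoModelForSequenceClassification.from_pretrained(nli_name).eval()

def entailment_reward(persona_fact: str, response: str) -> float:
    """Probability that the persona fact entails the response."""
    enc = tokenizer(persona_fact, response, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = nli(**enc).logits.softmax(dim=-1)[0]
    labels = {k.lower(): v for k, v in nli.config.label2id.items()}
    return probs[labels.get("entailment", probs.argmax().item())].item()

print(entailment_reward("I have two dogs.", "My two dogs keep me busy."))
```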
“…By conditioning response generation on the persona description, a chit-chat model is expected to generate more persona-consistent responses. Lately, the application of NLI methods [100,159] and reinforcement learning frameworks [120] has been investigated. Although these conditioning methods using the PersonaChat dataset are successful, approaches that do not rely on a given set of persona descriptions need further investigation, because such descriptions are not always available and cannot cover every aspect of a persona.…”
Section: External Consistency (mentioning, confidence: 99%)
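
The excerpt refers to conditioning response generation on persona descriptions, as in PersonaChat-style models. Here is a minimal sketch of how such conditioning is often realized by concatenating persona facts with the dialogue history; the special tokens are illustrative assumptions, not any specific model's vocabulary.

```python
# Sketch of persona-conditioned input construction (TransferTransfo-style concatenation).
def build_input(persona_facts, history, bos="<bos>", eos="<eos>",
                speaker1="<speaker1>", speaker2="<speaker2>"):
    parts = [bos] + list(persona_facts)
    for i, turn in enumerate(history):
        speaker = speaker1 if i % 2 == 0 else speaker2
        parts.append(f"{speaker} {turn}")
    parts.append(eos)
    return " ".join(parts)

persona = ["i have two dogs.", "i work as a nurse."]
history = ["hi, how are you?", "great, just back from a walk with my dogs."]
print(build_input(persona, history))
```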
“…[Flattened table excerpt listing consistency-related methods by knowledge source (history dialogue, structured knowledge such as a knowledge graph, user query): DialogNLI (Welleck et al., 2019b), Arun et al. (2020), Ghazvininejad et al. (2018), DECODE (Nie et al., 2021), CI-ToD (Qin et al., 2021), TransferTransfo (Mesgar et al., 2021), UL (Li et al., 2020a), Blender (Roller et al., 2021), KvBERT (Song et al., 2020a), RCDG, GDR (Song et al., 2020b), NPH (Dziri et al., 2021a).]…”
Section: History Dialogue (mentioning, confidence: 99%)
“…The consistency evaluation is based on an NLI classifier that computes the entailment score. Mesgar et al. (2021) also propose an RL-based model, TransferTransfo-RL, for improving consistency between generated responses and personas. Differently, TransferTransfo-RL takes advantage of the Actor-Critic (Mnih et al., 2016) learning approach, which also uses the entailment score as the reward.…”
Section: Auxiliary Tasks (mentioning, confidence: 99%)
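
The excerpt describes TransferTransfo-RL as using an Actor-Critic learning approach with the entailment score as reward. The toy sketch below shows the generic actor-critic update under such a scalar reward; the linear actor/critic and the fixed reward value are placeholders, not the model of Mesgar et al. (2021).

```python
# Toy actor-critic step with an entailment-style scalar reward.
import torch
import torch.nn as nn

vocab, hidden = 100, 32
actor = nn.Linear(hidden, vocab)   # toy policy head over an encoded context
critic = nn.Linear(hidden, 1)      # value baseline for variance reduction
opt = torch.optim.Adam(list(actor.parameters()) + list(critic.parameters()), lr=1e-3)

state = torch.randn(1, hidden)                 # stand-in for an encoded dialogue state
dist = torch.distributions.Categorical(logits=actor(state))
action = dist.sample()                         # stand-in for a sampled response token
reward = torch.tensor([0.8])                   # e.g. an entailment probability

value = critic(state).squeeze(-1)
advantage = (reward - value).detach()
actor_loss = -(advantage * dist.log_prob(action)).mean()
critic_loss = (reward - value).pow(2).mean()

opt.zero_grad()
(actor_loss + critic_loss).backward()
opt.step()
```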
“…Existing works on building reliable dialog systems are generally divided into two categories: chit-chat open-domain dialog generation [38,72,73] and task-oriented dialog generation [79]. Attempts at open-domain dialog generation include generating more coherent [1,41,42], diverse [5,77], and personalized [40,55] utterances. With the emergence of task-oriented datasets [7,17,66,74], more work has been devoted to task-oriented dialog generation, which usually involves a pipeline of intent classification [67], dialog state tracking [25-27], dialog policy making [10,45], and dialog generation [15].…”
Section: Textual Dialog Generation (mentioning, confidence: 99%)
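
The excerpt names the usual task-oriented pipeline of intent classification, dialog state tracking, dialog policy, and generation. Below is a rule-based toy sketch of how those stages chain together, purely for illustration and unrelated to any cited system.

```python
# Illustrative task-oriented pipeline: intent -> state tracking -> policy -> generation.
def classify_intent(utterance):
    return "book_restaurant" if "table" in utterance else "chitchat"

def track_state(state, utterance, intent):
    state = dict(state, intent=intent)
    if "for two" in utterance:
        state["party_size"] = 2
    return state

def choose_action(state):
    if state.get("intent") == "book_restaurant" and "party_size" not in state:
        return "request_party_size"
    return "confirm_booking"

def generate(action, state):
    templates = {
        "request_party_size": "For how many people?",
        "confirm_booking": f"Booking a table for {state.get('party_size', '?')}.",
    }
    return templates[action]

state = {}
user = "I'd like a table for two tonight."
intent = classify_intent(user)
state = track_state(state, user, intent)
print(generate(choose_action(state), state))
```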