Proceedings of the 12th International Conference on Natural Language Generation 2019
DOI: 10.18653/v1/w19-8608
Towards Coherent and Engaging Spoken Dialog Response Generation Using Automatic Conversation Evaluators

Abstract: Encoder-decoder based neural architectures serve as the basis of state-of-the-art approaches in end-to-end open-domain dialog systems. Since most such systems are trained with a maximum likelihood (MLE) objective, they suffer from issues such as lack of generalizability and the generic response problem, i.e., a system response that can be an answer to a large number of user utterances, e.g., "Maybe, I don't know." Having explicit feedback on the relevance and interestingness of a system response at each turn…
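The abstract's core idea, using evaluator feedback to steer response selection, can be illustrated with a minimal sketch. Everything below (the rerank_responses and penalize_generic names, the toy scoring rule) is a hypothetical illustration under assumed interfaces, not the authors' implementation:

from typing import Callable, List, Tuple

def rerank_responses(
    candidates: List[str],
    context: List[str],
    evaluator: Callable[[List[str], str], float],
) -> List[Tuple[str, float]]:
    # Score each candidate against the dialog context and sort so the
    # most relevant/engaging response comes first.
    scored = [(resp, evaluator(context, resp)) for resp in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

def penalize_generic(context: List[str], response: str) -> float:
    # Toy stand-in for a learned conversation evaluator: reward lexical
    # specificity, penalize stock generic phrases like "I don't know".
    score = float(len(set(response.lower().split())))
    if "i don't know" in response.lower():
        score -= 10.0
    return score

context = ["What did you think of the movie?"]
candidates = [
    "Maybe, I don't know.",
    "I loved the soundtrack, especially the opening theme.",
]
best, _ = rerank_responses(candidates, context, penalize_generic)[0]
print(best)  # prints the more specific, engaging candidate

In the paper's setting, a trained evaluator would replace the toy scoring function, giving explicit per-turn feedback on relevance and interestingness.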

Cited by 24 publications (22 citation statements) | References: 41 publications
“…Eventually, as Table 2 demonstrates, the mean κ agreement and mean Pearson correlation between evaluators participating in our experiments were 0.52 and 0.93. In the context of dialogue system evaluation, where agreement is usually quite low (Venkatesh et al. 2018; Ghandeharioun et al. 2019; Yi et al. 2019), these numbers show relatively high agreement between annotators. This provides evidence that engagement can be measured not only at the conversation level but also at the utterance level.…”
Section: Utterance-level Engagement Scores
confidence: 99%
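As a side note on the quoted statistics, agreement figures like κ = 0.52 and Pearson r = 0.93 are typically produced with standard library calls. A brief sketch with made-up ratings (not the paper's data, and Cohen's κ is only one common variant of the unspecified "mean κ"):

from scipy.stats import pearsonr
from sklearn.metrics import cohen_kappa_score

# Made-up per-utterance engagement labels from two annotators; the
# paper's actual data and rating scale may differ.
annotator_a = [1, 0, 1, 1, 0, 1, 0, 1]
annotator_b = [1, 0, 1, 0, 0, 1, 0, 1]

kappa = cohen_kappa_score(annotator_a, annotator_b)
r, p = pearsonr(annotator_a, annotator_b)
print(f"Cohen's kappa: {kappa:.2f}, Pearson r: {r:.2f} (p = {p:.3f})")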
“…Likability quantifies how much a set of one or more qualities makes a response more likable for a particular task. These qualities can be diversity (Li et al., 2016), sentiment (Rashkin et al., 2019), specificity (Ke et al., 2018), engagement (Yi et al., 2019), fluency (Kann et al., 2018), and more. A likable response may or may not be sensible to the context.…”
Section: Fundamental Aspects
confidence: 99%
“…In this way, dialog systems could detect and react to a user's disengagement in both open-domain dialogs (Yu et al., 2016) and task-oriented dialogs (Yu et al., 2017). During training, our model could also be used as real-time feedback to benefit dialog policy learning (Yi et al., 2019). Second, HERALD could quantify user engagement and be used as an automatic dialog evaluation metric.…”
Section: Introduction
confidence: 99%
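The per-turn feedback loop this excerpt describes can be sketched as follows; the engagement_model callable and the thresholding rule are placeholders for illustration, not HERALD's actual interface:

from typing import Callable, List

def turn_rewards(
    system_turns: List[str],
    contexts: List[List[str]],
    engagement_model: Callable[[List[str], str], float],
    threshold: float = 0.5,
    disengagement_penalty: float = -1.0,
) -> List[float]:
    # Reward each system turn with its predicted engagement score;
    # turns falling below the threshold receive a penalty, steering a
    # policy learner away from responses that disengage the user.
    rewards = []
    for context, turn in zip(contexts, system_turns):
        score = engagement_model(context, turn)
        rewards.append(score if score >= threshold else disengagement_penalty)
    return rewards

Averaging the same scores over a whole conversation would give the utterance-level model double duty as an automatic dialog evaluation metric, as the excerpt notes.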