2022
DOI: 10.48550/arxiv.2203.13927
Preprint

What is wrong with you?: Leveraging User Sentiment for Automatic Dialog Evaluation

Abstract: Accurate automatic evaluation metrics for open-domain dialogs are in high demand. Existing model-based metrics for system response evaluation are trained on human annotated data, which is cumbersome to collect. In this work, we propose to use information that can be automatically extracted from the next user utterance, such as its sentiment or whether the user explicitly ends the conversation, as a proxy to measure the quality of the previous system response. This allows us to train on a massive set of dialogs…
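As a minimal sketch of the proxy signal the abstract describes, the following snippet scores a system response by the sentiment of the user utterance that follows it. The classifier checkpoint, the score mapping, and the example utterances are illustrative assumptions, not the paper's actual training setup.

    # Sketch: rate the preceding system response by the sentiment of the
    # next user utterance. Model and mapping are assumptions for illustration.
    from transformers import pipeline

    sentiment = pipeline(
        "sentiment-analysis",
        model="distilbert-base-uncased-finetuned-sst-2-english",
    )

    def proxy_score(next_user_utterance: str) -> float:
        """Return a quality proxy in [0, 1] for the previous system response."""
        result = sentiment(next_user_utterance)[0]
        prob = result["score"]
        # Positive sentiment maps to high scores, negative to low scores.
        return prob if result["label"] == "POSITIVE" else 1.0 - prob

    # A frustrated follow-up suggests the system response was poor.
    print(proxy_score("What is wrong with you? That makes no sense."))
    print(proxy_score("That's really helpful, thanks!"))

Because such labels come for free from dialog logs, a scorer trained this way needs no human annotation, which is the point the abstract makes about training on a massive set of dialogs.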

Cited by 1 publication (5 citation statements)
References 14 publications
“…Another idea is to score a proposed system utterance by the probability that it elicits a particular user response type, such as disinterest or criticism [46,31,49,12].…”
Section: Related Work (mentioning)
confidence: 99%
“…For example, the FED framework scores proposed utterances by utilizing the DialoGPT LM probabilities for the subsequent user utterances such as "That is interesting" [56,31]. Predictive models can also be generated from large dialogue corpora by using the following user utterance as weak supervision [12,46].…”
Section: Related Work (mentioning)
confidence: 99%
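The FED-style scoring this citation statement describes can be sketched roughly as below: compute the DialoGPT log-probability of a positive follow-up such as "That is interesting" given the dialog context. The checkpoint, the single follow-up phrase, and the token-averaged scoring are assumptions for illustration; FED itself aggregates likelihoods over several positive and negative follow-up utterances per quality dimension.

    # Sketch: score a dialog context by how likely DialoGPT is to continue it
    # with a positive user reaction. Checkpoint and phrase are assumptions.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
    model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")
    model.eval()

    def followup_logprob(context: str, followup: str) -> float:
        """Average log-probability of `followup` tokens given `context`."""
        ctx_ids = tokenizer.encode(context + tokenizer.eos_token,
                                   return_tensors="pt")
        fol_ids = tokenizer.encode(followup + tokenizer.eos_token,
                                   return_tensors="pt")
        input_ids = torch.cat([ctx_ids, fol_ids], dim=-1)
        # Mask the context tokens so the loss covers only the follow-up.
        labels = input_ids.clone()
        labels[:, : ctx_ids.shape[-1]] = -100
        with torch.no_grad():
            loss = model(input_ids, labels=labels).loss  # mean NLL
        return -loss.item()

    # Higher is better: the context more plausibly elicits the reaction.
    print(followup_logprob("I just got back from a trip to Iceland.",
                           "That is interesting!"))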