2008 IEEE Spoken Language Technology Workshop (2008)
DOI: 10.1109/slt.2008.4777857

Caller Experience: A method for evaluating dialog systems and its automatic prediction

Abstract: In this paper we introduce a subjective metric for evaluating the performance of spoken dialog systems, Caller Experience (CE). CE is a useful metric for tracking the overall performance of a system in deployment, as well as for isolating individual problematic calls in which the system under-performs. The proposed CE metric differs from most performance evaluation metrics proposed in the past in that it is a) a subjective, qualitative rating of the call, and b) provided by expert, external listeners, not the …

Cited by 15 publications (14 citation statements)
References 9 publications
“…Möller and Ward [9] further proposed a tripartite framework to evaluate SDS: "One part models the behavior of user and system during the interaction, the second one the perception and judgment processes taking place inside the user, and the third part models what matters to system designers and service providers." Similar to our work, Evanini et al [3] used a decision tree to predict caller experience, and Engelbrech et al [2] use Hidden Markov Models to predict the user judgements. However, the main differences from our work are: (1) we utilize crowdsourcing rather than experts [3] or true users [2] to get more annotated dialogs; (2) we leverage SSL to predict user evaluation, which is more suitable when the amount of labeled data is small.…”
Section: Related Work (mentioning)
confidence: 85%
“…Similar to our work, Evanini et al [3] used a decision tree to predict caller experience, and Engelbrech et al [2] use Hidden Markov Models to predict the user judgements. However, the main differences from our work are: (1) we utilize crowdsourcing rather than experts [3] or true users [2] to get more annotated dialogs; (2) we leverage SSL to predict user evaluation, which is more suitable when the amount of labeled data is small.…”
Section: Related Work (mentioning)
confidence: 85%
“…Systems can also adapt to the user environment, as in the case of Ambient Intelligence applications [25,169]. A more sophisticated approach is to adapt the system to the user specific knowledge and expertise, in which case the main research topics are the adaptation of systems to proficiency in the interaction language [112], age [161], different user expertise levels [49], and special needs [96]. Despite their complexity, these characteristics are to some extent rather static.…”
Section: Related Work (mentioning)
confidence: 99%
“…The first kind asks the user to give a numerical rating and/or fill in a questionnaire about the dialogue after its end [2,3,4]. The second one involves experts instead of users [5,6]. In both cases, ratings are meant for system performance tracking and if necessary, system behaviour adaptation.…”
Section: Introduction (mentioning)
confidence: 99%