Proceedings of the SIGDIAL 2009 Conference: The 10th Annual Meeting of the Special Interest Group on Discourse and Dialogue, 2009
DOI: 10.3115/1708376.1708428

A handsome set of metrics to measure utterance classification performance in spoken dialog systems

Abstract: We present a set of metrics describing classification performance for individual contexts of a spoken dialog system as well as for the entire system. We show how these metrics can be used to train and tune system components and how they are related to Caller Experience, a subjective measure describing how well a caller was treated by the dialog system.
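The abstract names no formulas, so the following is only a minimal sketch of what such metrics could look like: per-context accept/reject rates of an utterance classifier against a confidence threshold, aggregated into a traffic-weighted system-level figure. All names here (Utterance, context_metrics, the TA/FA/TR/FR labels, the 0.5 threshold) are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass

@dataclass
class Utterance:
    # Hypothetical record pairing classifier output with a reference label.
    hypothesis: str    # class predicted by the utterance classifier
    reference: str     # class assigned by a human transcriber/annotator
    confidence: float  # classifier confidence score in [0, 1]

def context_metrics(utterances, threshold=0.5):
    """Per-context rates. An utterance is 'accepted' when its confidence
    clears the threshold: accepted and correct -> True Accept (TA),
    accepted and wrong -> False Accept (FA), rejected and wrong ->
    True Reject (TR), rejected but correct -> False Reject (FR).
    TA + TR is reported as one overall-correctness figure (assumed naming)."""
    n = len(utterances)
    if n == 0:
        return {k: 0.0 for k in ("TA", "FA", "TR", "FR", "TA+TR")}
    ta = sum(u.confidence >= threshold and u.hypothesis == u.reference for u in utterances)
    fa = sum(u.confidence >= threshold and u.hypothesis != u.reference for u in utterances)
    tr = sum(u.confidence < threshold and u.hypothesis != u.reference for u in utterances)
    fr = sum(u.confidence < threshold and u.hypothesis == u.reference for u in utterances)
    return {"TA": ta / n, "FA": fa / n, "TR": tr / n, "FR": fr / n,
            "TA+TR": (ta + tr) / n}

def system_metrics(contexts, threshold=0.5):
    """Traffic-weighted average over all recognition contexts, so that
    frequently visited contexts dominate the system-level figure."""
    total = sum(len(us) for us in contexts.values())
    agg = {k: 0.0 for k in ("TA", "FA", "TR", "FR", "TA+TR")}
    for us in contexts.values():
        per_context = context_metrics(us, threshold)
        for k in agg:
            agg[k] += per_context[k] * len(us) / total
    return agg
```

A threshold sweep over context_metrics would support the tuning use the abstract mentions: picking, per context, the confidence cutoff that best trades false accepts against false rejects.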

Cited by 8 publications (4 citation statements)
References 8 publications
“…Our experiment design aims to show that there is a statistical correlation between the presence of "Unazuki" nodding and a change in a classic quantitative metric used to measure the quality of conversational agents, called utterance amount, for testing H1. This metric measures the length of dialogue sessions and has been shown to correlate well with Caller Experience, a subjective measure describing how well a human interlocutor was treated by the dialog system [30]. The quantitative evaluation based on the utterance amount is illustrated in Fig.…”
Section: Measurements of Dependent Variables (mentioning, confidence: 99%)
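The cited statement describes "utterance amount" only as a length-of-session measure; the sketch below shows one plausible way to compute it from call transcripts. The session structure, field names, and the caller-utterance counting rule are assumptions for illustration, not the citing paper's definition.

```python
# Hypothetical transcripts: each session is a list of (speaker, text)
# turns; this layout is an assumption for illustration only.
sessions = {
    "call-001": [("system", "How may I help you?"),
                 ("caller", "I want to pay my bill."),
                 ("system", "Sure, let's take care of that.")],
    "call-002": [("system", "How may I help you?"),
                 ("caller", "Agent."),
                 ("caller", "Agent!")],
}

def utterance_amount(turns):
    """One straightforward reading of 'utterance amount': the number of
    caller utterances in a session, a proxy for dialogue length."""
    return sum(1 for speaker, _ in turns if speaker == "caller")

for session_id, turns in sessions.items():
    print(session_id, utterance_amount(turns))  # call-001 1, call-002 2
```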
“…Fig. 7 illustrates the estimation of conversational-agent performance based on utterance amount [30]. The corresponding survey settings are shown in Table 2.…”
Section: Measurements of Dependent Variables (mentioning, confidence: 99%)
“…In addition, numerous other aspects of the spoken dialog system can be "learned" for a specific task, such as application-specific grammars [64,65], prompt wording [16,45], choice of text-to-speech audio [11], and others. Learning in these areas can certainly improve performance of a spoken dialog system, but is separate from the dialog management task.…”
Section: Related Work (mentioning, confidence: 99%)
“…Crowdsourcing is one solution that allows us to overcome this obstacle and obtain data rapidly for iterative model building and refinement. Crowdsourcing has been used in recent years to obtain data rapidly and cheaply for a number of spoken language applications, such as native (Suendermann-Oeft, Liscombe, & Pieraccini) and nonnative (Evanini, Higgins, & Zechner) speech transcription and evaluation of the quality of speech synthesizers (Buchholz & Latorre; Wolters, Isaac, & Renals). Crowdsourcing, and particularly Amazon Mechanical Turk, has also been used for assessing SDSs and for collecting interactions with SDSs.…”
(mentioning, confidence: 99%)