Call Classification with Hundreds of Classes and Hundred Thousands of Training Utterances ... ... and No Target Domain Data

David, Sina; Hunter, Peter; Pieraccini, Roberto

doi:10.1007/978-3-540-69369-7_10

Cited by 4 publications

(2 citation statements)

References 3 publications

“…The dialog system is a top-level call router with over 250 distinct call categories [7]. A set of 15 expert raters listened to approximately 100 calls each, and provided a CE rating for each call.…”

Section: Experimental Design (mentioning)

confidence: 99%

Caller Experience: A method for evaluating dialog systems and its automatic prediction

Evanini, Hunter, Liscombe et al. 2008

2008 IEEE Spoken Language Technology Workshop

In this paper we introduce a subjective metric for evaluating the performance of spoken dialog systems, Caller Experience (CE). CE is a useful metric for tracking the overall performance of a system in deployment, as well as for isolating individual problematic calls in which the system under-performs. The proposed CE metric differs from most performance evaluation metrics proposed in the past in that it is a) a subjective, qualitative rating of the call, and b) provided by expert, external listeners, not the callers themselves. The results of an experiment in which a set of human experts listened to the same calls three times are presented. The fact that these results show a high level of agreement among different listeners, despite the subjective nature of the task, demonstrates the validity of using CE as a standard metric. Finally, an automated rating system using objective measures is shown to perform at the same high level as the humans. This is an important advance, since it provides a way to reduce the human labor costs associated with producing a reliable CE.

“…than 300,000 utterances according to 250 distinct classes (for details see [3]). Different annotators, however, have different opinions about how to label things; sometimes, it is deemed impossible to find a final agreement on the exact class where certain utterances belong due to differences in annotation styles.…”

Section: Correlation (mentioning)

confidence: 99%

C⁵

Suendermann, Liscombe, Evanini et al. 2008

2008 IEEE Spoken Language Technology Workshop

The annotation of hundreds of thousands of utterances for the training of statistical utterance classifiers requires a careful quality assurance procedure to make the data consistent and reliable. In this paper, we present five methods to analyze different aspects of annotated data to ensure their Completeness, Consistency, Correlation, and Congruence, and to avoid Confusion, collectively referred to as C⁵.

From rule-based to statistical grammars: Continuous improvement of large-scale spoken dialog systems

David, Evanini, Liscombe et al. 2009

2009 IEEE International Conference on Acoustics, Speech and Signal Processing
