2011
DOI: 10.1017/s1351324910000318
Assessing user simulation for dialog systems using human judges and automatic evaluation measures

Abstract: While different user simulations are built to assist dialog system development, there is an increasing need to quickly assess the quality of the user simulations reliably. Previous studies have proposed several automatic evaluation measures for this purpose. However, the validity of these evaluation measures has not been fully proven. We present an assessment study in which human judgments are collected on user simulation qualities as the gold standard to validate automatic evaluation measures. We show that a …

Cited by 6 publications (5 citation statements)
References 28 publications (45 reference statements)
“…However, some of the metrics are designed specifically for language generation evaluation, and as Liu et al (2016) pointed out, these automatic metrics barely correlate with human evaluation. Therefore, Ai and Litman (2011a) involved human judges to directly rate the simulated dialog. Schatzmann and Young (2009) asked humans to interact with the trained systems to perform indirect human evaluation.…”
Section: Related Work
confidence: 99%
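The point quoted above, that automatic metrics need to be validated against human judgments, is typically checked with a correlation analysis. The following is a minimal sketch of such a check, not code from the paper: all scores are made-up placeholders, and it only shows how Pearson and Spearman correlations between an automatic metric and human ratings of the same simulated dialogs could be computed with numpy and scipy.

import numpy as np
from scipy.stats import pearsonr, spearmanr

# Hypothetical automatic-metric scores and human ratings for the same
# ten simulated dialogs (placeholder numbers, not real data).
metric_scores = np.array([0.41, 0.55, 0.32, 0.60, 0.47, 0.51, 0.38, 0.62, 0.44, 0.58])
human_ratings = np.array([3.0, 2.5, 4.0, 3.5, 2.0, 4.5, 3.0, 2.5, 4.0, 3.5])

r, p = pearsonr(metric_scores, human_ratings)          # linear correlation
rho, p_rank = spearmanr(metric_scores, human_ratings)  # rank correlation
print(f"Pearson r = {r:.2f} (p = {p:.2f}), Spearman rho = {rho:.2f} (p = {p_rank:.2f})")

A low correlation here would support the quoted claim that the automatic metric is a poor stand-in for human evaluation.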
“…The models for detecting student states and for associating adaptive system strategies with such states were learned from tutoring dialogue corpora using new data-driven methods (Forbes-Riley and Litman 2011). To support the use of reinforcement learning as one of our data-driven techniques, we developed probabilistic user simulation models for our less goal-oriented tutoring domain (Ai and Litman 2011) and tailored the use of reinforcement learning with its differing state and reward representations to optimize the choice of pedagogical tutor behaviors (Chi et al 2011). A series of experimental evaluations demonstrated that our technologies for adapting to student uncertainty over and above answer correctness (Forbes-Riley and Litman 2011), as well as further adapting to student disengagement over and above uncertainty (Forbes-Riley and Litman 2012) could improve student learning and other measures of tutorial dialogue system performance.…”
Section: Teaching Using Language
confidence: 99%
“…Ai and Litman (2008) propose to use human judges to evaluate automatically generated corpora. In this approach, human judges serve as a gold standard for user simulation assessment.…”
Section: State-of-the-art Metrics For Evaluating User Simulations
confidence: 99%
“…The study reported in Ai and Litman (2008) is based on subjective questions asked to the human judges observing dialogues between a student and a tutor. It subsequently uses the scores provided by human judges to train different metrics with supervised learning methods (stepwise multiple linear regression and ranking models).…”
Section: State-of-the-art Metrics For Evaluating User Simulations
confidence: 99%
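As a concrete illustration of the approach described in the excerpt above, the sketch below fits a linear model that predicts human-judge scores from automatic evaluation measures using greedy forward (stepwise) feature selection. It is only a hedged approximation of the idea, not the authors' code: the measure names, the data, and the stopping threshold are all hypothetical, and only numpy is used.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical automatic measures for 40 simulated dialogs (placeholder data).
features = {
    "perplexity": rng.normal(50, 10, 40),
    "dialog_length": rng.normal(12, 3, 40),
    "precision": rng.uniform(0.3, 0.9, 40),
    "recall": rng.uniform(0.3, 0.9, 40),
}
# Hypothetical human-judge ratings on a 1-5 scale.
human_scores = rng.uniform(1, 5, 40)

def fit_r2(X, y):
    """Least-squares fit with intercept; return R^2 on the training data."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    return 1 - resid.var() / y.var()

selected, remaining = [], list(features)
best_r2 = 0.0
while remaining:
    # Try adding each remaining measure and keep the one that helps most.
    scores = {f: fit_r2(np.column_stack([features[g] for g in selected + [f]]),
                        human_scores)
              for f in remaining}
    f, r2 = max(scores.items(), key=lambda kv: kv[1])
    if r2 - best_r2 < 0.01:   # stop when the improvement is negligible
        break
    selected.append(f)
    remaining.remove(f)
    best_r2 = r2

print("selected measures:", selected, "R^2 =", round(best_r2, 3))

With real human ratings as the target, the selected measures and their fitted weights would serve as a learned evaluation metric of simulation quality, which is the role stepwise multiple linear regression plays in the cited study.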