2011
DOI: 10.1017/s1351324910000318
Assessing user simulation for dialog systems using human judges and automatic evaluation measures

Abstract: While different user simulations are built to assist dialog system development, there is an increasing need to quickly assess the quality of the user simulations reliably. Previous studies have proposed several automatic evaluation measures for this purpose. However, the validity of these evaluation measures has not been fully proven. We present an assessment study in which human judgments are collected on user simulation qualities as the gold standard to validate automatic evaluation measures. We show that a …

Cited by 6 publications (5 citation statements)
References 28 publications (45 reference statements)
“…However, some of the metrics are designed specifically for language generation evaluation, and as Liu et al (2016) pointed out, these automatic metrics barely correlate with human evaluation. Therefore, Ai and Litman (2011a) involved human judges to directly rate the simulated dialog. Schatzmann and Young (2009) asked humans to interact with the trained systems to perform indirect human evaluation.…”
Section: Related Work
confidence: 99%
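The point quoted above, that automatic metrics need to be validated against human judgments, is typically checked with a correlation analysis. The following is a minimal sketch of such a check, not code from the paper: all scores are made-up placeholders, and it only shows how Pearson and Spearman correlations between an automatic metric and human ratings of the same simulated dialogs could be computed with numpy and scipy.

import numpy as np
from scipy.stats import pearsonr, spearmanr

# Hypothetical automatic-metric scores and human ratings for the same
# ten simulated dialogs (placeholder numbers, not real data).
metric_scores = np.array([0.41, 0.55, 0.32, 0.60, 0.47, 0.51, 0.38, 0.62, 0.44, 0.58])
human_ratings = np.array([3.0, 2.5, 4.0, 3.5, 2.0, 4.5, 3.0, 2.5, 4.0, 3.5])

r, p = pearsonr(metric_scores, human_ratings)          # linear correlation
rho, p_rank = spearmanr(metric_scores, human_ratings)  # rank correlation
print(f"Pearson r = {r:.2f} (p = {p:.2f}), Spearman rho = {rho:.2f} (p = {p_rank:.2f})")

A low correlation here would support the quoted claim that the automatic metric is a poor stand-in for human evaluation.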
“…The models for detecting student states and for associating adaptive system strategies with such states were learned from tutoring dialogue corpora using new data-driven methods (Forbes-Riley and Litman 2011). To support the use of reinforcement learning as one of our data-driven techniques, we developed probabilistic user simulation models for our less goal-oriented tutoring domain (Ai and Litman 2011) and tailored the use of reinforcement learning with its differing state and reward representations to optimize the choice of pedagogical tutor behaviors (Chi et al 2011). A series of experimental evaluations demonstrated that our technologies for adapting to student uncertainty over and above answer correctness (Forbes-Riley and Litman 2011), as well as further adapting to student disengagement over and above uncertainty (Forbes-Riley and Litman 2012) could improve student learning and other measures of tutorial dialogue system performance.…”
Section: Teaching Using Language
confidence: 99%
“…Ai and Litman (2008) propose to use human judges to evaluate automatically generated corpora. In this approach, human judges serve as a gold standard for user simulation assessment.…”
Section: State-of-the-art Metrics For Evaluating User Simulations
confidence: 99%
“…The study reported in Ai and Litman (2008) is based on subjective questions asked to the human judges observing dialogues between a student and a tutor. It subsequently uses the scores provided by human judges to train different metrics with supervised learning methods (stepwise multiple linear regression and ranking models).…”
Section: State-of-the-art Metrics For Evaluating User Simulations
confidence: 99%
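As a concrete illustration of the approach described in the excerpt above, the sketch below fits a linear model that predicts human-judge scores from automatic evaluation measures using greedy forward (stepwise) feature selection. It is only a hedged approximation of the idea, not the authors' code: the measure names, the data, and the stopping threshold are all hypothetical, and only numpy is used.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical automatic measures for 40 simulated dialogs (placeholder data).
features = {
    "perplexity": rng.normal(50, 10, 40),
    "dialog_length": rng.normal(12, 3, 40),
    "precision": rng.uniform(0.3, 0.9, 40),
    "recall": rng.uniform(0.3, 0.9, 40),
}
# Hypothetical human-judge ratings on a 1-5 scale.
human_scores = rng.uniform(1, 5, 40)

def fit_r2(X, y):
    """Least-squares fit with intercept; return R^2 on the training data."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    return 1 - resid.var() / y.var()

selected, remaining = [], list(features)
best_r2 = 0.0
while remaining:
    # Try adding each remaining measure and keep the one that helps most.
    scores = {f: fit_r2(np.column_stack([features[g] for g in selected + [f]]),
                        human_scores)
              for f in remaining}
    f, r2 = max(scores.items(), key=lambda kv: kv[1])
    if r2 - best_r2 < 0.01:   # stop when the improvement is negligible
        break
    selected.append(f)
    remaining.remove(f)
    best_r2 = r2

print("selected measures:", selected, "R^2 =", round(best_r2, 3))

With real human ratings as the target, the selected measures and their fitted weights would serve as a learned evaluation metric of simulation quality, which is the role stepwise multiple linear regression plays in the cited study.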