2009
DOI: 10.1109/tasl.2008.2012071
The Hidden Agenda User Simulation Model

Abstract: A key advantage of taking a statistical approach to spoken dialogue systems is the ability to formalise dialogue policy design as a stochastic optimization problem. However, since dialogue policies are learnt by interactively exploring alternative dialogue paths, conventional static dialogue corpora cannot be used directly for training and instead, a user simulator is commonly used. This paper describes a novel statistical user model based on a compact stack-like state representation called a user agenda…
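The abstract's central idea, a user state factored into a goal and a stack-like agenda of pending dialogue acts, can be sketched as follows. This is a minimal illustration in a toy slot-filling domain; the act inventory, the update heuristics, and the AgendaUserSimulator class are assumptions for illustration, not the paper's exact probabilistic formulation.

```python
import random

# Minimal sketch of an agenda-based user simulator in the spirit of the
# hidden-agenda model: the user state is a (goal, agenda) pair, where the
# agenda is a stack of pending user dialogue acts. Act names and update
# rules below are illustrative assumptions.

class AgendaUserSimulator:
    def __init__(self, goal, max_acts_per_turn=2):
        self.goal = dict(goal)                       # slot -> desired value
        self.max_acts = max_acts_per_turn
        # Initialise the agenda: convey each constraint, close at the bottom.
        self.agenda = [("bye",)]
        self.agenda += [("inform", s, v) for s, v in goal.items()]

    def receive(self, system_act):
        """Update the agenda in response to the system act (push phase)."""
        kind = system_act[0]
        if kind == "request":                        # system asks for a slot
            slot = system_act[1]
            self.agenda.append(("inform", slot, self.goal.get(slot, "dontcare")))
        elif kind == "confirm":                      # system confirms a value
            slot, value = system_act[1], system_act[2]
            if self.goal.get(slot) == value:
                self.agenda.append(("affirm",))
            else:                                    # correct a misunderstanding
                self.agenda.append(("negate", slot, self.goal.get(slot)))

    def respond(self):
        """Pop up to max_acts items off the agenda (pop phase); in the full
        model the number popped is drawn from a learned distribution."""
        n = random.randint(1, self.max_acts)
        acts = []
        while self.agenda and len(acts) < n:
            acts.append(self.agenda.pop())
        return acts

# Example turn: the system requests the 'area' slot, the user responds.
user = AgendaUserSimulator({"food": "indian", "area": "centre"})
user.receive(("request", "area"))
print(user.respond())        # e.g. [('inform', 'area', 'centre')]
```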

Cited by 79 publications (58 citation statements)
References 23 publications
“…Policy learning was implemented by a Gaussian process via the GP-SARSA algorithm [16]. As policies typically require O(10⁴) dialogues to converge, a simulated user [17] (operating at the semantic level) was used prior to the concluding human user trial. The natural language understanding and generation components used for human user trials were both hand crafted, using a Phoenix grammar [18] and templates (mapping system semantics to natural language) respectively.…”
Section: Dialogue System Description (mentioning)
confidence: 99%
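As a rough illustration of the training setup described in this excerpt, the sketch below pairs an epsilon-greedy policy with a Gaussian-process value model and a simulated-user environment. It is a heavily simplified stand-in: it refits a GP to Monte-Carlo returns rather than performing the sparse online GP-SARSA updates of [16], and the env interface (reset/step at the semantic level) is an assumption.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

N_ACTIONS = 3
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0))
X, y = [], []                          # (belief ++ one-hot action) -> return

def featurise(belief, action):
    onehot = np.eye(N_ACTIONS)[action]
    return np.concatenate([belief, onehot])

def epsilon_greedy(belief, eps=0.1):
    # Explore randomly until the GP has data, and with probability eps after.
    if not X or np.random.rand() < eps:
        return np.random.randint(N_ACTIONS)
    q = gp.predict(np.stack([featurise(belief, a) for a in range(N_ACTIONS)]))
    return int(np.argmax(q))

def train(env, episodes=10_000, gamma=0.99):
    """env is a hypothetical simulated-user environment with reset()/step()."""
    for ep in range(episodes):
        belief, done, trajectory = env.reset(), False, []
        while not done:
            action = epsilon_greedy(belief)
            next_belief, reward, done = env.step(action)
            trajectory.append((belief, action, reward))
            belief = next_belief
        ret = 0.0                      # accumulate discounted returns backwards
        for b, a, r in reversed(trajectory):
            ret = r + gamma * ret
            X.append(featurise(b, a)); y.append(ret)
        if ep % 100 == 0 and X:        # periodic refit keeps the GP tractable
            gp.fit(np.array(X), np.array(y))
```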
“…The size of this feature implicitly gives an indication of the cardinality ranges of the slots; both SFR and LAP have 6 'informable' (from the user's perspective) slots, but the raw SFR feature is much larger as SFR contains slots with a greater number of possible values. All datasets for training RNN dialogue success models are obtained by training policies from random initialisation with a user simulator [17], producing supervised pairs: (sequence of turn-level dialogue features, objective success/failure target label). The semantic error rate (SER) of the simulated user is set to 15% and the data is balanced with respect to the target labels.…”
Section: Domains and Datasets (mentioning)
confidence: 99%
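The data-generation recipe in this excerpt (random policies, a simulated user at 15% SER, balanced success/failure labels) could look roughly like the following sketch; simulate_dialogue and the corruption model are hypothetical stand-ins for the actual dialogue-system pipeline.

```python
import random

SER = 0.15

def corrupt(semantic_act):
    """Apply the semantic error channel: with probability SER the user act
    is replaced by an erroneous one (placeholder corruption here)."""
    return ("null",) if random.random() < SER else semantic_act

def generate_dataset(simulate_dialogue, n_dialogues=10_000):
    """simulate_dialogue(corrupt) is an assumed interface returning one
    dialogue's (turn-feature sequence, objective success flag)."""
    successes, failures = [], []
    for _ in range(n_dialogues):
        turn_features, success = simulate_dialogue(corrupt)
        (successes if success else failures).append((turn_features, success))
    # Balance the target labels by downsampling the majority class.
    n = min(len(successes), len(failures))
    data = random.sample(successes, n) + random.sample(failures, n)
    random.shuffle(data)
    return data
```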
“…At every turn t, input features f_t are extracted from the belief/action pair and used to update the hidden layer h_t. From dialogues generated by a simulated user (Schatzmann and Young, 2009), supervised training pairs are created which consist of the turn-level sequence of these feature vectors f_t along with the scalar dialogue return as scored by an objective measure of task completion. Whilst the RNN models are trained on dialogue-level supervised targets, we hypothesise that their subsequent turn-level predictions can guide policy exploration by acting as informative reward-shaping potentials.…”
Section: Reward Shaping with RNN Prediction (mentioning)
confidence: 99%
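A possible reading of this scheme in code: an RNN is supervised only by the dialogue-level return, and its per-turn predictions phi_t are turned into potential-based shaping terms F_t = gamma * phi_t - phi_{t-1}. The feature dimension, network size, and training step below are illustrative assumptions, not the cited paper's exact setup.

```python
import torch
import torch.nn as nn

class ReturnPredictor(nn.Module):
    def __init__(self, feat_dim, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, feats):                  # feats: (1, T, feat_dim)
        h_seq, _ = self.rnn(feats)
        return self.head(h_seq).squeeze(-1)    # per-turn prediction phi_t

def shaping_rewards(model, feats, gamma=0.99):
    """Turn-level shaping terms from a model trained only on the
    dialogue-level return (supervision attached to the final turn)."""
    with torch.no_grad():
        phi = model(feats)[0]                  # (T,)
    phi = torch.cat([torch.zeros(1), phi])     # phi_0 = 0 before the dialogue
    return gamma * phi[1:] - phi[:-1]          # F_t for each turn t

# Training compares only the final-turn prediction with the scored return:
model = ReturnPredictor(feat_dim=10)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
feats = torch.randn(1, 12, 10)                 # one 12-turn dialogue (dummy)
ret = torch.tensor([8.0])                      # objective task-completion score
opt.zero_grad()
loss = nn.functional.mse_loss(model(feats)[:, -1], ret)
loss.backward(); opt.step()
```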
“…In dialogue management, data-based models use probabilistic models to define the dialogue strategy: the dialogue history (including the last user interaction) and, possibly, other environmental factors are given as inputs to a probabilistic model that outputs the next action to be performed by the system. In the last decade, the most used probabilistic models have been Markov Decision Processes (MDP) [19], [20] and Partially Observable MDPs (POMDP) [21], [22], [23].…”
Section: A. Dialogue Models (mentioning)
confidence: 99%
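To make the MDP formulation concrete, here is a toy slot-filling dialogue cast as an MDP and solved by value iteration; the states, transition probabilities, and rewards are invented for illustration, and a POMDP treatment would additionally track a belief distribution over these states.

```python
import numpy as np

N_SLOTS = 3
STATES = N_SLOTS + 1                   # states 0..N_SLOTS count filled slots
ACTIONS = ["request", "close"]
P_UNDERSTOOD = 0.8                     # chance a requested slot gets filled
GAMMA = 0.95

def step_model(s, a):
    """Return a list of (prob, next_state, reward); next_state None is terminal."""
    if a == "close":
        reward = 20.0 if s == N_SLOTS else -20.0   # success vs. premature close
        return [(1.0, None, reward)]
    if s == N_SLOTS:                               # nothing left to request
        return [(1.0, s, -1.0)]
    return [(P_UNDERSTOOD, s + 1, -1.0),           # per-turn cost of -1
            (1.0 - P_UNDERSTOOD, s, -1.0)]

def q_value(s, a, V):
    return sum(p * (r + (GAMMA * V[s2] if s2 is not None else 0.0))
               for p, s2, r in step_model(s, a))

V = np.zeros(STATES)
for _ in range(200):                               # value iteration
    V = np.array([max(q_value(s, a, V) for a in ACTIONS)
                  for s in range(STATES)])

policy = [max(ACTIONS, key=lambda a: q_value(s, a, V)) for s in range(STATES)]
print(policy)    # expected: ['request', 'request', 'request', 'close']
```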