2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2011.5946754

Bayesian reinforcement learning for POMDP-based dialogue systems

Abstract: Spoken dialogue systems are gaining popularity with improvements in speech recognition technologies. Dialogue systems can be modeled effectively using POMDPs, achieving improvements in robustness. However, past research on POMDP-based dialogue systems assumes that the model parameters are known. This limitation can be addressed through model-based Bayesian reinforcement learning, which offers a rich framework for simultaneous learning and planning. However, due to the high complexity of the framework, a major …
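
As a rough illustration of the "simultaneous learning and planning" idea mentioned in the abstract, the Python sketch below keeps Dirichlet pseudo-counts over unknown transition probabilities and exposes their posterior mean as a point-estimate model for planning. This is only a hedged sketch under assumed interfaces; the class and method names are invented here and do not come from the paper.

from collections import defaultdict

# Illustrative sketch (not the paper's implementation): unknown transition
# probabilities of a dialogue POMDP are represented by Dirichlet pseudo-counts,
# which are updated from experience and averaged whenever a model is needed
# for planning.
class DirichletTransitionModel:
    def __init__(self, prior=1.0):
        # counts[(s, a)][s_next] holds the Dirichlet pseudo-count for T(s'|s,a)
        self.prior = prior
        self.counts = defaultdict(lambda: defaultdict(lambda: prior))

    def update(self, s, a, s_next):
        """Bayesian learning step: add one observed transition to the counts."""
        self.counts[(s, a)][s_next] += 1.0

    def expected_prob(self, s, a, s_next, states):
        """Posterior mean of T(s_next | s, a), usable as a planning model."""
        row = self.counts[(s, a)]
        total = sum(row[sp] for sp in states)
        return row[s_next] / total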

Cited by 17 publications (13 citation statements)
References 17 publications

“…However, this would cause a problem of partial observability. This could be handled by the POMDP (Partially Observed MDP) framework, but this is much more complex [25]. Instead of using this approach, we consider here the Information State Paradigm [26].…”
Section: B. Casting the DM Problem As An MDP
confidence: 99%
“…The challenge in this case is to find good bounds; this is especially difficult given the uncertainty over the underlying model. The method has been used in the context of partially observable BAMDP [Png and Pineau, 2011; Png, 2011] using a naive heuristic, ∑_{d=0}^{D} γ^d R_max, where D is the search depth and R_max is the maximum reward. The method was applied successfully to solve simulated dialogue management problems; computational scalability was achieved via a number of structural constraints, including the parameter tying method proposed by .…”
Section: Branch and Bound Search
confidence: 99%
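
To make the quoted upper bound concrete: with discount factor γ and maximum reward R_max, no policy can collect more than ∑_{d=0}^{D} γ^d R_max over D remaining steps, so a branch of the search tree can be discarded as soon as this optimistic estimate falls below the best value already found. The sketch below only illustrates that pruning test; the function names and bookkeeping are assumptions, not the cited implementation.

def naive_upper_bound(gamma: float, r_max: float, depth: int) -> float:
    """Optimistic bound on the discounted return over `depth` remaining steps:
    sum_{d=0}^{depth} gamma**d * r_max."""
    return sum(gamma ** d * r_max for d in range(depth + 1))

def can_prune(value_so_far: float, gamma_to_node: float,
              gamma: float, r_max: float, remaining_depth: int,
              best_value: float) -> bool:
    """Prune a search node if even the optimistic completion of this branch
    cannot beat the incumbent solution."""
    optimistic = value_so_far + gamma_to_node * naive_upper_bound(
        gamma, r_max, remaining_depth)
    return optimistic <= best_value

# Example: with gamma = 0.95, R_max = 10 and 3 remaining steps, the bound is
# 10 * (1 + 0.95 + 0.95**2 + 0.95**3) ≈ 37.1.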
“…The planning approach suggested by Ross et al [2011] aims to approximate the optimal BAPOMDP strategy by employing a forward search similar to that outlined in Algorithm 2. In related work, Png and Pineau [2011] use a branch-and-bound algorithm to approximate the BAPOMDP solution. Many of the other techniques outlined in Table 4.1 could also be extended to the BAPOMDP model.…”
Section: Extensions To Partially Observable MDPs
confidence: 99%
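
For context on what such an online forward search looks like in general, here is a minimal, hedged sketch of a depth-limited expectimax over beliefs: actions are maximised over, observations are averaged over, and a heuristic bound (for example the naive R_max bound above) values the leaves. The belief-update, observation-probability and reward functions are placeholders the caller would supply; nothing here reproduces the cited BAPOMDP algorithms.

def forward_search(belief, depth, actions, observations,
                   reward, obs_prob, update, gamma, leaf_bound):
    """Return the estimated value of the best action from `belief` after a
    depth-limited expansion of the action/observation tree."""
    if depth == 0:
        return leaf_bound(belief)          # e.g. the naive R_max bound
    best = float("-inf")
    for a in actions:
        # expected immediate reward plus discounted value over observations
        value = reward(belief, a)
        for o in observations:
            p = obs_prob(belief, a, o)     # P(o | belief, a)
            if p > 0.0:
                next_belief = update(belief, a, o)
                value += gamma * p * forward_search(
                    next_belief, depth - 1, actions, observations,
                    reward, obs_prob, update, gamma, leaf_bound)
        best = max(best, value)
    return best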
“…However, the lack of tools for easy prototyping of newer models remains an impediment to developing new models and properly benchmarking against previous models. Furthermore, the different types of conversational agents, e.g., generative (Hochreiter and Schmidhuber, 1997; Serban et al., 2015, 2016), retrieval-based (Schatzmann et al., 2005a; Lowe et al., 2015a), slot-based (Young, 2006) or POMDP agents (Png and Pineau, 2011), have different working mechanisms, which pose challenges to the development of a unified platform for conversational agents with multi-domain support.…”
Section: Introduction
confidence: 99%