2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2011.5946754

Bayesian reinforcement learning for POMDP-based dialogue systems

Abstract: Spoken dialogue systems are gaining popularity with improvements in speech recognition technologies. Dialogue systems can be modeled effectively using POMDPs, achieving improvements in robustness. However, past research on POMDP-based dialogue systems assumes that the model parameters are known. This limitation can be addressed through model-based Bayesian reinforcement learning, which offers a rich framework for simultaneous learning and planning. However, due to the high complexity of the framework, a major …
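
As a rough illustration of the "simultaneous learning and planning" idea mentioned in the abstract, the Python sketch below keeps Dirichlet pseudo-counts over unknown transition probabilities and exposes their posterior mean as a point-estimate model for planning. This is only a hedged sketch under assumed interfaces; the class and method names are invented here and do not come from the paper.

from collections import defaultdict

# Illustrative sketch (not the paper's implementation): unknown transition
# probabilities of a dialogue POMDP are represented by Dirichlet pseudo-counts,
# which are updated from experience and averaged whenever a model is needed
# for planning.
class DirichletTransitionModel:
    def __init__(self, prior=1.0):
        # counts[(s, a)][s_next] holds the Dirichlet pseudo-count for T(s'|s,a)
        self.prior = prior
        self.counts = defaultdict(lambda: defaultdict(lambda: prior))

    def update(self, s, a, s_next):
        """Bayesian learning step: add one observed transition to the counts."""
        self.counts[(s, a)][s_next] += 1.0

    def expected_prob(self, s, a, s_next, states):
        """Posterior mean of T(s_next | s, a), usable as a planning model."""
        row = self.counts[(s, a)]
        total = sum(row[sp] for sp in states)
        return row[s_next] / total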

Cited by 17 publications (13 citation statements)
References 17 publications

“…However, this would cause a problem of partial observability. This could be handled by the POMDP (Partially Observed MDP) framework, but this is much more complex [25]. Instead of using this approach, we consider here the Information State Paradigm [26].…”
Section: B. Casting the DM Problem As An MDP
confidence: 99%
“…The challenge in this case is to find good bounds; this is especially difficult given the uncertainty over the underlying model. The method has been used in the context of partially observable BAMDP [Png and Pineau, 2011; Png, 2011] using a naive heuristic, ∑_{d=0}^{D} γ^d R_max, where D is the search depth and R_max is the maximum reward. The method was applied successfully to solve simulated dialogue management problems; computational scalability was achieved via a number of structural constraints, including the parameter tying method proposed by .…”
Section: Branch and Bound Search
confidence: 99%
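
To make the quoted upper bound concrete: with discount factor γ and maximum reward R_max, no policy can collect more than ∑_{d=0}^{D} γ^d R_max over D remaining steps, so a branch of the search tree can be discarded as soon as this optimistic estimate falls below the best value already found. The sketch below only illustrates that pruning test; the function names and bookkeeping are assumptions, not the cited implementation.

def naive_upper_bound(gamma: float, r_max: float, depth: int) -> float:
    """Optimistic bound on the discounted return over `depth` remaining steps:
    sum_{d=0}^{depth} gamma**d * r_max."""
    return sum(gamma ** d * r_max for d in range(depth + 1))

def can_prune(value_so_far: float, gamma_to_node: float,
              gamma: float, r_max: float, remaining_depth: int,
              best_value: float) -> bool:
    """Prune a search node if even the optimistic completion of this branch
    cannot beat the incumbent solution."""
    optimistic = value_so_far + gamma_to_node * naive_upper_bound(
        gamma, r_max, remaining_depth)
    return optimistic <= best_value

# Example: with gamma = 0.95, R_max = 10 and 3 remaining steps, the bound is
# 10 * (1 + 0.95 + 0.95**2 + 0.95**3) ≈ 37.1.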
“…The planning approach suggested by Ross et al [2011] aims to approximate the optimal BAPOMDP strategy by employing a forward search similar to that outlined in Algorithm 2. In related work, Png and Pineau [2011] use a branch-and-bound algorithm to approximate the BAPOMDP solution. Many of the other techniques outlined in Table 4.1 could also be extended to the BAPOMDP model.…”
Section: Extensions To Partially Observable MDPs
confidence: 99%
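
For context on what such an online forward search looks like in general, here is a minimal, hedged sketch of a depth-limited expectimax over beliefs: actions are maximised over, observations are averaged over, and a heuristic bound (for example the naive R_max bound above) values the leaves. The belief-update, observation-probability and reward functions are placeholders the caller would supply; nothing here reproduces the cited BAPOMDP algorithms.

def forward_search(belief, depth, actions, observations,
                   reward, obs_prob, update, gamma, leaf_bound):
    """Return the estimated value of the best action from `belief` after a
    depth-limited expansion of the action/observation tree."""
    if depth == 0:
        return leaf_bound(belief)          # e.g. the naive R_max bound
    best = float("-inf")
    for a in actions:
        # expected immediate reward plus discounted value over observations
        value = reward(belief, a)
        for o in observations:
            p = obs_prob(belief, a, o)     # P(o | belief, a)
            if p > 0.0:
                next_belief = update(belief, a, o)
                value += gamma * p * forward_search(
                    next_belief, depth - 1, actions, observations,
                    reward, obs_prob, update, gamma, leaf_bound)
        best = max(best, value)
    return best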
“…However, the lack of tools for easy prototyping of newer models remains an impediment to developing new models and properly benchmarking against previous models. Furthermore, the different types of conversational agents, e.g., generative (Hochreiter and Schmidhuber, 1997; Serban et al., 2015, 2016), retrieval-based (Schatzmann et al., 2005a; Lowe et al., 2015a), slot-based (Young, 2006) or POMDP agents (Png and Pineau, 2011), have different working mechanisms, which pose challenges to the development of a unified platform for conversational agents with multi-domain support.…”
Section: Introduction
confidence: 99%