2005
DOI: 10.1007/11564096_35

Model-Based Online Learning of POMDPs

Abstract: Learning to act in an unknown partially observable domain is a difficult variant of the reinforcement learning paradigm. Research in the area has focused on model-free methods, which learn a policy without learning a model of the world. When sensor noise increases, model-free methods provide less accurate policies. The model-based approach, learning a POMDP model of the world and computing an optimal policy for the learned model, may generate superior results in the presence of sensor noise, …

Cited by 72 publications (68 citation statements)
References 5 publications
“…On the other hand, online approaches reduce the complexity of the problem by planning online for only the current information state [17,18,19]. It considers only a small horizon of possible scenarios.…”
Section: POMDPs
confidence: 99%
“…With the above two ideas, U-Tree can be further improved by making use of POMDP belief state based value in place of the MDP Q-value iteration. A similar approach was taken by Guy Shani et al [31], in which they proposed an extension of McCallum's Utile Suffix Memory [32] that makes use of the sensor reliability statistics and a modified version of Perseus [12] point-based belief state value iteration. However, their statistical approach of obtaining the state observation probabilities does not seem to be justified.…”
confidence: 92%
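The statement above contrasts MDP Q-value iteration with POMDP belief-state methods such as Perseus. The common machinery underlying all belief-state approaches is the Bayes-filter belief update; the sketch below shows that update on a hypothetical two-state domain with a noisy sensor (the transition and observation arrays are illustrative placeholders, not taken from the cited papers).

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """Posterior belief after taking action a and observing o:
    b'(s') proportional to O(o | s', a) * sum_s T(s' | s, a) * b(s)."""
    predicted = b @ T[a]              # prediction step: sum_s T(s'|s,a) b(s)
    unnormalized = predicted * O[a][:, o]  # correction step: weight by P(o|s',a)
    return unnormalized / unnormalized.sum()

# Hypothetical two-state toy domain, one action, sensor with 0.85 accuracy.
# T[a][s, s'] = P(s' | s, a); O[a][s', o] = P(o | s', a).
T = {0: np.array([[0.9, 0.1],
                  [0.2, 0.8]])}
O = {0: np.array([[0.85, 0.15],
                  [0.15, 0.85]])}

b = np.array([0.5, 0.5])              # uniform prior over the two states
b_next = belief_update(b, a=0, o=0, T=T, O=O)
```

Starting from a uniform belief, observing o=0 shifts most of the probability mass onto state 0; point-based methods like Perseus back up value estimates over a sampled set of such belief points rather than the full belief simplex.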
“…The model-based approach is an important branch of POMDP research (Sallans, 2000; Theocharous, 2002; Shani et al., 2005). In this approach, the environment is assumed to be unknown to the agent, and the agent learns the model of the environment through experience (i.e., the history of actions and observations).…”
Section: Introduction
confidence: 99%