2005
DOI: 10.1613/jair.1659
Perseus: Randomized Point-based Value Iteration for POMDPs

Abstract: Partially observable Markov decision processes (POMDPs) form an attractive and principled framework for agent planning under uncertainty. Point-based approximate techniques for POMDPs compute a policy based on a finite set of points collected in advance from the agent's belief space. We present a randomized point-based value iteration algorithm called Perseus. The algorithm performs approximate value backup stages, ensuring that in each backup stage the value of each point in the belief set is improved; the key…
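To make the backup-stage idea concrete, here is a minimal numpy sketch of one Perseus value-update stage for a flat, discrete POMDP. The array layout (R[a] reward vectors, T[a] transition matrices, O[a] observation matrices) and the helper names backup and perseus_stage are illustrative choices, not notation from the paper:

```python
import numpy as np

def backup(b, V, R, T, O, gamma):
    """Point-based Bellman backup at belief b.
    V is a list of alpha-vectors; R[a], T[a], O[a] are the reward
    vector, |S|x|S| transition matrix and |S|x|O| observation matrix
    for action a. Returns the maximizing alpha-vector at b."""
    best, best_val = None, -np.inf
    for a in range(len(R)):
        g_sum = np.zeros_like(R[a])
        for o in range(O[a].shape[1]):
            # g_{a,o} = T_a @ (O_a[:, o] * alpha), maximized over alpha in V
            g_sum += max((T[a] @ (O[a][:, o] * alpha) for alpha in V),
                         key=lambda g: b @ g)
        alpha_ab = R[a] + gamma * g_sum
        if b @ alpha_ab > best_val:
            best, best_val = alpha_ab, b @ alpha_ab
    return best

def perseus_stage(B, V, R, T, O, gamma, rng):
    """One Perseus backup stage: back up randomly chosen beliefs until
    every belief in B has a value at least as high as under V."""
    old_vals = [max(b @ alpha for alpha in V) for b in B]
    V_new, todo = [], list(range(len(B)))
    while todo:
        i = rng.choice(todo)               # random not-yet-improved belief
        alpha = backup(B[i], V, R, T, O, gamma)
        if B[i] @ alpha >= old_vals[i]:
            V_new.append(alpha)            # the new vector improves b_i
        else:
            V_new.append(max(V, key=lambda v: B[i] @ v))  # keep old best
        # a single added vector may improve many beliefs at once
        todo = [j for j in todo
                if max(B[j] @ v for v in V_new) < old_vals[j]]
    return V_new
```

A full run would initialize V with a single conservative alpha-vector and repeat perseus_stage until the values over B converge; because each stage only backs up a random subset of beliefs, stages are cheap while still improving every point in B.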

Cited by 398 publications (372 citation statements) · References 36 publications
“…However, such algorithms can handle only problems with few states. Point-based value iteration algorithms do not compute a solution for all beliefs, but either sample a set of beliefs (as in Perseus [3]) before the main algorithm or select beliefs during planning (as in HSVI [4]). 'Value iteration' refers to updating the value for a belief using a form of the Bellman equation (2).…”
Section: POMDP Methods (mentioning; confidence: 99%)
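For reference, the belief-space Bellman backup that point-based methods approximate has the following standard form (a sketch in common notation; equation (2) of the citing paper may differ in detail):

\[
V_{n+1}(b) \;=\; \max_{a \in A}\Big[\, \sum_{s \in S} R(s,a)\, b(s) \;+\; \gamma \sum_{o \in \Omega} p(o \mid b, a)\, V_n\!\big(b^{a}_{o}\big) \Big],
\qquad
b^{a}_{o}(s') \;\propto\; p(o \mid s', a) \sum_{s \in S} p(s' \mid s, a)\, b(s).
\]

Exact value iteration applies this update over the entire belief simplex; point-based methods such as Perseus and HSVI apply it only at a finite set of sampled beliefs.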
“…Similarly to Perseus [3], FBVP starts by sampling a set of beliefs B before the actual planning. The main planning algorithm (Algorithm 1) takes the belief set B as input and produces a policy graph that can be used for decision making during online operation.…”
Section: Our Algorithm: Factorized Belief Value Projection (mentioning; confidence: 99%)
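The "sampling a set of beliefs B before the actual planning" step can be done by simulating random interactions from the initial belief, as Perseus does. A minimal sketch under that assumption (the helpers belief_update and sample_beliefs are ours; FBVP's factorized belief representation is not modeled here):

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """Bayes filter: b'(s') is proportional to
    P(o | s', a) * sum_s P(s' | s, a) * b(s)."""
    bp = O[a][:, o] * (T[a].T @ b)
    return bp / bp.sum()

def sample_beliefs(b0, T, O, n, horizon, rng):
    """Collect a set of reachable beliefs by executing uniformly
    random actions from the initial belief b0."""
    B = [b0]
    while len(B) < n:
        b, s = b0, rng.choice(len(b0), p=b0)          # restart an episode
        for _ in range(horizon):
            a = rng.integers(len(T))                  # random action
            s = rng.choice(len(b0), p=T[a][s])        # sample next state
            o = rng.choice(O[a].shape[1], p=O[a][s])  # sample observation
            b = belief_update(b, a, o, T, O)
            B.append(b)
            if len(B) >= n:
                break
    return B
```

Sampling from simulated trajectories keeps B restricted to beliefs that are actually reachable from b0, which is what makes planning over a finite belief set effective in practice.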
“…Policy optimization algorithms can be classified into two broad categories: i) offline techniques that pre-compute a policy before the start of the execution [14,15,16] and ii) online techniques that perform all their computation at run time by searching for the best action to execute after receiving each observation [3]. Online techniques can take advantage of the history so far to focus their computation only on the current belief.…”
Section: Introduction (mentioning; confidence: 99%)
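The offline/online split described here amounts to where the planning cost is paid. A minimal sketch of the two per-step action-selection loops (alpha_vectors, actions, and plan_from_belief are hypothetical placeholders, not APIs from the cited papers):

```python
import numpy as np

def act_offline(b, alpha_vectors, actions):
    """Offline: the policy was pre-computed (e.g. as alpha-vectors,
    each tagged with an action); run time only pays for a max over
    vectors at the current belief b."""
    vals = [b @ alpha for alpha in alpha_vectors]
    return actions[int(np.argmax(vals))]

def act_online(b, plan_from_belief, budget):
    """Online: all planning happens now, from the current belief only,
    e.g. a lookahead search rooted at b under a per-step time budget."""
    return plan_from_belief(b, budget)
```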