2005
DOI: 10.1613/jair.1659
Perseus: Randomized Point-based Value Iteration for POMDPs

Abstract: Partially observable Markov decision processes (POMDPs) form an attractive and principled framework for agent planning under uncertainty. Point-based approximate techniques for POMDPs compute a policy based on a finite set of points collected in advance from the agent's belief space. We present a randomized point-based value iteration algorithm called Perseus. The algorithm performs approximate value backup stages, ensuring that in each backup stage the value of each point in the belief set is improved; the key…
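To make the backup-stage idea concrete, here is a minimal numpy sketch of one Perseus value-update stage for a flat, discrete POMDP. The array layout (R[a] reward vectors, T[a] transition matrices, O[a] observation matrices) and the helper names backup and perseus_stage are illustrative choices, not notation from the paper:

```python
import numpy as np

def backup(b, V, R, T, O, gamma):
    """Point-based Bellman backup at belief b.
    V is a list of alpha-vectors; R[a], T[a], O[a] are the reward
    vector, |S|x|S| transition matrix and |S|x|O| observation matrix
    for action a. Returns the maximizing alpha-vector at b."""
    best, best_val = None, -np.inf
    for a in range(len(R)):
        g_sum = np.zeros_like(R[a])
        for o in range(O[a].shape[1]):
            # g_{a,o} = T_a @ (O_a[:, o] * alpha), maximized over alpha in V
            g_sum += max((T[a] @ (O[a][:, o] * alpha) for alpha in V),
                         key=lambda g: b @ g)
        alpha_ab = R[a] + gamma * g_sum
        if b @ alpha_ab > best_val:
            best, best_val = alpha_ab, b @ alpha_ab
    return best

def perseus_stage(B, V, R, T, O, gamma, rng):
    """One Perseus backup stage: back up randomly chosen beliefs until
    every belief in B has a value at least as high as under V."""
    old_vals = [max(b @ alpha for alpha in V) for b in B]
    V_new, todo = [], list(range(len(B)))
    while todo:
        i = rng.choice(todo)               # random not-yet-improved belief
        alpha = backup(B[i], V, R, T, O, gamma)
        if B[i] @ alpha >= old_vals[i]:
            V_new.append(alpha)            # the new vector improves b_i
        else:
            V_new.append(max(V, key=lambda v: B[i] @ v))  # keep old best
        # a single added vector may improve many beliefs at once
        todo = [j for j in todo
                if max(B[j] @ v for v in V_new) < old_vals[j]]
    return V_new
```

A full run would initialize V with a single conservative alpha-vector and repeat perseus_stage until the values over B converge; because each stage only backs up a random subset of beliefs, stages are cheap while still improving every point in B.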

Cited by 398 publications (372 citation statements) · References 36 publications
“…However, such algorithms can handle only problems with few states. Point-based value iteration algorithms do not compute a solution for all beliefs, but either sample a set of beliefs (as in Perseus [3]) before the main algorithm or select beliefs during planning (as in HSVI [4]). 'Value iteration' refers to updating the value for a belief using a form of the Bellman equation (2).…”
Section: POMDP Methods (mentioning; confidence: 99%)
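For reference, the belief-space Bellman backup that point-based methods approximate has the following standard form (a sketch in common notation; equation (2) of the citing paper may differ in detail):

\[
V_{n+1}(b) \;=\; \max_{a \in A}\Big[\, \sum_{s \in S} R(s,a)\, b(s) \;+\; \gamma \sum_{o \in \Omega} p(o \mid b, a)\, V_n\!\big(b^{a}_{o}\big) \Big],
\qquad
b^{a}_{o}(s') \;\propto\; p(o \mid s', a) \sum_{s \in S} p(s' \mid s, a)\, b(s).
\]

Exact value iteration applies this update over the entire belief simplex; point-based methods such as Perseus and HSVI apply it only at a finite set of sampled beliefs.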
“…Similarly to Perseus [3], FBVP starts by sampling a set of beliefs B before the actual planning. The main planning algorithm (Algorithm 1) takes the belief set B as input and produces a policy graph that can be used for decision making during online operation.…”
Section: Our Algorithm: Factorized Belief Value Projection (mentioning; confidence: 99%)
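The "sampling a set of beliefs B before the actual planning" step can be done by simulating random interactions from the initial belief, as Perseus does. A minimal sketch under that assumption (the helpers belief_update and sample_beliefs are ours; FBVP's factorized belief representation is not modeled here):

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """Bayes filter: b'(s') is proportional to
    P(o | s', a) * sum_s P(s' | s, a) * b(s)."""
    bp = O[a][:, o] * (T[a].T @ b)
    return bp / bp.sum()

def sample_beliefs(b0, T, O, n, horizon, rng):
    """Collect a set of reachable beliefs by executing uniformly
    random actions from the initial belief b0."""
    B = [b0]
    while len(B) < n:
        b, s = b0, rng.choice(len(b0), p=b0)          # restart an episode
        for _ in range(horizon):
            a = rng.integers(len(T))                  # random action
            s = rng.choice(len(b0), p=T[a][s])        # sample next state
            o = rng.choice(O[a].shape[1], p=O[a][s])  # sample observation
            b = belief_update(b, a, o, T, O)
            B.append(b)
            if len(B) >= n:
                break
    return B
```

Sampling from simulated trajectories keeps B restricted to beliefs that are actually reachable from b0, which is what makes planning over a finite belief set effective in practice.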
“…Policy optimization algorithms can be classified into two broad categories: i) offline techniques that pre-compute a policy before the start of the execution [14,15,16] and ii) online techniques that perform all their computation at run time by searching for the best action to execute after receiving each observation [3]. Online techniques can take advantage of the history so far to focus their computation only on the current belief.…”
Section: Introduction (mentioning; confidence: 99%)
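The offline/online split described here amounts to where the planning cost is paid. A minimal sketch of the two per-step action-selection loops (alpha_vectors, actions, and plan_from_belief are hypothetical placeholders, not APIs from the cited papers):

```python
import numpy as np

def act_offline(b, alpha_vectors, actions):
    """Offline: the policy was pre-computed (e.g. as alpha-vectors,
    each tagged with an action); run time only pays for a max over
    vectors at the current belief b."""
    vals = [b @ alpha for alpha in alpha_vectors]
    return actions[int(np.argmax(vals))]

def act_online(b, plan_from_belief, budget):
    """Online: all planning happens now, from the current belief only,
    e.g. a lookahead search rooted at b under a per-step time budget."""
    return plan_from_belief(b, budget)
```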