2011
DOI: 10.1609/icaps.v21i1.13484
Sample-Based Planning for Continuous Action Markov Decision Processes

Abstract: In this paper, we present a new algorithm that integrates recent advances in solving continuous bandit problems with sample-based rollout methods for planning in Markov Decision Processes (MDPs). Our algorithm, Hierarchical Optimistic Optimization applied to Trees (HOOT), addresses planning in continuous-action MDPs. Empirical results are given that show that the performance of our algorithm meets or exceeds that of a similar discrete action planner by eliminating the problem of manual discretization of the act…
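The core idea the abstract describes is using Hierarchical Optimistic Optimization (HOO, Bubeck et al.) to select continuous actions: the action interval is recursively partitioned into a tree, and the algorithm follows optimistic upper bounds (B-values) down the tree to pick the next action to try. The sketch below is a minimal, illustrative HOO bandit over actions in [0, 1); all class and parameter names (`HOONode`, `rho`, `nu`, the toy reward function) are assumptions for illustration, not the authors' implementation.

```python
import math
import random

class HOONode:
    """A node in the HOO tree covering the action interval [lo, hi)."""
    def __init__(self, lo, hi, depth):
        self.lo, self.hi, self.depth = lo, hi, depth
        self.count = 0          # times this region was played
        self.mean = 0.0         # empirical mean reward in this region
        self.children = None    # two child intervals once the node is split

    def b_value(self, total, rho=0.5, nu=1.0):
        """Optimistic upper bound on the reward achievable in this region."""
        if self.count == 0:
            return float("inf")
        u = (self.mean
             + math.sqrt(2.0 * math.log(total) / self.count)
             + nu * rho ** self.depth)
        if self.children:
            u = min(u, max(c.b_value(total, rho, nu) for c in self.children))
        return u

class HOO:
    """Hierarchical Optimistic Optimization over a one-dimensional action space."""
    def __init__(self):
        self.root = HOONode(0.0, 1.0, 0)
        self.total = 0

    def select(self):
        """Descend along maximal B-values; return (action, path of visited nodes)."""
        node, path = self.root, [self.root]
        while node.children:
            node = max(node.children,
                       key=lambda c: c.b_value(self.total + 1))
            path.append(node)
        if node.count > 0:      # split an already-visited leaf before replaying it
            mid = (node.lo + node.hi) / 2.0
            node.children = [HOONode(node.lo, mid, node.depth + 1),
                             HOONode(mid, node.hi, node.depth + 1)]
            node = node.children[0]
            path.append(node)
        return (node.lo + node.hi) / 2.0, path

    def update(self, path, reward):
        """Propagate the observed reward back up the selected path."""
        self.total += 1
        for n in path:
            n.count += 1
            n.mean += (reward - n.mean) / n.count

# Toy continuous bandit: noisy reward that peaks at action = 0.7.
random.seed(0)
hoo = HOO()
for _ in range(500):
    a, path = hoo.select()
    reward = 1.0 - abs(a - 0.7) + random.gauss(0, 0.05)
    hoo.update(path, reward)
best, _ = hoo.select()
```

HOOT, as described in the abstract, embeds this kind of selector at the decision points of a sample-based tree search, so that rollouts never require a hand-chosen discretization of the action space.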


Cited by 46 publications (5 citation statements)
References 7 publications
“…The theoretical properties of the algorithms remain to be proven. Additionally, better ways for choosing continuous actions (Seiler, Kurniawati, and Singh 2015;Mansley, Weinstein, and Littman 2011) would provide an improvement.…”
Section: Discussion
confidence: 99%
“…Furthermore, since computing a good estimate of the Q-value function is costly, optimization methods that rely on gradients would be expensive to compute. Methods to alleviate this difficulty for continuous-action MDPs (the fully observable version of POMDPs) have been proposed (Mansley, Weinstein, and Littman 2011). For POMDPs, GPS-ABT (Seiler, Kurniawati, and Singh 2015) alleviates the problem via Generalized Pattern Search.…”
Section: Background and Related Work
confidence: 99%
“…Thus, extending texplore to use multi-dimensional continuous actions mainly requires extensions to the uct planning algorithm for sampling and selecting from a multi-dimensional continuous action space. One possible approach to this problem is to utilize recent work (Mansley et al, 2011;Weinstein and Littman, 2012) adapting the hoo algorithm for continuous bandit problems (Bubeck et al, 2011) to action selection at each level of the uct tree.…”
Section: Expanded Applicability of RL
confidence: 99%