2011
DOI: 10.1609/icaps.v21i1.13484
Sample-Based Planning for Continuous Action Markov Decision Processes

Abstract: In this paper, we present a new algorithm that integrates recent advances in solving continuous bandit problems with sample-based rollout methods for planning in Markov Decision Processes (MDPs). Our algorithm, Hierarchical Optimistic Optimization applied to Trees (HOOT), addresses planning in continuous-action MDPs. Empirical results are given that show that the performance of our algorithm meets or exceeds that of a similar discrete action planner by eliminating the problem of manual discretization of the act…
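The core idea the abstract describes is using Hierarchical Optimistic Optimization (HOO, Bubeck et al.) to select continuous actions: the action interval is recursively partitioned into a tree, and the algorithm follows optimistic upper bounds (B-values) down the tree to pick the next action to try. The sketch below is a minimal, illustrative HOO bandit over actions in [0, 1); all class and parameter names (`HOONode`, `rho`, `nu`, the toy reward function) are assumptions for illustration, not the authors' implementation.

```python
import math
import random

class HOONode:
    """A node in the HOO tree covering the action interval [lo, hi)."""
    def __init__(self, lo, hi, depth):
        self.lo, self.hi, self.depth = lo, hi, depth
        self.count = 0          # times this region was played
        self.mean = 0.0         # empirical mean reward in this region
        self.children = None    # two child intervals once the node is split

    def b_value(self, total, rho=0.5, nu=1.0):
        """Optimistic upper bound on the reward achievable in this region."""
        if self.count == 0:
            return float("inf")
        u = (self.mean
             + math.sqrt(2.0 * math.log(total) / self.count)
             + nu * rho ** self.depth)
        if self.children:
            u = min(u, max(c.b_value(total, rho, nu) for c in self.children))
        return u

class HOO:
    """Hierarchical Optimistic Optimization over a one-dimensional action space."""
    def __init__(self):
        self.root = HOONode(0.0, 1.0, 0)
        self.total = 0

    def select(self):
        """Descend along maximal B-values; return (action, path of visited nodes)."""
        node, path = self.root, [self.root]
        while node.children:
            node = max(node.children,
                       key=lambda c: c.b_value(self.total + 1))
            path.append(node)
        if node.count > 0:      # split an already-visited leaf before replaying it
            mid = (node.lo + node.hi) / 2.0
            node.children = [HOONode(node.lo, mid, node.depth + 1),
                             HOONode(mid, node.hi, node.depth + 1)]
            node = node.children[0]
            path.append(node)
        return (node.lo + node.hi) / 2.0, path

    def update(self, path, reward):
        """Propagate the observed reward back up the selected path."""
        self.total += 1
        for n in path:
            n.count += 1
            n.mean += (reward - n.mean) / n.count

# Toy continuous bandit: noisy reward that peaks at action = 0.7.
random.seed(0)
hoo = HOO()
for _ in range(500):
    a, path = hoo.select()
    reward = 1.0 - abs(a - 0.7) + random.gauss(0, 0.05)
    hoo.update(path, reward)
best, _ = hoo.select()
```

HOOT, as described in the abstract, embeds this kind of selector at the decision points of a sample-based tree search, so that rollouts never require a hand-chosen discretization of the action space.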


Cited by 46 publications (5 citation statements)
References 7 publications
“…The theoretical properties of the algorithms remain to be proven. Additionally, better ways for choosing continuous actions (Seiler, Kurniawati, and Singh 2015;Mansley, Weinstein, and Littman 2011) would provide an improvement.…”
Section: Discussion
confidence: 99%
“…Furthermore, since computing a good estimate of the Q-value function is costly, optimization methods that rely on gradients would be expensive to compute. Methods to alleviate this difficulty for continuous-action MDPs (the fully observable version of POMDPs) have been proposed (Mansley, Weinstein, and Littman 2011). For POMDPs, GPS-ABT (Seiler, Kurniawati, and Singh 2015) alleviates the problem via Generalized Pattern Search.…”
Section: Background and Related Work
confidence: 99%
“…Thus, extending texplore to use multi-dimensional continuous actions mainly requires extensions to the uct planning algorithm for sampling and selecting from a multi-dimensional continuous action space. One possible approach to this problem is to utilize recent work (Mansley et al, 2011;Weinstein and Littman, 2012) adapting the hoo algorithm for continuous bandit problems (Bubeck et al, 2011) to action selection at each level of the uct tree.…”
Section: Expanded Applicability of RL
confidence: 99%