Generalized Mean Estimation in Monte-Carlo Tree Search

Dam, Tuan; Klink, Pascal; D’Eramo, Carlo; Peters, Jan; Pajarinen, Joni

doi:10.24963/ijcai.2020/332

Cited by 3 publications

(6 citation statements)

References 4 publications


“…Finally, we provide a theory of the use of α-divergence in MCTS for backup and exploration. Remarkably, we show that our theoretical framework unifies our two proposed methods Power-UCT (Dam et al, 2019) and entropy regularization (Dam et al, 2021), that can be obtained for particular choices of the value of α. In the general case where α is considered a real number greater than 0, we show that tuning α directly influences the navigation and backup phases of the tree search, providing a unique powerful mathematical formulation to effectively balance between exploration and exploitation in MCTS.…”

Section: Introduction (supporting)

confidence: 58%

A Unified Perspective on Value Backup and Exploration in Monte-Carlo Tree Search

Dam, D’Eramo, Peters et al. 2022

Preprint

Self Cite


Monte-Carlo Tree Search (MCTS) is a class of methods for solving complex decision-making problems through the synergy of Monte-Carlo planning and Reinforcement Learning (RL). The highly combinatorial nature of the problems commonly addressed by MCTS requires the use of efficient exploration strategies for navigating the planning tree and quickly convergent value backup methods. These crucial problems are particularly evident in recent advances that combine MCTS with deep neural networks for function approximation. In this work, we propose two methods for improving the convergence rate and exploration based on a newly introduced backup operator and entropy regularization. We provide strong theoretical guarantees to bound convergence rate, approximation error, and regret of our methods. Moreover, we introduce a mathematical framework based on the use of the α-divergence for backup and exploration in MCTS. We show that this theoretical formulation unifies different approaches, including our newly introduced ones, under the same mathematical framework, allowing to obtain different methods by simply changing the value of α. In practice, our unified perspective offers a flexible way to balance between exploration and exploitation by tuning the single α parameter according to the problem at hand. We validate our methods through a rigorous empirical study from basic toy problems to the complex Atari games, and including both MDP and POMDP problems.


“…Notably, we propose MCPP as a general MCTS-based framework for robotic path planning. MCPP can incorporate different exploration strategies [25], [26] to continuous actions, adapting, subsequently, the convergence rates for MCPP.…”

Section: Related Work (mentioning)

confidence: 99%

“…Power-UCT [25], an improvement over UCT, solves the problem of the underestimation of the average mean and the max-backup operators in MCTS by proposing the use of power mean as the backup operator. Power-UCT has a polynomial convergence rate for choosing the optimal action at the root node.…”

Section: Markov Decision Process (mentioning)

confidence: 99%
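
The citation statement above describes the power mean as a backup operator that sits between the average and the max. A minimal Python sketch of that idea follows. It assumes non-negative child values, visit-count fractions as weights, and an exponent p >= 1. The function name `power_mean_backup` and the sample numbers are hypothetical and are not taken from the cited papers.

```python
import math


def power_mean_backup(values, visits, p):
    """Weighted power mean of child value estimates.

    Weights are the visit fractions n_i / sum(n). With p = 1 this is the
    visit-weighted average; as p grows it approaches the max over visited
    children. Assumes non-negative values (illustrative sketch only).
    """
    if p < 1:
        raise ValueError("p must be >= 1")
    if any(v < 0 for v in values):
        raise ValueError("values must be non-negative")
    total = sum(visits)
    if total == 0:
        raise ValueError("at least one child must have been visited")
    if math.isinf(p):
        # Limit case: the power mean tends to the max over visited children.
        return max(v for v, n in zip(values, visits) if n > 0)
    weighted = sum((n / total) * v ** p for v, n in zip(values, visits))
    return weighted ** (1.0 / p)


# Hypothetical child statistics for a single node.
child_values = [0.2, 0.5, 0.9]
child_visits = [10, 30, 60]

avg = power_mean_backup(child_values, child_visits, 1)         # weighted average
mid = power_mean_backup(child_values, child_visits, 4)         # in between
top = power_mean_backup(child_values, child_visits, math.inf)  # max
print(avg, mid, top)
```

For fixed weights the power mean is non-decreasing in p, so raising p moves the backed-up value from the average toward the max.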

“…Power-UCT has a polynomial convergence rate for choosing the optimal action at the root node. TENTS [26] is derived as a result of Tsallis entropy regularization in MCTS. TENTS has an exponential convergence rate at the root node, which is faster than Power-UCT and UCT.…”

Section: Markov Decision Process (mentioning)

confidence: 99%

“…We continue by proposing different exploration strategies in MCPP for robotic path planning. In particular, we build on top of our prior work on power-mean UCT (Power-UCT) [25] and convex regularization with Tsallis Entropy Monte-Carlo Planning (TENTS) [26], integrating them in MCPP. We provide various experimental evaluations of MCPP, initially in MDP environments for completeness and thereafter in challenging POMDP tasks in 2D and 3D while planning with a 7-DOF robot arm.…”

Section: Introduction (mentioning)

confidence: 99%


Monte-Carlo Robot Path Planning

Dam, Chalvatzaki, Peters et al. 2022

Preprint


Path planning is a crucial algorithmic approach for designing robot behaviors. Sampling-based approaches, like rapidly exploring random trees (RRTs) or probabilistic roadmaps, are prominent algorithmic solutions for path planning problems. Despite its exponential convergence rate, RRT can only find suboptimal paths. On the other hand, RRT*, a widely used extension to RRT, guarantees probabilistic completeness for finding optimal paths but suffers in practice from slow convergence in complex environments. Furthermore, real-world robotic environments are often partially observable or with poorly described dynamics, casting the application of RRT* in complex tasks suboptimal. This paper studies a novel algorithmic formulation of the popular Monte-Carlo tree search (MCTS) algorithm for robot path planning. Notably, we study Monte-Carlo Path Planning (MCPP) by analyzing and proving, on the one part, its exponential convergence rate to the optimal path in fully observable Markov decision processes (MDPs), and on the other part, its probabilistic completeness for finding feasible paths in partially observable MDPs (POMDPs) assuming limited distance observability (proof sketch). Our algorithmic contribution allows us to employ recently proposed variants of MCTS with different exploration strategies for robot path planning. Our experimental evaluations in simulated 2D and 3D environments with a 7 degrees of freedom (DOF) manipulator, as well as in a real-world robot path planning task, demonstrate the superiority of MCPP in POMDP tasks.
