Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence 2020
DOI: 10.24963/ijcai.2020/332

Generalized Mean Estimation in Monte-Carlo Tree Search

Abstract: We consider Monte-Carlo Tree Search (MCTS) applied to Markov Decision Processes (MDPs) and Partially Observable MDPs (POMDPs), and the well-known Upper Confidence bound for Trees (UCT) algorithm. In UCT, a tree with nodes (states) and edges (actions) is incrementally built by the expansion of nodes, and the values of nodes are updated through a backup strategy based on the average value of child nodes. However, it has been shown that with enough samples the maximum operator yields more accurate node va…
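The abstract contrasts UCT's average backup with the maximum operator. A minimal sketch of that contrast follows, together with the standard UCB1 selection rule used by UCT. The `Node` layout, the function names, and the default exploration constant c = √2 are illustrative assumptions, not code from the paper.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node: visit count, summed returns, children keyed by action."""
    visits: int = 0
    value_sum: float = 0.0
    children: dict = field(default_factory=dict)

    def mean(self) -> float:
        # Empirical mean return of this node (0 if never visited).
        return self.value_sum / self.visits if self.visits else 0.0


def ucb1_select(node: Node, c: float = math.sqrt(2)):
    """UCT selection: maximize mean + c * sqrt(ln N(parent) / n(child))."""
    best_action, best_score = None, -math.inf
    for action, child in node.children.items():
        if child.visits == 0:
            return action  # try untried actions first
        bonus = c * math.sqrt(math.log(node.visits) / child.visits)
        score = child.mean() + bonus
        if score > best_score:
            best_action, best_score = action, score
    return best_action


def average_backup(node: Node) -> float:
    """Visit-weighted average of child means (the usual UCT value estimate)."""
    total = sum(ch.visits for ch in node.children.values())
    return sum(ch.visits * ch.mean() for ch in node.children.values()) / total


def max_backup(node: Node) -> float:
    """Maximum over child means."""
    return max(ch.mean() for ch in node.children.values())
```

For a root visited 4 times with child `a` (3 visits, mean 1.0) and child `b` (1 visit, mean 0.0), the average backup gives 0.75 while the max backup gives 1.0. With the default c, UCB1 picks `a`; raising c to 3 lets the exploration bonus of the rarely tried `b` dominate.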

Cited by 3 publications (6 citation statements); references 4 publications.
“…Finally, we provide a theory of the use of α-divergence in MCTS for backup and exploration. Remarkably, we show that our theoretical framework unifies our two proposed methods Power-UCT (Dam et al, 2019) and entropy regularization (Dam et al, 2021), that can be obtained for particular choices of the value of α. In the general case where α is considered a real number greater than 0, we show that tuning α directly influences the navigation and backup phases of the tree search, providing a unique powerful mathematical formulation to effectively balance between exploration and exploitation in MCTS.…”
Section: Introduction (supporting)
confidence: 58%
“…Notably, we propose MCPP as a general MCTS-based framework for robotic path planning. MCPP can incorporate different exploration strategies [25], [26] to continuous actions, adapting, subsequently, the convergence rates for MCPP.…”
Section: Related Work (mentioning)
confidence: 99%
“…Power-UCT [25], an improvement over UCT, solves the problem of the underestimation of the average mean and the max-backup operators in MCTS by proposing the use of power mean as the backup operator. Power-UCT has a polynomial convergence rate for choosing the optimal action at the root node.…”
Section: Markov Decision Process (mentioning)
confidence: 99%
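Power-UCT, per the statement above, uses a power (generalized) mean as the backup operator. A minimal sketch follows, assuming the visit-weighted form with child value estimates as the values and their visit counts as the weights. The function name and the restriction to non-negative values and p ≥ 1 are assumptions of this illustration, not a transcription of the paper's code.

```python
import math
from typing import Sequence


def power_mean(values: Sequence[float], weights: Sequence[float], p: float) -> float:
    """Weighted power mean (sum_i w_i * x_i**p / sum_i w_i) ** (1 / p).

    Assumptions of this sketch: values are non-negative (e.g. returns scaled
    to [0, 1]), weights are visit counts with a positive sum, and p >= 1.
    p = 1 gives the visit-weighted average; p -> infinity approaches the max.
    """
    if p < 1:
        raise ValueError("this sketch only handles p >= 1")
    if any(v < 0 for v in values):
        raise ValueError("power mean here assumes non-negative values")
    if math.isinf(p):
        # Limit case: the maximum over children that were actually visited.
        return max(v for v, w in zip(values, weights) if w > 0)
    total = sum(weights)
    # Large p with values > 1 can overflow; scaling returns to [0, 1] avoids that.
    weighted = sum(w * v ** p for v, w in zip(values, weights)) / total
    return weighted ** (1.0 / p)
```

With child means `[1.0, 0.0]` and visit counts `[3, 1]`, p = 1 reproduces the average backup (0.75), p = 2 gives about 0.866, and p = ∞ gives the max backup (1.0), so p interpolates between the two operators discussed in the abstract.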