2014
DOI: 10.1561/2200000038

From Bandits to Monte-Carlo Tree Search: The Optimistic Principle Applied to Optimization and Planning

Abstract: This work covers several aspects of the optimism in the face of uncertainty principle applied to large scale optimization problems under finite numerical budget. The initial motivation for the research reported here originated from the empirical success of the so-called Monte-Carlo Tree Search method popularized in Computer Go and further extended to many other games as well as optimization and planning problems. Our objective is to contribute to the development of theoretical foundations of the field by chara…

Cited by 190 publications (190 citation statements)
References 85 publications
“…We consider that the constants are k_1 = 1, k_2 = 500, and k_3 = 1. Details of the non-convex Simultaneous Optimistic Optimization (SOO) algorithm can be found in [14,15]. This algorithm is used to obtain the game solution, i.e., the final heading angle, which is the control, and the time of capture, which is the payoff of the game.…”
Section: Optimization Solution To The System (mentioning)
confidence: 99%
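The references [14,15] in the excerpt above point to Simultaneous Optimistic Optimization (SOO), the hierarchical-partitioning method analyzed in this monograph. Below is a minimal sketch of the SOO idea for maximizing a deterministic black-box function on [0, 1]; the trisection rule, the depth schedule h_max(n) = sqrt(n), the budget, and the test objective are illustrative assumptions, not details taken from the citing paper.

```python
import math

def soo_maximize(f, budget=300, h_max_fn=lambda n: int(math.sqrt(n))):
    """SOO sketch: in each sweep, expand at every depth the leaf with the best
    centre value, provided it beats the best value expanded at any shallower
    depth in the same sweep (optimism without knowledge of the smoothness)."""
    # A leaf is (left, right, centre, value); leaves are grouped by depth.
    leaves = {0: [(0.0, 1.0, 0.5, f(0.5))]}
    n_evals = 1
    best_x, best_v = 0.5, leaves[0][0][3]

    while n_evals < budget:
        v_max = -math.inf
        deepest = max(d for d, cells in leaves.items() if cells)
        for h in range(min(deepest, h_max_fn(n_evals)) + 1):
            if not leaves.get(h):
                continue
            leaf = max(leaves[h], key=lambda c: c[3])
            l, r, c, v = leaf
            if v < v_max:
                continue
            v_max = v
            leaves[h].remove(leaf)
            # Trisect the interval; the middle child keeps the parent's centre,
            # so only the two outer children need new evaluations.
            w = (r - l) / 3.0
            for cl, cr in [(l, l + w), (l + w, r - w), (r - w, r)]:
                cc = (cl + cr) / 2.0
                if math.isclose(cc, c):
                    cv = v                     # reuse the parent's evaluation
                else:
                    cv = f(cc)
                    n_evals += 1
                    if cv > best_v:
                        best_x, best_v = cc, cv
                leaves.setdefault(h + 1, []).append((cl, cr, cc, cv))
            if n_evals >= budget:
                break
    return best_x, best_v


if __name__ == "__main__":
    # Illustrative multimodal objective from the optimistic-optimization literature.
    f = lambda x: math.sin(13 * x) * math.sin(27 * x) / 2.0 + 0.5
    print(soo_maximize(f, budget=300))
```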
“…The non-convex optimization algorithm implementation used [14,15] in the optimization and hybrid methods stops when a fixed number of iterations has been completed, regardless of whether a solution was found or not. In order to study how the number of iterations affects the solution obtained, we run the algorithm three times for the optimization method (using {10^3, 10^4, 10^5} iterations) and for the hybrid approach (using {10^2, 10^3, 10^4} iterations).…”
Section: Simulation 1: Comparison Between Analytical Optimization A… (mentioning)
confidence: 99%
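As a usage note on the budget study quoted above, a sweep over evaluation budgets can be scripted directly on top of the soo_maximize sketch given earlier; the budgets below mirror the quoted values, while the objective function is purely illustrative.

```python
import math

f = lambda x: math.sin(13 * x) * math.sin(27 * x) / 2.0 + 0.5

# Run the optimizer with increasing budgets and compare the solutions found.
for budget in (10**3, 10**4, 10**5):
    x_best, v_best = soo_maximize(f, budget=budget)
    print(f"budget={budget:>6}  x*={x_best:.6f}  f(x*)={v_best:.6f}")
```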
“…Another example of the successful use of MCTS is the optimization of a "black-box" function [27,8], where the goal is to obtain a good estimate of the maximum of the function (deterministic or stochastic) by evaluating it only a limited number of times. The idea is to design a sequence of input samples at which the function should be evaluated, given the previously observed values.…”
Section: Monte Carlo Tree Search (MCTS) (mentioning)
confidence: 99%
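To make the "design a sequence of input samples" idea concrete, here is a minimal UCB-style sketch that allocates a limited evaluation budget over a finite grid of candidate inputs to a noisy function; the grid, the exploration constant, and the noise model are assumptions for the example rather than details from [27,8].

```python
import math
import random

def ucb_max_estimate(noisy_f, candidates, budget=500, c=1.0):
    """Sequentially choose where to evaluate a noisy function so that the
    sample allocation concentrates on the most promising candidates."""
    counts = [0] * len(candidates)
    means = [0.0] * len(candidates)

    # Evaluate every candidate once to initialize the statistics.
    for i, x in enumerate(candidates):
        means[i] = noisy_f(x)
        counts[i] = 1

    for t in range(len(candidates), budget):
        # Optimism in the face of uncertainty: pick the candidate with the
        # highest upper confidence bound on its mean value.
        ucb = [means[i] + c * math.sqrt(math.log(t + 1) / counts[i])
               for i in range(len(candidates))]
        i = max(range(len(candidates)), key=lambda j: ucb[j])
        y = noisy_f(candidates[i])
        counts[i] += 1
        means[i] += (y - means[i]) / counts[i]   # running mean update

    best = max(range(len(candidates)), key=lambda j: means[j])
    return candidates[best], means[best]

# Example: noisy evaluations of a smooth function on a uniform grid.
if __name__ == "__main__":
    f = lambda x: math.exp(-8 * (x - 0.3) ** 2)          # true function
    noisy_f = lambda x: f(x) + random.gauss(0.0, 0.1)    # noisy oracle
    grid = [i / 50 for i in range(51)]
    print(ucb_max_estimate(noisy_f, grid, budget=2000))
```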
“…Instead of building the full tree, UCT chooses the action a that maximizes an upper confidence bound on Q(s, a), following the principle of optimism in the face of uncertainty. Several improvements and extensions of UCT have been proposed, including handling continuous actions [13] (see [16] for a review) and continuous states [1] with a simple Gaussian distance metric; however, the knowledge of the probabilistic model is not directly exploited. For continuous states, parametric function approximation is often used (e.g., linear regression); nonetheless, the model needs to be carefully tailored to the domain being solved [32].…”
Section: Related Work (mentioning)
confidence: 99%
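For reference, the UCT rule mentioned in the excerpt selects, at state s, the action maximizing Q(s, a) + c * sqrt(ln N(s) / N(s, a)). The sketch below shows just that selection step; the Node structure and the exploration constant c are illustrative assumptions, not taken from any of the cited implementations.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    """Minimal search-node statistics assumed for the example."""
    visits: int = 0          # N(s): number of visits to this state
    value: float = 0.0       # running mean of returns, i.e. Q(s, a) at a child
    children: dict = field(default_factory=dict)   # action -> child Node

def uct_select(node: Node, c: float = 1.4):
    """Return the action maximizing Q(s, a) + c * sqrt(ln N(s) / N(s, a))."""
    log_n = math.log(node.visits)        # assumes the node has been visited
    best_action, best_score = None, -math.inf
    for action, child in node.children.items():
        if child.visits == 0:
            return action                # unvisited actions have an unbounded score
        score = child.value + c * math.sqrt(log_n / child.visits)
        if score > best_score:
            best_action, best_score = action, score
    return best_action
```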