2010
DOI: 10.1007/978-3-642-13800-3_9

Consistency Modifications for Automatically Tuned Monte-Carlo Tree Search

Abstract: Monte-Carlo Tree Search algorithms (MCTS [4,6]), including upper confidence trees (UCT [9]), are known for their impressive ability in high-dimensional control problems. Whilst the main testbed is the game of Go, there are increasingly many applications [13,12,7]; these algorithms are now widely accepted as strong candidates for high-dimensional control applications. Unfortunately, it is known that for optimal performance on a given problem, MCTS requires some tuning; this tuning is often handcrafted …
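UCT [9] chooses which child of a tree node to explore next with the UCB1 rule: the child's empirical mean reward plus an exploration bonus scaled by a constant c. That constant is one common example of a parameter that needs the kind of tuning the abstract mentions. The following is a minimal Python sketch, assuming rewards in [0, 1] and the textbook UCB1 bonus; the names `ChildStats`, `uct_score`, and `select_child` are hypothetical and not taken from the paper.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class ChildStats:
    wins: float  # total reward collected by simulations through this child
    visits: int  # number of simulations that passed through this child


def uct_score(child: ChildStats, parent_visits: int, c: float) -> float:
    """UCB1 score: empirical mean plus an exploration bonus weighted by c."""
    if child.visits == 0:
        return math.inf  # unvisited children are always tried first
    mean = child.wins / child.visits
    bonus = c * math.sqrt(math.log(parent_visits) / child.visits)
    return mean + bonus


def select_child(children: list[ChildStats], c: float) -> int:
    """Return the index of the child with the highest UCB1 score."""
    parent_visits = sum(ch.visits for ch in children)
    return max(range(len(children)),
               key=lambda i: uct_score(children[i], parent_visits, c))


children = [ChildStats(wins=7, visits=10), ChildStats(wins=1, visits=2)]
print(select_child(children, c=0.0))  # 0: purely greedy, higher mean wins
print(select_child(children, c=1.0))  # 1: larger bonus of the rarer child wins
```

With c = 0 the rule is greedy and keeps exploiting the child with mean 0.7; with c = 1 the less-visited child (mean 0.5) gets a large enough bonus to be selected. The same tree statistics therefore lead to different searches depending on c, which is why such constants are tuned per problem.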

Cited by 7 publications (9 citation statements)
References 11 publications
“…Silver describes the technique of simulation balancing using gradient descent to bias the policy during simulations [203]. While it has been observed that improving the simulation policy does not necessarily lead to strong play [92], Silver and Tesauro demonstrate techniques for learning a simulation policy that works well with MCTS to produce balanced (footnote 19) if not strong play [203].…”
Section: Simulation Balancing (mentioning)
“…In terms of board games such as Go, a pattern is a small non-empty section of the board or a logical test upon it. Patterns may also encode additional information such as the player to move, and are typically incorporated into … [Footnote 19: Games in which errors by one player are on average cancelled out by errors by the opponent on their next move [203].]…”
Section: Patterns (mentioning)
“…For instance, Perrick et al [27] employ a grid search approach combined with self-playing, Chaslot et al [28] use cross entropy as a search method to tune an agent playing Go, Coulom [29] presents a generic black-box optimization method based on local quadratic regression, Maes et al [30] use estimation distribution algorithms with Gaussian distributions, Chapelle and Li [31] use Thompson sampling, and Bourki et al [32] use, as in the present paper, a multi-armed bandit approach. The paper [33] studies the influence of the tuning of MCS algorithms on their asymptotic consistency and shows that pathological behavior may occur with tuning. It also proposes a tuning method to avoid such behavior.…”
Section: Related Work (mentioning)
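The excerpt above describes tuning an MCTS agent with a multi-armed bandit approach. As a rough illustration only (the excerpt does not say which bandit algorithm is used), the sketch below applies UCB1 to choosing among a few candidate values of one parameter, where each "pull" is one game. The function names, the candidate values, and the win probabilities in `TRUE_RATE` are hypothetical stand-ins for real self-play games.

```python
import math
import random


def ucb1_tune(candidates, play_game, budget, c=math.sqrt(2)):
    """Spread `budget` games over candidate parameter values with UCB1.

    play_game(value) must return 1 for a win and 0 for a loss.
    Returns (candidate with the best empirical win rate, games per candidate).
    """
    k = len(candidates)
    if budget < k:
        raise ValueError("budget must allow at least one game per candidate")
    wins = [0.0] * k
    pulls = [0] * k
    for t in range(budget):
        if t < k:
            arm = t  # play every candidate once before using the UCB1 bound
        else:
            arm = max(range(k), key=lambda i: wins[i] / pulls[i]
                      + c * math.sqrt(math.log(t) / pulls[i]))
        wins[arm] += play_game(candidates[arm])
        pulls[arm] += 1
    best = max(range(k), key=lambda i: wins[i] / pulls[i])
    return candidates[best], pulls


# Hypothetical simulated "self-play": each candidate value wins with a fixed rate.
TRUE_RATE = {0.1: 0.3, 0.5: 0.5, 1.0: 0.7}
rng = random.Random(0)


def simulated_game(value):
    return 1 if rng.random() < TRUE_RATE[value] else 0


best, pulls = ucb1_tune([0.1, 0.5, 1.0], simulated_game, budget=3000)
print(best, pulls)
```

Compared with a grid search that spends the same number of games on every candidate, the bandit concentrates most of the budget on the value that appears strongest, here the one with the 0.7 win rate, while still sampling the others occasionally.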
“…Finally, it has been suggested in [8,3] to use a regularized form of the winning rate; we propose this in the "regularized" algorithm 1 (Alg. 5).…”
Section: Goal (mentioning)
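The excerpt above refers to a "regularized form of the winning rate" without giving the formula. One common way to regularize a win rate, assumed here and not necessarily the form used in [8,3], is to add `strength` virtual simulations whose outcome equals a `prior` rate, so that nodes with few visits are pulled toward the prior instead of reporting extreme values. The function name and both parameters are hypothetical.

```python
def regularized_win_rate(wins: float, visits: int,
                         prior: float = 0.5, strength: float = 1.0) -> float:
    """Win rate shrunk toward `prior` by `strength` virtual simulations.

    Assumed form: (wins + prior * strength) / (visits + strength).
    """
    if strength < 0:
        raise ValueError("strength must be non-negative")
    if visits + strength <= 0:
        raise ValueError("need at least one real or virtual simulation")
    return (wins + prior * strength) / (visits + strength)


print(regularized_win_rate(1, 1))            # 0.75 instead of the raw 1.0
print(regularized_win_rate(0, 0))            # 0.5: no data, so the prior
print(regularized_win_rate(1, 1, strength=0))  # 1.0: no regularization
```

A single win no longer yields a win rate of 1.0, and as `visits` grows the virtual simulations are outweighed, so the estimate converges to the raw winning rate.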