Abstract. Monte-Carlo Tree Search algorithms (MCTS [4,6]), including upper confidence trees (UCT [9]), are known for their impressive ability in high-dimensional control problems. Whilst the main testbed is the game of Go, there are increasingly many applications [13,12,7]; these algorithms are now widely accepted as strong candidates for high-dimensional control applications. Unfortunately, it is known that for optimal performance on a given problem, MCTS requires some tuning; this tuning is often handcrafted …
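For concreteness, the UCT selection rule referenced above ([9]) can be sketched as follows. The node representation (reward/visit pairs) and the default exploration constant `c = 1.4` are illustrative assumptions; `c` is exactly the kind of constant whose tuning the abstract discusses.

```python
import math

def uct_select(children, parent_visits, c=1.4):
    """Pick the index of the child maximising the UCB1 score used by UCT.

    Each child is a (total_reward, visit_count) pair; `c` is the
    exploration constant that typically needs per-problem tuning.
    """
    def score(child):
        reward, visits = child
        if visits == 0:
            return float("inf")  # always try unvisited children first
        return reward / visits + c * math.sqrt(math.log(parent_visits) / visits)
    return max(range(len(children)), key=lambda i: score(children[i]))

# With equal visit counts, the child with the better empirical mean wins.
best = uct_select([(3.0, 10), (6.0, 10)], parent_visits=20)
```

Larger values of `c` shift the balance toward exploration; the optimal setting is problem-dependent, which is what motivates the automatic tuning studied here.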
“…Silver describes the technique of simulation balancing using gradient descent to bias the policy during simulations [203]. While it has been observed that improving the simulation policy does not necessarily lead to strong play [92], Silver and Tesauro demonstrate techniques for learning a simulation policy that works well with MCTS to produce balanced, if not strong, play [203].…”
Section: Simulation Balancing
“…In terms of board games such as Go, a pattern is a small non-empty section of the board or a logical test upon it. Patterns may also encode additional information such as the player to move, and are typically incorporated into…” (Footnote 19, defining balanced games: “Games in which errors by one player are on average cancelled out by errors by the opponent on their next move [203].”)
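The simulation balancing technique quoted above (gradient descent biasing the playout policy toward a target value rather than toward winning) can be given a drastically simplified sketch. The softmax-over-pattern-features policy, the feature names, and the single-step update are illustrative assumptions; Silver and Tesauro's full algorithm [203] also runs simulations to estimate the expected playout outcome.

```python
import math

def softmax_policy(theta, features_per_move):
    """Softmax playout policy over per-move feature sets (assumed binary patterns)."""
    prefs = [sum(theta.get(f, 0.0) for f in feats) for feats in features_per_move]
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]

def balancing_update(theta, playout, v_star, outcome, alpha=0.01):
    """One simulation-balancing step (policy-gradient sketch).

    `playout` is a list of (features_per_move, chosen_index) decisions;
    the update pushes the expected playout outcome toward the target
    value `v_star` rather than toward winning outright.
    """
    # Accumulate the gradient of the log-probability of the playout.
    grad = {}
    for features_per_move, chosen in playout:
        probs = softmax_policy(theta, features_per_move)
        for i, feats in enumerate(features_per_move):
            coeff = (1.0 if i == chosen else 0.0) - probs[i]
            for f in feats:
                grad[f] = grad.get(f, 0.0) + coeff
    # Scale by the error between the target value and the playout outcome.
    err = v_star - outcome
    for f, g in grad.items():
        theta[f] = theta.get(f, 0.0) + alpha * err * g
    return theta
```

The key point of balancing is visible in the `err` factor: a playout that undershoots the target value has its chosen moves reinforced, and one that overshoots has them suppressed, so errors cancel on average instead of the policy simply becoming greedier.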
Monte Carlo Tree Search (MCTS) is a recently proposed search method that combines the precision of tree search with the generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a range of other domains. This paper is a survey of the literature to date, intended to provide a snapshot of the state of the art after the first five years of MCTS research. We outline the core algorithm's derivation, impart some structure on the many variations and enhancements that have been proposed, and summarise the results from the key game and non-game domains to which MCTS methods have been applied. A number of open research questions indicate that the field is ripe for future work.
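The four phases of the core algorithm the survey describes (selection, expansion, simulation, backpropagation) can be sketched on a toy single-player domain. The domain, constants, and tie-breaking choices here are illustrative assumptions, not a reference implementation from the survey.

```python
import math, random

class Node:
    def __init__(self, state, parent=None, action=None):
        self.state, self.parent, self.action = state, parent, action
        self.children, self.visits, self.reward = [], 0, 0.0

# Toy single-player domain: pick 0 or 1 three times; reward = fraction of ones.
ACTIONS, HORIZON = (0, 1), 3

def is_terminal(state):
    return len(state) == HORIZON

def rollout_value(state, rng):
    # Simulation phase: finish the game with uniformly random moves.
    while not is_terminal(state):
        state = state + (rng.choice(ACTIONS),)
    return sum(state) / HORIZON

def mcts(iterations=500, c=1.4, seed=0):
    rng = random.Random(seed)
    root = Node(())
    for _ in range(iterations):
        node = root
        # 1. Selection: descend through fully expanded nodes via UCB1.
        while node.children and len(node.children) == len(ACTIONS):
            node = max(node.children,
                       key=lambda ch: ch.reward / ch.visits
                       + c * math.sqrt(math.log(node.visits) / ch.visits))
        # 2. Expansion: add one untried child of the selected node.
        if not is_terminal(node.state):
            tried = {ch.action for ch in node.children}
            action = rng.choice([a for a in ACTIONS if a not in tried])
            child = Node(node.state + (action,), parent=node, action=action)
            node.children.append(child)
            node = child
        # 3. Simulation: random playout from the new node.
        value = rollout_value(node.state, rng)
        # 4. Backpropagation: update statistics on the path to the root.
        while node is not None:
            node.visits += 1
            node.reward += value
            node = node.parent
    # Recommend the most-visited root action.
    return max(root.children, key=lambda ch: ch.visits).action
```

In this domain, action 1 always pays more than action 0, so after a few hundred iterations the visit counts concentrate on the better root action; the same skeleton, with a game-specific simulator, is what the surveyed variants build on.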
“…For instance, Perrick et al [27] employ a grid-search approach combined with self-play, Chaslot et al [28] use the cross-entropy method to tune an agent playing Go, Coulom [29] presents a generic black-box optimization method based on local quadratic regression, Maes et al [30] use estimation-of-distribution algorithms with Gaussian distributions, Chapelle and Li [31] use Thompson sampling, and Bourki et al [32] use, as in the present paper, a multi-armed bandit approach. The paper [33] studies the influence of the tuning of MCS algorithms on their asymptotic consistency and shows that pathological behavior may occur with tuning; it also proposes a tuning method to avoid such behavior.…”
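Of the tuning methods listed above, Thompson sampling [31] is perhaps the simplest to sketch for choosing among a finite set of candidate parameter values. The Beta-Bernoulli priors and the `play_game` stub (standing in for a self-play match played with the candidate value) are illustrative assumptions.

```python
import random

def thompson_tune(candidates, play_game, rounds=1000, seed=0):
    """Thompson sampling over a finite set of candidate parameter values.

    `play_game(value)` must return 1 for a win and 0 for a loss; in a
    real tuner it would wrap a self-play match with the MCTS constant
    set to `value`.
    """
    rng = random.Random(seed)
    wins = [1] * len(candidates)    # Beta(1, 1) uniform priors
    losses = [1] * len(candidates)
    for _ in range(rounds):
        # Sample a plausible win rate for each candidate, play the best sample.
        samples = [rng.betavariate(w, l) for w, l in zip(wins, losses)]
        i = max(range(len(candidates)), key=samples.__getitem__)
        if play_game(candidates[i]):
            wins[i] += 1
        else:
            losses[i] += 1
    # Recommend the candidate with the best posterior-mean win rate.
    return candidates[max(range(len(candidates)),
                          key=lambda i: wins[i] / (wins[i] + losses[i]))]
```

Unlike grid search, the bandit allocates most of its evaluation budget to promising candidates, which matters when each game is expensive.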
Abstract. Much current research in AI and games is being devoted to Monte Carlo search (MCS) algorithms. While the quest for a single unified MCS algorithm that would perform well on all problems is of major interest for AI, practitioners often know in advance the problem they want to solve, and spend plenty of time exploiting this knowledge to customize their MCS algorithm in a problem-driven way. We propose an MCS algorithm discovery scheme to perform this customization in an automatic and reproducible way. First, we introduce a grammar over MCS algorithms that enables inducing a rich space of candidate algorithms. Afterwards, we search this space for the algorithm that performs best on average for a given distribution of training problems, relying on multi-armed bandits to approximately solve this optimization problem. The experiments, conducted on three different domains, show that our approach enables discovering algorithms that outperform several well-known MCS algorithms, such as upper confidence bounds applied to trees and nested Monte Carlo search. We also show that the discovered algorithms are generally quite robust with respect to changes in the distribution over the training problems.
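The grammar-then-search idea can be illustrated with a toy candidate space. The components of this "grammar" (selection rule, playout depth, final-move rule) and the exhaustive scoring loop are illustrative assumptions; the paper's actual grammar is richer, and it uses multi-armed bandits in place of the exhaustive evaluation below when the space or the budget makes exhaustion infeasible.

```python
import itertools, statistics

# A toy "grammar" over MCS-style algorithms: each candidate is a
# combination of a selection rule, a playout depth, and a final-move rule.
SELECTION = ("ucb", "greedy")
DEPTH = (1, 2, 4)
FINAL = ("max_visits", "max_mean")

def candidates():
    """Induce the space of algorithms described by the grammar."""
    return [dict(selection=s, depth=d, final=f)
            for s, d, f in itertools.product(SELECTION, DEPTH, FINAL)]

def discover(evaluate, training_problems):
    """Return the candidate with the best mean score on the training set.

    `evaluate(algo, problem)` returns a performance score for one run.
    """
    def mean_score(algo):
        return statistics.mean(evaluate(algo, p) for p in training_problems)
    return max(candidates(), key=mean_score)
```

Searching over a distribution of training problems, rather than a single instance, is what gives the discovered algorithm the robustness the abstract reports.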
Abstract. In past work we used a great deal of computational power and human expertise to build a very large dataset of good 9x9 Go games, in order to construct an opening book, and we substantially improved the algorithm used for generating these games. Unfortunately, the results were not very robust, because (i) opening books are definitely not transitive, making non-regression testing extremely difficult; (ii) different time settings lead to opposite conclusions, because a good opening for a game with 10s per move on a single core is very different from a good opening for a game with 30s per move on a 32-core machine; and (iii) some very bad moves sometimes occur. In this paper, we formalize the optimization of an opening book as a matrix game, compute the Nash equilibrium, and conclude that a naturally randomized opening book provides optimal performance (in the sense of Nash equilibria); surprisingly, from a finite set of opening books, we can choose a distribution over these opening books so that this random solution performs significantly better than each of the deterministic opening books.
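The matrix-game formalization can be illustrated with fictitious play, a standard iterative scheme for approximating equilibria of zero-sum matrix games; this is not necessarily the method used in the paper, and the payoff matrix (the row player's win rates when book i meets book j) is an assumed input.

```python
def fictitious_play(payoff, iters=20000):
    """Approximate a Nash equilibrium of a zero-sum matrix game.

    `payoff[i][j]` is the row player's expected result when the row
    player uses opening book i and the column player uses book j.
    Returns the row player's empirical mixed strategy over books.
    """
    n, m = len(payoff), len(payoff[0])
    row_counts, col_counts = [0] * n, [0] * m
    row_br, col_br = 0, 0
    for _ in range(iters):
        row_counts[row_br] += 1
        col_counts[col_br] += 1
        # The row player best-responds to the column player's empirical mix...
        row_br = max(range(n), key=lambda i: sum(
            payoff[i][j] * col_counts[j] for j in range(m)))
        # ...and the column player minimises the same quantity.
        col_br = min(range(m), key=lambda j: sum(
            payoff[i][j] * row_counts[i] for i in range(n)))
    total = sum(row_counts)
    return [c / total for c in row_counts]
```

When no single book dominates (as with the non-transitive books described above), the returned strategy mixes several books, which is exactly the "naturally randomized opening book" conclusion: the mixture can beat every deterministic book in its support.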