2012
DOI: 10.1007/978-3-642-31866-5_6

Monte-Carlo Tree Search Enhancements for Havannah

Abstract: This article shows how the performance of a Monte-Carlo Tree Search (MCTS) player for Havannah can be improved by guiding the search in the play-out and selection steps of MCTS. To improve the play-out step of the MCTS algorithm, we used two techniques to direct the simulations: Last-Good-Reply (LGR) and N-grams. Experiments reveal that LGR gives a significant improvement, although it depends on which LGR variant is used. Using N-grams to guide the play-outs also achieves a significant increase in the winning…

Cited by 9 publications (12 citation statements)
References 19 publications
“…In this study, we only use LGRF-1 in our experiments because it works better for our player. This result is similar to the results found in Stankiewicz, Winands, and Uiterwijk (2012). Figure 5: The LGRF-1 improvement (simulation step biasing).…”
Section: Last-Good-Reply (supporting)
confidence: 87%
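As a rough illustration of the LGRF-1 idea referenced in the citation statement above, here is a minimal sketch of the reply table: the winner's replies are stored after each playout, and a loser's stored reply is forgotten (the "F" in LGRF). The function names, the (player, move) encoding, and the table layout are assumptions for illustration, not the paper's implementation.

```python
def lgrf1_update(reply, history, winner):
    """LGRF-1 table update after one playout (illustrative sketch).

    `history` is the playout as a list of (player, move) pairs in play
    order; `winner` is the winning player. `reply` maps
    (player, opponent_move) -> stored reply. The winner's replies are
    (re)stored; a loser's reply is deleted if it matches the stored
    entry -- the forgetting that distinguishes LGRF from plain LGR.
    """
    for i in range(1, len(history)):
        player, move = history[i]
        _, prev_move = history[i - 1]
        key = (player, prev_move)
        if player == winner:
            reply[key] = move          # store the good reply
        elif reply.get(key) == move:
            del reply[key]             # forget the failed reply
    return reply


def lgrf1_pick(reply, player, prev_move, legal, fallback):
    """During a playout: play the stored reply if legal, else fall back."""
    move = reply.get((player, prev_move))
    return move if move in legal else fallback()
```

In a full playout policy, `fallback` would typically be a uniformly random choice among the legal moves.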
“…Stankiewicz et al [4] apply N-grams of length 2 and 3 with an ε-greedy simulation policy to the game of Havannah, achieving a significant increase in playing strength. Tak et al [5] suggest an enhancement similar to NAST which uses a combination of 1-, 2- and 3-grams, and demonstrate its effectiveness in the domain of General Game Playing.…”
Section: B. N-Gram-Average Sampling Technique (NAST) (mentioning)
confidence: 99%
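The ε-greedy N-gram simulation policy mentioned above can be sketched for the 2-gram case: with probability ε a random move is played, otherwise the move with the highest average reward following the previous move is chosen. The statistics layout and the neutral prior for unseen 2-grams are assumptions for illustration.

```python
import random


def ngram_move(stats, prev_move, legal, eps=0.1, rng=random.Random(0)):
    """ε-greedy move choice from 2-gram statistics (illustrative sketch).

    `stats` maps (prev_move, move) -> (total_reward, visits). With
    probability eps a uniformly random legal move is played; otherwise
    the move with the highest average reward in this context is chosen.
    Unseen 2-grams get a neutral prior of 0.5.
    """
    if prev_move is None or rng.random() < eps:
        return rng.choice(legal)

    def avg(move):
        total, visits = stats.get((prev_move, move), (0.0, 0))
        return total / visits if visits else 0.5

    return max(legal, key=avg)
```

Extending this to 3-grams amounts to keying the statistics on the two preceding moves instead of one.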
“…N-gram-Average Sampling Technique (NAST) generalises this to sequences of N moves, learning the value of the Nth move in the context of the N−1 moves that preceded it. These ideas have been studied before by other authors [4], [5]; our contribution is to investigate the mechanism by which the value estimates are used to influence the simulation policy. We show that treating the simulation policy as a multi-armed bandit problem, and using UCB1 [6] as a simulation policy, yields consistently strong results.…”
Section: Introduction (mentioning)
confidence: 99%
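Treating the simulation policy as a multi-armed bandit, as the statement above describes, can be sketched as follows: each (context, move) pair is a bandit arm scored by the standard UCB1 index. The exploration constant, the statistics layout, and the tie-breaking for untried arms are assumptions for illustration, not the paper's exact parameters.

```python
import math


def nast_ucb1_move(stats, context, legal, c=0.4):
    """UCB1 as a simulation policy, the core NAST idea (sketch).

    Each (context, move) pair -- the context being the preceding N-1
    moves -- is treated as a bandit arm. `stats` maps arms to
    (total_reward, visits). Untried arms are played first; otherwise
    the arm maximising the UCB1 index is selected.
    """
    untried = [m for m in legal if stats.get((context, m), (0.0, 0))[1] == 0]
    if untried:
        return untried[0]
    t = sum(stats[(context, m)][1] for m in legal)

    def ucb1(m):
        total, visits = stats[(context, m)]
        return total / visits + c * math.sqrt(math.log(t) / visits)

    return max(legal, key=ucb1)
```

Compared with ε-greedy, UCB1 replaces undirected random exploration with exploration focused on under-sampled replies.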
“…The idea is to look at sequences of N moves instead of one move only. This improvement can be costly for larger N, but it is already efficient with N = 2 (NAST2) for the game of Havannah [27].…”
Section: Playout Improvements (mentioning)
confidence: 99%
“…This algorithm is called LGRF-1. Other algorithms have been proposed using the same idea, but LGRF-1 is the most efficient one for connection games [27].…”
Section: Playout Improvements (mentioning)
confidence: 99%