2012
DOI: 10.1007/978-3-642-31866-5_6

Monte-Carlo Tree Search Enhancements for Havannah

Abstract: This article shows how the performance of a Monte-Carlo Tree Search (MCTS) player for Havannah can be improved by guiding the search in the play-out and selection steps of MCTS. To improve the play-out step of the MCTS algorithm, we used two techniques to direct the simulations: Last-Good-Reply (LGR) and N-grams. Experiments reveal that LGR gives a significant improvement, although it depends on which LGR variant is used. Using N-grams to guide the play-outs also achieves a significant increase in the winning…

Cited by 9 publications (12 citation statements)
References 19 publications
“…In this study, we only use LGRF-1 in our experiments because it works better for our player. This result is similar to the results found in Stankiewicz, Winands, and Uiterwijk (2012). Figure 5: The LGRF-1 improvement (simulation step biasing).…”
Section: Last-Good-Reply (supporting)
confidence: 87%
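As a rough illustration of the LGRF-1 idea referenced in the citation statement above, here is a minimal sketch of the reply table: the winner's replies are stored after each playout, and a loser's stored reply is forgotten (the "F" in LGRF). The function names, the (player, move) encoding, and the table layout are assumptions for illustration, not the paper's implementation.

```python
def lgrf1_update(reply, history, winner):
    """LGRF-1 table update after one playout (illustrative sketch).

    `history` is the playout as a list of (player, move) pairs in play
    order; `winner` is the winning player. `reply` maps
    (player, opponent_move) -> stored reply. The winner's replies are
    (re)stored; a loser's reply is deleted if it matches the stored
    entry -- the forgetting that distinguishes LGRF from plain LGR.
    """
    for i in range(1, len(history)):
        player, move = history[i]
        _, prev_move = history[i - 1]
        key = (player, prev_move)
        if player == winner:
            reply[key] = move          # store the good reply
        elif reply.get(key) == move:
            del reply[key]             # forget the failed reply
    return reply


def lgrf1_pick(reply, player, prev_move, legal, fallback):
    """During a playout: play the stored reply if legal, else fall back."""
    move = reply.get((player, prev_move))
    return move if move in legal else fallback()
```

In a full playout policy, `fallback` would typically be a uniformly random choice among the legal moves.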
“…Stankiewicz et al [4] apply N-grams of length 2 and 3 with an ε-greedy simulation policy to the game of Havannah, achieving a significant increase in playing strength. Tak et al [5] suggest an enhancement similar to NAST which uses a combination of 1-, 2- and 3-grams, and demonstrate its effectiveness in the domain of General Game Playing.…”
Section: B. N-Gram-Average Sampling Technique (NAST) (mentioning)
confidence: 99%
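The ε-greedy N-gram simulation policy mentioned above can be sketched for the 2-gram case: with probability ε a random move is played, otherwise the move with the highest average reward following the previous move is chosen. The statistics layout and the neutral prior for unseen 2-grams are assumptions for illustration.

```python
import random


def ngram_move(stats, prev_move, legal, eps=0.1, rng=random.Random(0)):
    """ε-greedy move choice from 2-gram statistics (illustrative sketch).

    `stats` maps (prev_move, move) -> (total_reward, visits). With
    probability eps a uniformly random legal move is played; otherwise
    the move with the highest average reward in this context is chosen.
    Unseen 2-grams get a neutral prior of 0.5.
    """
    if prev_move is None or rng.random() < eps:
        return rng.choice(legal)

    def avg(move):
        total, visits = stats.get((prev_move, move), (0.0, 0))
        return total / visits if visits else 0.5

    return max(legal, key=avg)
```

Extending this to 3-grams amounts to keying the statistics on the two preceding moves instead of one.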
“…N-gram-Average Sampling Technique (NAST) generalises this to sequences of N moves, learning the value of the Nth move in the context of the N−1 moves that preceded it. These ideas have been studied before by other authors [4], [5]; our contribution is to investigate the mechanism by which the value estimates are used to influence the simulation policy. We show that treating the simulation policy as a multi-armed bandit problem, and using UCB1 [6] as a simulation policy, yields consistently strong results.…”
Section: Introduction (mentioning)
confidence: 99%
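Treating the simulation policy as a multi-armed bandit, as the statement above describes, can be sketched as follows: each (context, move) pair is a bandit arm scored by the standard UCB1 index. The exploration constant, the statistics layout, and the tie-breaking for untried arms are assumptions for illustration, not the paper's exact parameters.

```python
import math


def nast_ucb1_move(stats, context, legal, c=0.4):
    """UCB1 as a simulation policy, the core NAST idea (sketch).

    Each (context, move) pair -- the context being the preceding N-1
    moves -- is treated as a bandit arm. `stats` maps arms to
    (total_reward, visits). Untried arms are played first; otherwise
    the arm maximising the UCB1 index is selected.
    """
    untried = [m for m in legal if stats.get((context, m), (0.0, 0))[1] == 0]
    if untried:
        return untried[0]
    t = sum(stats[(context, m)][1] for m in legal)

    def ucb1(m):
        total, visits = stats[(context, m)]
        return total / visits + c * math.sqrt(math.log(t) / visits)

    return max(legal, key=ucb1)
```

Compared with ε-greedy, UCB1 replaces undirected random exploration with exploration focused on under-sampled replies.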
“…The idea is to look at sequences of N moves instead of one move only. This improvement can be costly for larger N, but it is already efficient with N = 2 (NAST2) for the game of Havannah [27].…”
Section: Playout Improvements (mentioning)
confidence: 99%
“…This algorithm is called LGRF-1. Other algorithms have been proposed using the same idea, but LGRF-1 is the most efficient one for connection games [27].…”
Section: Playout Improvements (mentioning)
confidence: 99%