2018
DOI: 10.1007/978-3-319-75931-9_7

Memorizing the Playout Policy

Abstract: Monte Carlo Tree Search (MCTS) is the state-of-the-art algorithm for General Game Playing (GGP). Playout Policy Adaptation with move Features (PPAF) is a state-of-the-art MCTS algorithm that learns a playout policy online. We propose a simple modification to PPAF that consists of memorizing the learned policy from one move to the next. We test PPAF with memorization (PPAFM) against PPAF and UCT for Atarigo, Breakthrough, Misere Breakthrough, Domineering, Misere Domineering, Knightthrough, Misere Knightthrough and…
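The abstract's description of a playout policy learned online can be made concrete with a rough sketch. The class below follows the general scheme of published Playout Policy Adaptation work: moves played in winning playouts are reinforced and their alternatives penalized in proportion to their current probability. The weight table, the `adapt` signature, and the learning rate `alpha` are assumptions of this sketch, not code from the paper (PPAF adapts weights of move features; plain moves are used here for brevity).

```python
import math
import random

class PlayoutPolicy:
    """Illustrative PPA-style playout policy: one learned weight per move."""

    def __init__(self, alpha=1.0):       # alpha: assumed learning rate
        self.weights = {}                # move -> learned weight
        self.alpha = alpha

    def probabilities(self, moves):
        """Gibbs (softmax) distribution over the legal moves."""
        exps = [math.exp(self.weights.get(m, 0.0)) for m in moves]
        total = sum(exps)
        return [e / total for e in exps]

    def sample(self, moves):
        """Sample a playout move from the Gibbs distribution."""
        return random.choices(moves, weights=self.probabilities(moves))[0]

    def adapt(self, playout, winner):
        """After a playout, reinforce the moves the winner played and push
        down the alternatives in proportion to their current probability.
        `playout` is a list of (player_to_move, legal_moves, played_move)."""
        for to_move, legal_moves, played in playout:
            if to_move != winner:
                continue
            probs = self.probabilities(legal_moves)
            for m, p in zip(legal_moves, probs):
                target = 1.0 if m == played else 0.0
                self.weights[m] = self.weights.get(m, 0.0) + self.alpha * (target - p)
```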

Cited by 2 publications (5 citation statements). References 37 publications.
“…PPAF is harder to apply in GGP, because it would require an on-line analysis of the game rules to first extract relevant move features for a given game that might be previously unknown. Another example of PPA variant that has been investigated is the PPAF strategy with memorization (PPAFM) [5]. With respect to PPAF, the PPAFM strategy does not reset the learned policy between game steps, but keeps it memorized and reuses it.…”
Section: Related Work
confidence: 99%
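The difference the citing authors point to lies only in the lifetime of the learned policy. A minimal sketch follows, reusing the hypothetical `PlayoutPolicy` above and assuming a game object with `copy`, `legal_moves`, `to_move`, `play`, `over` and `winner` methods; `mcts_search` is a simplified stand-in for the real tree search, not the paper's algorithm.

```python
def mcts_search(game, policy, budget):
    """Stand-in for the real search: run `budget` policy-guided playouts,
    adapting the policy after each, then pick the most-reinforced root move."""
    for _ in range(budget):
        sim, trace = game.copy(), []
        while not sim.over():
            moves = sim.legal_moves()
            move = policy.sample(moves)
            trace.append((sim.to_move(), moves, move))
            sim.play(move)
        policy.adapt(trace, sim.winner())
    return max(game.legal_moves(), key=lambda m: policy.weights.get(m, 0.0))

def play_game_ppaf(game, budget):
    """PPAF as described above: the learned policy is reset at every game step."""
    while not game.over():
        policy = PlayoutPolicy()          # fresh weights for each move search
        game.play(mcts_search(game, policy, budget))

def play_game_ppafm(game, budget):
    """PPAFM: the same policy object is kept memorized and reused, so each
    search starts from what the previous searches learned."""
    policy = PlayoutPolicy()              # created once for the whole game
    while not game.over():
        game.play(mcts_search(game, policy, budget))
```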
“…PPA and NPPA use the Gibbs measure to select moves with k = 1, keep statistics memorized between moves and do not decay them. These settings are taken from previous publications on PPA [3][4][5]. The use of an ε-greedy strategy and of statistics decay after moves has been investigated, but preliminary results on PPA did not show any improvement in performance.…”
Section: Setup
confidence: 99%
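The Gibbs measure with k = 1 mentioned in this statement is a softmax with temperature k over the learned weights. A minimal sketch, in which the weight dictionary stands in for PPA's per-move statistics:

```python
import math
import random

def gibbs_select(moves, weights, k=1.0):
    """Pick a move with probability proportional to exp(weight / k);
    k = 1 is the constant reported for PPA and NPPA above."""
    exps = [math.exp(weights.get(m, 0.0) / k) for m in moves]
    return random.choices(moves, weights=exps)[0]

# Example: a move with weight 1.0 is e ≈ 2.72 times more likely
# to be chosen than a move with weight 0.0 when k = 1.
move = gibbs_select(["a2-a3", "b2-b3"], {"a2-a3": 1.0}, k=1.0)
```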