2018
DOI: 10.1007/978-3-319-75931-9_7

Memorizing the Playout Policy

Abstract: Monte Carlo Tree Search (MCTS) is the state-of-the-art algorithm for General Game Playing (GGP). Playout Policy Adaptation with move Features (PPAF) is a state-of-the-art MCTS algorithm that learns a playout policy online. We propose a simple modification to PPAF that consists of memorizing the learned policy from one move to the next. We test PPAF with memorization (PPAFM) against PPAF and UCT for Atarigo, Breakthrough, Misere Breakthrough, Domineering, Misere Domineering, Knightthrough, Misere Knightthrough and…
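The abstract's description of a playout policy learned online can be made concrete with a rough sketch. The class below follows the general scheme of published Playout Policy Adaptation work: moves played in winning playouts are reinforced and their alternatives penalized in proportion to their current probability. The weight table, the `adapt` signature, and the learning rate `alpha` are assumptions of this sketch, not code from the paper (PPAF adapts weights of move features; plain moves are used here for brevity).

```python
import math
import random

class PlayoutPolicy:
    """Illustrative PPA-style playout policy: one learned weight per move."""

    def __init__(self, alpha=1.0):       # alpha: assumed learning rate
        self.weights = {}                # move -> learned weight
        self.alpha = alpha

    def probabilities(self, moves):
        """Gibbs (softmax) distribution over the legal moves."""
        exps = [math.exp(self.weights.get(m, 0.0)) for m in moves]
        total = sum(exps)
        return [e / total for e in exps]

    def sample(self, moves):
        """Sample a playout move from the Gibbs distribution."""
        return random.choices(moves, weights=self.probabilities(moves))[0]

    def adapt(self, playout, winner):
        """After a playout, reinforce the moves the winner played and push
        down the alternatives in proportion to their current probability.
        `playout` is a list of (player_to_move, legal_moves, played_move)."""
        for to_move, legal_moves, played in playout:
            if to_move != winner:
                continue
            probs = self.probabilities(legal_moves)
            for m, p in zip(legal_moves, probs):
                target = 1.0 if m == played else 0.0
                self.weights[m] = self.weights.get(m, 0.0) + self.alpha * (target - p)
```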

Cited by 2 publications (5 citation statements). References 37 publications.
“…PPAF is harder to apply in GGP, because it would require an on-line analysis of the game rules to first extract relevant move features for a given game that might be previously unknown. Another example of PPA variant that has been investigated is the PPAF strategy with memorization (PPAFM) [5]. With respect to PPAF, the PPAFM strategy does not reset the learned policy between game steps, but keeps it memorized and reuses it.…”
Section: Related Work
confidence: 99%
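The difference the citing authors point to lies only in the lifetime of the learned policy. A minimal sketch follows, reusing the hypothetical `PlayoutPolicy` above and assuming a game object with `copy`, `legal_moves`, `to_move`, `play`, `over` and `winner` methods; `mcts_search` is a simplified stand-in for the real tree search, not the paper's algorithm.

```python
def mcts_search(game, policy, budget):
    """Stand-in for the real search: run `budget` policy-guided playouts,
    adapting the policy after each, then pick the most-reinforced root move."""
    for _ in range(budget):
        sim, trace = game.copy(), []
        while not sim.over():
            moves = sim.legal_moves()
            move = policy.sample(moves)
            trace.append((sim.to_move(), moves, move))
            sim.play(move)
        policy.adapt(trace, sim.winner())
    return max(game.legal_moves(), key=lambda m: policy.weights.get(m, 0.0))

def play_game_ppaf(game, budget):
    """PPAF as described above: the learned policy is reset at every game step."""
    while not game.over():
        policy = PlayoutPolicy()          # fresh weights for each move search
        game.play(mcts_search(game, policy, budget))

def play_game_ppafm(game, budget):
    """PPAFM: the same policy object is kept memorized and reused, so each
    search starts from what the previous searches learned."""
    policy = PlayoutPolicy()              # created once for the whole game
    while not game.over():
        game.play(mcts_search(game, policy, budget))
```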
“…PPA and NPPA use the Gibbs measure to select moves with k = 1, keep statistics memorized between moves and do not decay them. These settings are taken from previous publications on PPA [3][4][5]. The use of an ε-greedy strategy and of statistics decay after moves has been investigated, but preliminary results on PPA did not show any improvement in performance.…”
Section: Setup
confidence: 99%
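The Gibbs measure with k = 1 mentioned in this statement is a softmax with temperature k over the learned weights. A minimal sketch, in which the weight dictionary stands in for PPA's per-move statistics:

```python
import math
import random

def gibbs_select(moves, weights, k=1.0):
    """Pick a move with probability proportional to exp(weight / k);
    k = 1 is the constant reported for PPA and NPPA above."""
    exps = [math.exp(weights.get(m, 0.0) / k) for m in moves]
    return random.choices(moves, weights=exps)[0]

# Example: a move with weight 1.0 is e ≈ 2.72 times more likely
# to be chosen than a move with weight 0.0 when k = 1.
move = gibbs_select(["a2-a3", "b2-b3"], {"a2-a3": 1.0}, k=1.0)
```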