2012
DOI: 10.1007/s10994-012-5280-0

Temporal-difference search in computer Go

Abstract: Temporal-difference learning is one of the most successful and broadly applied solutions to the reinforcement learning problem; it has been used to achieve master-level play in chess, checkers and backgammon. The key idea is to update a value function from episodes of real experience, by bootstrapping from future value estimates, and using value function approximation to generalise between related states. Monte-Carlo tree search is a recent algorithm for high-performance search, which has been used to achieve …
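The abstract's key idea, bootstrapped value updates combined with function approximation, can be made concrete with a short sketch. The following is a minimal illustration assuming a linear value function over state features, in the spirit of the paper's Go evaluation; the names, feature vector `phi_s`, and step-size defaults are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

# Minimal sketch of the TD(0) idea described in the abstract: adjust a
# linear value function V(s) = theta . phi(s) toward a bootstrapped target
# built from the NEXT state's own estimate, rather than waiting for the
# final outcome of the episode. All names here are illustrative assumptions.

def td_update(theta, phi_s, reward, phi_s_next, terminal,
              alpha=0.1, gamma=1.0):
    v_s = theta @ phi_s
    v_next = 0.0 if terminal else theta @ phi_s_next
    td_error = reward + gamma * v_next - v_s   # bootstrap from future estimate
    theta += alpha * td_error * phi_s          # shared features generalise
    return theta
```

Because states that share features share weight components, a single update also shifts the estimates of related states, which is the generalisation the abstract contrasts with plain table-lookup search.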

Cited by 75 publications (50 citation statements)
References 29 publications
“…Thus improving the simulation time and computational load. To this end, we compare our solution against TD-search [2] and both vanilla-UCT and random-UCT implementations. We refer to vanilla-UCT as the standard UCT algorithm that, at each iteration, expands every possible action in A_j, for every agent j.…”
Section: Experimental Evaluation (mentioning)
confidence: 99%
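For context on the "vanilla-UCT" baseline this statement compares against: standard UCT selects among a node's fully expanded actions with the UCB1 rule. Below is a minimal sketch assuming a hypothetical `Node` with `children`, `visits`, and `total_value` fields; this is not the cited paper's code.

```python
import math

# Sketch of UCB1 selection in standard ("vanilla") UCT. Because vanilla-UCT
# expands every action of a node, each child can be assumed to have at
# least one visit before this rule is applied. Node fields are assumptions.

def ucb1_select(node, c=math.sqrt(2)):
    return max(
        node.children,
        key=lambda ch: (ch.total_value / ch.visits
                        + c * math.sqrt(math.log(node.visits) / ch.visits)),
    )
```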
“…Nevertheless, they show difficulties in relating similar states (i.e. nodes of the search tree) [2]. Here, we focus on the problem of cooperative general-sum stochastic games [3], where each agent runs its own learning process.…”
Section: Introduction (mentioning)
confidence: 99%
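The difficulty noted here, relating similar states, arises because MCTS keeps one independent estimate per tree node; TD search's remedy, as the abstract describes, is to score states through shared features so one update generalises across them. A toy contrast, with hypothetical names:

```python
import numpy as np

# Table-lookup MCTS: each distinct state gets its own independent
# statistics, so experience in one state never transfers to a similar one.
node_stats = {}   # state -> [total_value, visit_count]

# Feature-based TD search: similar states overlap in phi(s), so a single
# weight update moves the estimates of all of them at once.
def value(theta, phi_s):
    return float(theta @ phi_s)
```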
“…Some examples include: One of the original proposals for Monte Carlo search was to improve the strength of TD‐Gammon, using its learned neural network evaluation function to bias move choice during playouts. Furthermore, some of the research on Monte Carlo tree search for Go originates in the reinforcement learning community that developed TD learning; it is possible to view Monte Carlo tree search as a kind of reinforcement learning that takes place in real time while analyzing a position to choose a move. Monte Carlo tree search has been established as the leading method in other games including Hex and Lines of Action, and has been used to close in on human expert levels of play in Havannah.…”
Section: Monte Carlo Tree Search in Game of Go (mentioning)
confidence: 99%
“…Balakrishna et al (2010) used value function learning to accurately predict taxi-out times at airports. Similarly, Silver et al (2012) introduced a new approach to the game of Go by implementing both value function approximation and bootstrapping to obtain high-performance policy search. They showed that this approach outperforms Monte-Carlo tree search.…”
Section: Introduction (mentioning)
confidence: 99%