2015 IEEE Conference on Computational Intelligence and Games (CIG)
DOI: 10.1109/cig.2015.7317923

Regulation of exploration for simple regret minimization in Monte-Carlo tree search

Abstract: The application of multi-armed bandit (MAB) algorithms was a critical step in the development of Monte-Carlo tree search (MCTS). One example is the UCT algorithm, which applies the UCB bandit algorithm. A variety of research has been conducted on applying other bandit algorithms to MCTS. Simple regret bandit algorithms, which aim to identify the optimal arm after a number of trials, have attracted great interest in various fields in recent years. However, simple regret bandit algorithms have the tendency to s…
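The abstract's reference to UCT applying the UCB bandit rule can be illustrated with a minimal sketch of UCB1 child selection at an MCTS node. This is a generic textbook formulation, not the paper's own method; the function name and the `(total_reward, visit_count)` representation are assumptions for illustration.

```python
import math

def ucb1_select(children, c=math.sqrt(2)):
    """Return the index of the child maximizing the UCB1 score.

    children: list of (total_reward, visit_count) pairs; every child
    must have been visited at least once (UCT expands unvisited
    children before applying this rule).
    """
    total_visits = sum(n for _, n in children)

    def score(child):
        w, n = child
        # Exploitation term (mean reward) plus exploration bonus.
        return w / n + c * math.sqrt(math.log(total_visits) / n)

    return max(range(len(children)), key=lambda i: score(children[i]))
```

Under cumulative-regret rules like UCB1, a rarely visited child with a modest mean can outrank a heavily visited one, which is the exploration behavior that simple-regret (best-arm identification) variants rebalance.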

Cited by 7 publications (4 citation statements)
References 13 publications
“…[26], [27], [29]- [32]). Finally, we remark that simple-regret minimization has been successfully used in the context of Monte-Carlo Tree Search [33], [34] as well.…”
Section: B. Related Work
confidence: 98%
“…WU-UCT (Watch the Unobserved in UCT), proposed by Liu et al [31] in 2019, is a parallel technique applied to Monte Carlo Tree Search. Its idea is similar to tree parallelization [24].…”
Section: WU-UCT
confidence: 99%
“…Unlike the basic approach, in this formula the heuristic value depends on the number of losses. Other extensions to UCB can be found in Liu and Tsuruoka (2015), Mandai and Kaneko (2016), Tak et al (2014) and Yee et al (2016). Perick et al (2012) compare different UCB selection policies.…”
Section: Action Reduction
confidence: 99%