2015 IEEE Conference on Computational Intelligence and Games (CIG)
DOI: 10.1109/cig.2015.7317923

Regulation of exploration for simple regret minimization in Monte-Carlo tree search

Abstract: The application of multi-armed bandit (MAB) algorithms was a critical step in the development of Monte-Carlo tree search (MCTS). One example is the UCT algorithm, which applies the UCB bandit algorithm. A variety of research has been conducted on applying other bandit algorithms to MCTS. Simple regret bandit algorithms, which aim to identify the optimal arm after a number of trials, have attracted great interest in various fields in recent years. However, simple regret bandit algorithms have the tendency to s…
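The abstract's reference to UCT applying the UCB bandit rule can be illustrated with a minimal sketch of UCB1 child selection at an MCTS node. This is a generic textbook formulation, not the paper's own method; the function name and the `(total_reward, visit_count)` representation are assumptions for illustration.

```python
import math

def ucb1_select(children, c=math.sqrt(2)):
    """Return the index of the child maximizing the UCB1 score.

    children: list of (total_reward, visit_count) pairs; every child
    must have been visited at least once (UCT expands unvisited
    children before applying this rule).
    """
    total_visits = sum(n for _, n in children)

    def score(child):
        w, n = child
        # Exploitation term (mean reward) plus exploration bonus.
        return w / n + c * math.sqrt(math.log(total_visits) / n)

    return max(range(len(children)), key=lambda i: score(children[i]))
```

Under cumulative-regret rules like UCB1, a rarely visited child with a modest mean can outrank a heavily visited one, which is the exploration behavior that simple-regret (best-arm identification) variants rebalance.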

Cited by 7 publications (4 citation statements)
References 13 publications
“…[26], [27], [29]- [32]). Finally, we remark that simple-regret minimization has been successfully used in the context of Monte-Carlo Tree Search [33], [34] as well.…”
Section: B. Related Work
confidence: 98%
“…WU-UCT (Watch the Unobserved in UCT), proposed by Liu et al [31] in 2019, is a parallel technique applied to Monte Carlo Tree Search. Its idea is similar to tree parallelization [24].…”
Section: WU-UCT
confidence: 99%
“…Unlike the basic approach, in this formula the heuristic value depends on the number of losses. Other extensions to UCB can be found in Liu and Tsuruoka (2015), Mandai and Kaneko (2016), Tak et al (2014) and Yee et al (2016). Perick et al (2012) compare different UCB selection policies.…”
Section: Action Reduction
confidence: 99%