UCT-ADP Progressive Bias Algorithm for Solving Gomoku

Cao, Xu; Lin, Yanghao

doi:10.1109/ssci44817.2019.9003020

Cited by 3 publications

(3 citation statements)

References 11 publications

(14 reference statements)


“…4. Select excellent individuals from selected population According to the ELO scoring standard commonly used in international competitions [24], the eight individuals of the 500th generation were finally selected, and the optimized chess shape parameters are obtained.…”

Section: Optimization Of Chess Shape Parameters (mentioning)

confidence: 99%

Research and Improvement of Alpha-Beta Search Algorithm in Gobang

Xie, Gao, Dai, et al. 2022

Advances in Transdisciplinary Engineering


Addressing the Alpha-Beta search algorithm, which is widely used in Gobang AI, this paper presents an improved method: a heuristic search using a static table, which reduces the search time. A differential evolution algorithm is then used to optimize the chess shape parameters and the static table parameters. According to the simulation data, both methods improve the efficiency of the Alpha-Beta search algorithm.



“…When Victoria first moved a stone, the algorithm yielded a winning result without fail. Cao et al presented a Gomoku AI model using an algorithm that combined the upper confidence bounds that were applied to the trees (UCT) [12] and adaptive dynamic programming (ADP) [13,14]. This algorithm could solve the search depth defect more accurately and efficiently than the case when only a single UCT was used, thereby improving the performance of Gomoku AI.…”

Section: Related Work (mentioning)

confidence: 99%

Enhanced Reinforcement Learning Method Combining One-Hot Encoding-Based Vectors for CNN-Based Alternative High-Level Decisions

Gu, Sung 2021

Applied Sciences


Gomoku is a two-player board game that originated in ancient China. There are various cases of developing Gomoku using artificial intelligence, such as a genetic algorithm and a tree search algorithm. Alpha-Gomoku, Gomoku AI built with Alpha-Go’s algorithm, defines all possible situations in the Gomoku board using Monte-Carlo tree search (MCTS), and minimizes the probability of learning other correct answers in the duplicated Gomoku board situation. However, in the tree search algorithm, the accuracy drops, because the classification criteria are manually set. In this paper, we propose an improved reinforcement learning-based high-level decision approach using convolutional neural networks (CNN). The proposed algorithm expresses each state as One-Hot Encoding based vectors and determines the state of the Gomoku board by combining the similar state of One-Hot Encoding based vectors. Thus, in a case where a stone that is determined by CNN has already been placed or cannot be placed, we suggest a method for selecting an alternative. We verify the proposed method of Gomoku AI in GuPyEngine, a Python-based 3D simulation platform.


“…If Victoria moves a stone first, it always leads to a win. Cao et al used an algorithm combining Upper Confidence Bounds applied to Trees (UCT) [18] and Adaptive Dynamic Programming (ADP) [19] and introduced a Gomoku AI model that could solve the problem of search depth defects more accurately than using a single UCT [20].…”

Section: Related Work (mentioning)

confidence: 99%

Enhanced DQN Framework for Selecting Actions and Updating Replay Memory Considering Massive Non-Executable Actions

Sung 2021

Applied Sciences


A Deep-Q-Network (DQN) controls a virtual agent as the level of a player using only screenshots as inputs. Replay memory selects a limited number of experience replays according to an arbitrary batch size and updates them using the associated Q-function. Hence, relatively fewer experience replays of different states are utilized when the number of states is fixed and the state of the randomly selected transitions becomes identical or similar. The DQN may not be applicable in some environments where it is necessary to perform the learning process using more experience replays than is required by the limited batch size. In addition, because it is unknown whether each action can be executed, a problem of an increasing amount of repetitive learning occurs as more non-executable actions are selected. In this study, an enhanced DQN framework is proposed to resolve the batch size problem and reduce the learning time of a DQN in an environment with numerous non-executable actions. In the proposed framework, non-executable actions are filtered to reduce the number of selectable actions to identify the optimal action for the current state. The proposed method was validated in Gomoku, a strategy board game, in which the application of a traditional DQN would be difficult.

