2020
DOI: 10.3390/app10134529
A Survey of Planning and Learning in Games

Abstract: In general, games pose interesting and complex problems for the implementation of intelligent agents and are a popular domain in the study of artificial intelligence. In fact, games have been at the center of some of the most well-known achievements in artificial intelligence. From classical board games such as chess, checkers, backgammon and Go, to video games such as Dota 2 and StarCraft II, artificial intelligence research has devised computer programs that can play at the level of a human master an…

Cited by 19 publications (13 citation statements)
References 228 publications
“…The Markov property states that given the current state and action, the next state is independent of all previous states and actions. MDPs can be described formally with the following components: S denotes the state space of the process; A is the set of actions; P is the Markovian transition model, where P(S_{t+1} | S_t, A_t) is the probability of making a transition to state S_{t+1} when taking action A_t in state S_t; R represents the reward function or feedback, R_t, from the environment, by which the success or failure of an agent’s actions is measured (Duarte et al, 2020). Figure 3 depicts the interaction between the agent and the environment in an MDP.…”
Section: Methods
confidence: 99%
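The components listed in the excerpt above (states S, actions A, transition model P, reward R) can be sketched as a minimal, hypothetical two-state MDP — the particular states, probabilities, and rewards below are illustrative, not taken from the survey:

```python
import random

# Hypothetical 2-state MDP: states S = {0, 1}, actions A = {0, 1}.
# P[s][a] maps each (state, action) pair to a distribution over next states,
# i.e. P(S_{t+1} | S_t, A_t); R[s][a] is the reward for taking a in s.
P = {
    0: {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}},
    1: {0: {0: 0.5, 1: 0.5}, 1: {0: 0.0, 1: 1.0}},
}
R = {0: {0: 0.0, 1: 1.0}, 1: {0: 0.0, 1: 2.0}}

def step(state, action, rng=random.random):
    """Sample S_{t+1} from P(. | S_t, A_t) and return (next_state, reward)."""
    u, acc = rng(), 0.0
    for next_state, prob in P[state][action].items():
        acc += prob
        if u <= acc:
            return next_state, R[state][action]
    return next_state, R[state][action]  # guard against floating-point round-off
```

Because the Markov property holds, `step` needs only the current `(state, action)` pair — no history — which is exactly the agent–environment interaction loop the excerpt's Figure 3 depicts.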
“…It indicates the action A_t to be taken while in state S_t. In the simplest case, the objective of RL is to find a policy that maximizes the discounted return G_t for each state, which is the total discounted reward from time-step t [43]:

G_t = R_{t+1} + γ R_{t+2} + γ² R_{t+3} + … = Σ_{k=0}^{∞} γ^k R_{t+k+1},

where 0 ≤ γ < 1 is the discount rate that balances immediate and future rewards. Given that the discounted return is stochastic, the expected discounted return, starting from state S, taking action A, and following policy π, is given as [44]:

Q_π(S, A) = E_π[G_t | S_t = S, A_t = A],

where Q_π(S, A) is called the “action-value function” and E_π denotes the expectation operator.…”
Section: Methods
confidence: 99%
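As a concrete illustration of the discounted return defined in the excerpt above, a short sketch computes G_t over a finite episode (the reward sequence and γ below are arbitrary examples, not from the paper):

```python
def discounted_return(rewards, gamma=0.9):
    """G_t = R_{t+1} + gamma*R_{t+2} + ... for a finite episode,
    accumulated right-to-left so each step costs one multiply-add."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# With gamma = 0.5 and rewards [1, 1, 1]: 1 + 0.5 + 0.25 = 1.75
print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # 1.75
```

Setting γ closer to 0 makes the agent myopic (only R_{t+1} matters), while γ closer to 1 weights future rewards nearly as much as immediate ones — the balance the excerpt describes.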
“…This section presents the AlphaZero-based algorithm allowing the robot to play Russian checkers. This algorithm needs no input data for training except the game rules, which is a special feature of AlphaZero-based programs; no database of games, or of the tricks and tactics existing for the game, is required [25]. Therefore, we decided to base our algorithm on the ideas of AlphaZero.…”
Section: Algorithm For Playing Russian Checkers
confidence: 99%
“…However, in many computer games, from classical board games such as chess, checkers, backgammon and Go to video games such as Dota 2 and StarCraft II, machine learning has made great achievements [8]. Computer programs with machine learning technology can play at the level of a human master and even at a human world champion level.…”
Section: Introduction
confidence: 99%