2015
DOI: 10.1007/s40595-015-0045-x
A multi-agent cooperative reinforcement learning model using a hierarchy of consultants, tutors and workers

Abstract: The hierarchical organisation of distributed systems can provide an efficient decomposition for machine learning. This paper proposes an algorithm for cooperative policy construction for independent learners, named Q-learning with aggregation (QA-learning). The algorithm is based on a distributed hierarchical learning model and utilises three specialisations of agents: workers, tutors and consultants. The consultant agent incorporates the entire system in its problem space, which it decomposes into sub-problem…
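The abstract describes a three-level hierarchy in which worker agents learn sub-problems and tutors and consultants combine their results. Because the abstract is truncated above, the exact aggregation rule is not reproduced here; the following is only a minimal sketch of the general idea, where the function names and the max-based merge are my own assumptions rather than the paper's method.

```python
import numpy as np

# Illustrative sketch only: how a tutor/consultant might merge the Q-tables
# of the agents below it. The max-based merge and all names are assumptions,
# not the QA-learning rule from the paper.
def aggregate_q_tables(q_tables):
    """Merge several agents' Q-tables for the same sub-problem,
    keeping the element-wise maximum estimate."""
    return np.maximum.reduce(q_tables)

n_states, n_actions = 6, 4
tutor_a = aggregate_q_tables([np.random.rand(n_states, n_actions) for _ in range(3)])
tutor_b = aggregate_q_tables([np.random.rand(n_states, n_actions) for _ in range(3)])
consultant = aggregate_q_tables([tutor_a, tutor_b])  # consultant merges its tutors
print(consultant.shape)  # (6, 4)
```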

Cited by 43 publications (16 citation statements)
References 25 publications
“…To the best of our knowledge, there is no RL-based method that attempts to infer the optimal behavior of UVs. Therefore, we built R using the Q-learning algorithm, estimating the cumulative reward of each situation-action pair in a Q-table, which is a basic method of RL [30]. Its performance largely depends on the environment design, and we utilized the same settings as those of AM and MF.…”
Section: B. Experiments Setting (mentioning)
confidence: 99%
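The method referred to in this excerpt is standard tabular Q-learning. A minimal, self-contained sketch follows; the toy environment and hyperparameters are illustrative assumptions, not the citing paper's settings.

```python
import random

# Tabular Q-learning on a toy chain environment (all settings illustrative).
n_states, n_actions = 5, 2          # actions: 0 = left, 1 = right
alpha, gamma, epsilon = 0.1, 0.9, 0.1
Q = [[0.0] * n_actions for _ in range(n_states)]

def step(state, action):
    """Deterministic toy dynamics: reward 1 for reaching the last state."""
    nxt = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    return nxt, (1.0 if nxt == n_states - 1 else 0.0)

for _ in range(1000):                # episodes
    s = 0
    for _ in range(20):              # steps per episode
        if random.random() < epsilon:
            a = random.randrange(n_actions)
        else:
            a = max(range(n_actions), key=lambda x: Q[s][x])
        s_next, r = step(s, a)
        # Core update: move Q(s, a) toward r + gamma * max_a' Q(s', a')
        Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
        s = s_next
```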
“…An MDP comprises a set of states S = {s_1, ..., s_n} … indicates that the transition is invalid. The immediate expected reward for executing this transition is the deterministic reward R(s_x, a_z) [3]. It is important to note that the application of Q-learning to stochastic MDPs is beyond the scope of this paper.…”
Section: Q-learning (mentioning)
confidence: 99%
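For the deterministic reward R(s_x, a_z) described in this excerpt, the corresponding Q-learning update can be written in its standard deterministic form. This is a textbook formulation rather than a formula quoted from the citing paper, and the treatment of invalid transitions (excluding them from the maximisation) is my reading of the excerpt.

```latex
% Deterministic Q-learning update (textbook form, not quoted from [3]).
% \delta(s_x, a_z) denotes the deterministic successor state; invalid
% transitions are assumed to be excluded from the maximisation.
Q(s_x, a_z) \leftarrow R(s_x, a_z) + \gamma \max_{a'} Q\bigl(\delta(s_x, a_z), a'\bigr)
```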
“…As a consequence, AVE-Q may produce an incorrect policy, because it does not remove the bad Q-values at the interaction stage [3].…”
Section: Related Work (mentioning)
confidence: 99%
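To see why averaging without filtering can matter, consider a toy two-agent example; the numbers are made up for illustration and are not taken from either paper. A single agent's overestimated Q-value survives the average and ends up dominating the merged policy.

```python
# Toy illustration of the critique: averaging Q-values keeps a single
# agent's bad estimate in the merged table. All numbers are made up.
q_agent_1 = {"a": 1.0, "b": 0.5}    # well-explored estimates
q_agent_2 = {"a": 1.0, "b": 9.0}    # "b" is badly overestimated

averaged = {k: (q_agent_1[k] + q_agent_2[k]) / 2 for k in q_agent_1}
best_action = max(averaged, key=averaged.get)
print(best_action)  # 'b': the overestimate wins under plain averaging
```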