2021
DOI: 10.4218/etrij.2021-0010

Avoiding collaborative paradox in multi‐agent reinforcement learning

Citation types: 0 supporting, 1 mentioning, 0 contrasting
Citing publications published in: 2022, 2023
Cited by 4 publications (1 citation statement)
References 17 publications
“…Liu et al proposed feudal latent space exploration for multi-agent reinforcement learning and guided a coordinated exploration using multiple agents by learning the latent structure [34]. Kim et al analyzed the problem of the collaboration paradox caused by "lazy" agents [35]. Kuba et al performed a rigorous mathematical analysis of the high variance in the estimations of the policy gradient method and derived the optimal baseline to achieve the minimum variance [36].…”
Section: A. Related Work (citation type: mentioning)
Confidence: 99%
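For context on the last sentence of that citation statement: in the classical single-agent setting, the variance-minimizing constant baseline for a policy-gradient estimator has a compact closed form. The LaTeX sketch below states that standard result only as background; it is not a reproduction of the multi-agent derivation of Kuba et al. [36], which differs in detail.

% A minimal sketch of the standard single-agent result, stated for
% context; not the multi-agent optimal baseline of Kuba et al. [36].
% REINFORCE estimator with a constant baseline b:
%   \hat{g} = \nabla_\theta \log \pi_\theta(a \mid s)\,(R - b)
% Subtracting b leaves \hat{g} unbiased because
% \mathbb{E}[\nabla_\theta \log \pi_\theta(a \mid s)] = 0.
% The constant b minimizing \operatorname{Var}[\hat{g}] is:
\begin{equation}
  b^{*}
  = \arg\min_{b} \operatorname{Var}\!\left[\nabla_\theta \log \pi_\theta(a \mid s)\,(R - b)\right]
  = \frac{\mathbb{E}\!\left[\lVert \nabla_\theta \log \pi_\theta(a \mid s) \rVert^{2}\, R\right]}
         {\mathbb{E}\!\left[\lVert \nabla_\theta \log \pi_\theta(a \mid s) \rVert^{2}\right]}.
\end{equation}

In practice, the state-value function $V(s)$ is commonly used as a cheap, near-optimal substitute for $b^{*}$; the contribution summarized above concerns deriving the truly variance-minimizing baseline in the multi-agent case.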