2022
DOI: 10.1007/s10462-022-10299-x
Deep multiagent reinforcement learning: challenges and directions

Abstract: This paper surveys the field of deep multiagent reinforcement learning (RL). The combination of deep neural networks with RL has gained increased traction in recent years and is slowly shifting the focus from single-agent to multiagent environments. Dealing with multiple agents is inherently more complex as (a) the future rewards depend on multiple players’ joint actions and (b) the computational complexity increases. We present the most common multiagent problem representations and their main challenges, and …

Cited by 44 publications (22 citation statements)
References 128 publications
“…Markov games maintain the assumption that state transitions adhere to the Markov property; nonetheless, the probabilities of transitioning between states and the expected rewards are influenced by the joint actions of all participating agents. Generally, four inherent challenges exist in MARL (Wong et al., 2022): computational complexity, non-stationarity, partial observability, and credit assignment. Agents involved in MARL can learn policies or value functions in three different architectures: decentralised, centralised, or mixed.…”
Section: TD Target (mentioning)
Confidence: 99%
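The Markov-game formalism this excerpt refers to can be stated compactly. The display below is the standard textbook definition, not notation taken from the cited survey:

\[
\mathcal{G} = \bigl(\mathcal{N},\, \mathcal{S},\, \{\mathcal{A}^i\}_{i \in \mathcal{N}},\, P,\, \{r^i\}_{i \in \mathcal{N}},\, \gamma\bigr),
\qquad
P : \mathcal{S} \times \mathcal{A}^1 \times \cdots \times \mathcal{A}^N \to \Delta(\mathcal{S}),
\qquad
r^i : \mathcal{S} \times \mathcal{A}^1 \times \cdots \times \mathcal{A}^N \to \mathbb{R}.
\]

Both the transition kernel \(P\) and each agent's reward \(r^i\) depend on the joint action \((a^1, \dots, a^N)\); this joint-action dependence is what makes the environment appear non-stationary to any single agent whose opponents or teammates keep changing their policies during learning.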
“…Despite the breakthrough, researchers questioned the merit of asynchrony and tested a synchronous version of A3C known as advantage actor–critic (A2C), achieving even faster and better performance (Wu et al., 2017b). Soon after, Wu et al. (2017a) developed the Actor Critic using Kronecker-Factored Trust Region (ACKTR) algorithm and claimed it to be more sample-efficient and less computationally expensive than A2C. In another algorithm named soft actor–critic, an entropy term is added to the reward function to encourage the policy to explore more (Haarnoja et al., 2018).…”
Section: Reinforcement Learning (RL) (mentioning)
Confidence: 99%
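The entropy term mentioned for soft actor–critic corresponds to the maximum-entropy RL objective. The form below follows Haarnoja et al. (2018) rather than the survey's own notation:

\[
J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}\Bigl[\, r(s_t, a_t) + \alpha\, \mathcal{H}\bigl(\pi(\cdot \mid s_t)\bigr) \Bigr],
\qquad
\mathcal{H}\bigl(\pi(\cdot \mid s)\bigr) = -\,\mathbb{E}_{a \sim \pi(\cdot \mid s)}\bigl[\log \pi(a \mid s)\bigr],
\]

where the temperature \(\alpha\) trades off reward maximisation against the exploration induced by keeping the policy's entropy high.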
“…When the implementation of a central controller is not feasible, agents seek to improve their inference capabilities through weaker forms of information exchange. CLDE variants constitute a popular and relevant principle in this respect [42], [43], [44], [45].…”
Section: B. Multiagent Constrained Reinforcement Learning for Active H... (mentioning)
Confidence: 99%
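As a rough illustration of the centralised-learning, decentralised-execution layout the excerpt alludes to, the sketch below pairs per-agent actors (each conditioned only on its local observation) with a single critic that sees joint information during training. It is a minimal PyTorch sketch under assumed dimensions and class names, not the architecture of any of the works cited above:

import torch
import torch.nn as nn

class Actor(nn.Module):
    """Decentralised policy: each agent conditions only on its own observation."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs):
        # Return a categorical action distribution over the agent's own actions.
        return torch.softmax(self.net(obs), dim=-1)

class CentralCritic(nn.Module):
    """Centralised value function: sees joint observations and joint actions during training."""
    def __init__(self, joint_obs_dim, joint_act_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(joint_obs_dim + joint_act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, joint_obs, joint_act):
        return self.net(torch.cat([joint_obs, joint_act], dim=-1))

# Illustrative dimensions (assumptions, not values from any cited paper).
n_agents, obs_dim, act_dim = 3, 8, 4
actors = [Actor(obs_dim, act_dim) for _ in range(n_agents)]
critic = CentralCritic(n_agents * obs_dim, n_agents * act_dim)

obs = torch.randn(n_agents, obs_dim)                 # one local observation per agent
probs = [actor(o) for actor, o in zip(actors, obs)]  # decentralised action selection
joint_act = torch.cat(probs).unsqueeze(0)            # critic consumes the joint action (here: distributions)
value = critic(obs.reshape(1, -1), joint_act)        # centralised value estimate used only for learning

At execution time only the actors are deployed, so no central controller or global state is needed; the centralised critic exists purely to stabilise learning under the non-stationarity discussed earlier.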
“…[30][31][32] Higher demands on the modeling accuracy, more complex high-speed movement interactions, and further studies to develop control schemes that continuously predict and adapt to the moves of additional characters are required when maneuvering with extra character types. [33][34][35][36] As traditional autonomous competitions limit the challenges to performing fixed poses and simple object identification, [37,38] maneuvers powered by fully unmanned characters could be several years away. To avoid this modeling complexity, researchers have explored various routes to utilize machine learning, such as harnessing evolutionary strategies, imitation learning, and reinforcement learning to learn movement policies, as well as utilizing supervised learning to model character maneuvers. [39][40][41] The action at a high level is not well understood, although traditional studies have demonstrated excellent performance in solo maneuvers or progressed to simple navigation scenarios.…”
Citation type: mentioning
Confidence: 99%