A review of cooperative multi-agent deep reinforcement learning
2022 | DOI: 10.1007/s10489-022-04105-y

Cited by 102 publications (36 citation statements) | References 125 publications
“…In Zhang et al (2021), a selective overview of theories and algorithms on multi-agent reinforcement learning is presented. In Oroojlooy and Hajinezhad (2022), a review on cooperative multi-agent deep reinforcement learning is given.…”
Section: Reinforcement Learning in Multi-agent Search Tasks (mentioning, confidence: 99%)
“…This focus has clearly been missing in the Deep RL field. While multi-agent reinforcement learning (MARL) is a well-established branch of Deep RL, most learning algorithms and environments proposed have targeted a relatively small number of agents (Foerster et al 2016; OroojlooyJadid and Hajinezhad, 2019) and are thus not sufficient to study the emergent properties of large populations. In the most common MARL environments (Resnick et al 2018; Baker et al 2019; Jaderberg et al 2019; Terry et al 2020), “multi-agent” simply means two or four agents trained to perform a task by means of self-play (Bansal et al 2017; Liu et al 2019; Ha, 2020).…”
Section: Collective Intelligence for Deep Learning (mentioning, confidence: 99%)
“…The HetGAT Enc-Dec policy achieved the highest fleet rewards, followed by HetGAT and HetGCN. For a thorough discussion of CTDE MARL, we refer our readers to [37]. For the HetGAT policy, we only used a module resembling the encoder architecture, albeit with scalar outputs for the depot representations h_d, used as the action-value function outputs.…”
Section: B. One-shot Training (mentioning, confidence: 99%)