2017
DOI: 10.48550/arxiv.1712.07305
Preprint

Revisiting the Master-Slave Architecture in Multi-Agent Deep Reinforcement Learning

Abstract: Many tasks in artificial intelligence require the collaboration of multiple agents. We examine deep reinforcement learning for multi-agent domains. Recent research efforts often take the form of two seemingly conflicting perspectives: the decentralized perspective, where each agent is supposed to have its own controller, and the centralized perspective, where one assumes there is a larger model controlling all agents. In this regard, we revisit the idea of the master-slave architecture by incorporating both persp…
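
The abstract describes a controller that combines a centralized master with decentralized per-agent slaves. As a rough illustration of that idea only (not the paper's actual model; the module names, layer sizes, and the mean-pooling choice below are all assumptions), a PyTorch-style sketch:

# Hypothetical sketch of a master-slave controller for N agents.
# All names and design choices are illustrative, not from the paper.
import torch
import torch.nn as nn

class MasterSlaveController(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=64, n_agents=3):
        super().__init__()
        self.n_agents = n_agents
        # Decentralized part: each slave encodes its own observation.
        self.slave_enc = nn.Linear(obs_dim, hidden)
        # Centralized part: the master pools all slave messages.
        self.master = nn.GRUCell(hidden, hidden)
        # Each slave combines its encoding with the master's instruction.
        self.policy_head = nn.Linear(2 * hidden, act_dim)

    def forward(self, obs, master_state):
        # obs: (n_agents, obs_dim); master_state: (1, hidden)
        msgs = torch.tanh(self.slave_enc(obs))          # per-agent messages
        pooled = msgs.mean(dim=0, keepdim=True)         # aggregate for the master
        master_state = self.master(pooled, master_state)
        instr = master_state.expand(self.n_agents, -1)  # broadcast instruction
        logits = self.policy_head(torch.cat([msgs, instr], dim=-1))
        return logits, master_state

The point of the sketch is the information flow: per-agent (decentralized) encodings feed a shared recurrent master, whose output is broadcast back so each agent's action depends on both views.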

Cited by 18 publications (20 citation statements)
References 18 publications
“…The data are used to recover the policy by imitation learning (2), and the learned policy is further used to calculate Q_D (9). As in most MARL works (Usunier et al. 2016; Peng et al. 2017; Kong et al. 2017), the opponent's policy can be simulated as part of the environment during learning. This is reasonable because our goal is to learn multi-agent cooperation rather than competition.…”
Section: Settings
confidence: 99%
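
The passage above folds a simulated opponent into the environment so the learner sees a standard single-agent interface. A minimal sketch of that wrapping, assuming a generic two-player environment API (every name here is hypothetical):

# Hypothetical sketch: treating a fixed opponent policy as part of the
# environment, so the learning agent interacts with a standard interface.
class OpponentAsEnv:
    def __init__(self, two_player_env, opponent_policy):
        self.env = two_player_env          # assumed to take actions for both players
        self.opponent = opponent_policy    # fixed/simulated opponent policy

    def reset(self):
        self.obs, self.opp_obs = self.env.reset()
        return self.obs

    def step(self, action):
        # The opponent acts internally; the learner never sees it directly.
        opp_action = self.opponent(self.opp_obs)
        (self.obs, self.opp_obs), reward, done = self.env.step(action, opp_action)
        return self.obs, reward, done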
“…The Nash equilibrium reduces the exploration state-action space and leads to more effective learning of cooperation. A similar method is proposed in (Lanctot et al. 2017), where one agent updates its policy by sampling other agents' policies from their individual meta-strategies. Compared with (Lanctot et al. 2017), the proposed method assumes all agents jointly play the Nash equilibrium, which is more suitable for multi-agent cooperation.…”
Section: Related Work
confidence: 99%
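
The meta-strategy sampling attributed to Lanctot et al. 2017 can be illustrated with a short sketch; the policy-pool and meta-strategy data structures below are assumptions for illustration, not the cited paper's implementation:

# Hypothetical sketch: each training episode, every other agent's policy
# is drawn from that agent's meta-strategy (a distribution over its past
# policies), and the learner trains against the sampled opponents.
import random

def sample_opponents(policy_pools, meta_strategies, learner_id):
    # policy_pools: {agent_id: [policy, ...]}
    # meta_strategies: {agent_id: [prob per policy in its pool]}
    opponents = {}
    for agent_id, pool in policy_pools.items():
        if agent_id == learner_id:
            continue
        weights = meta_strategies[agent_id]
        opponents[agent_id] = random.choices(pool, weights=weights, k=1)[0]
    return opponents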
“…Deterministic Policy Gradient (DPG) (Silver et al. 2014) is a special actor-critic algorithm in which the actor adopts a deterministic policy μ_θ : S → A and the action space A is continuous. Deep DPG (DDPG) (Lillicrap et al. 2015) uses deep neural networks μ_θ(s) and Q(s, a; w) to represent the actor and the critic, respectively. DDPG is an off-policy method.…”
Section: Introduction
confidence: 99%
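
A minimal sketch of the DDPG actor and critic described above, assuming PyTorch; the layer sizes and activations are illustrative choices, not prescribed by the cited papers:

# Sketch of the DDPG networks: a deterministic actor mu_theta(s) and a
# critic Q(s, a; w). Hidden sizes are illustrative assumptions.
import torch
import torch.nn as nn

class Actor(nn.Module):                      # deterministic policy mu_theta(s)
    def __init__(self, s_dim, a_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(s_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, a_dim), nn.Tanh())  # continuous, bounded action

    def forward(self, s):
        return self.net(s)

class Critic(nn.Module):                     # action-value Q(s, a; w)
    def __init__(self, s_dim, a_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(s_dim + a_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

# Off-policy, per the passage: transitions come from a replay buffer, the
# critic is trained on TD targets from slowly updated target networks, and
# the actor follows the deterministic policy gradient through dQ/da.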