2019
DOI: 10.48550/arxiv.1911.10635
Preprint

Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms

Abstract: Recent years have witnessed significant advances in reinforcement learning (RL), which has registered great success in solving various sequential decision-making problems in machine learning. Most of the successful RL applications, e.g., the games of Go and Poker, robotics, and autonomous driving, involve the participation of more than one single agent, which naturally fall into the realm of multi-agent RL (MARL), a domain with a relatively long history, and has recently re-emerged due to advances in single-agent…

Cited by 98 publications (160 citation statements)
References 187 publications

“…We consider a fully observable world where one agent can access the states of all cities and all agents. Although partial observation is more common in decentralized MARL [15], a global observation is necessary to make our model comparable to baseline algorithms, and partial observability will be considered in future work. The observation of each agent consists of three parts: the cities' state, the agents' state, and a global mask.…”
Section: B. Observation
confidence: 99%
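The excerpt above describes each agent's observation as the concatenation of the cities' state, the agents' state, and a global mask. Below is a minimal sketch of how such an observation might be assembled; the array shapes, feature choices, and the reading of the mask as "cities not yet visited" are assumptions for illustration, not details taken from the cited paper.

```python
import numpy as np

def build_observation(city_states, agent_states, visited):
    """Assemble one agent's observation from the three parts in the excerpt.

    city_states : (num_cities, city_features) array, e.g. coordinates and demand (assumed)
    agent_states: (num_agents, agent_features) array, e.g. position and capacity (assumed)
    visited     : (num_cities,) boolean array; the global mask marks cities still available
    """
    global_mask = (~visited).astype(np.float32)  # 1.0 = city still available
    # Flatten and concatenate; a real model might instead keep the parts
    # separate and pass them through different encoders.
    return np.concatenate([
        city_states.reshape(-1).astype(np.float32),
        agent_states.reshape(-1).astype(np.float32),
        global_mask,
    ])

# Hypothetical usage: 20 cities with 3 features, 4 agents with 2 features.
obs = build_observation(np.random.rand(20, 3), np.random.rand(4, 2),
                        np.zeros(20, dtype=bool))
```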
“…In MADDPG, the actor is used to select actions, while a central critic evaluates those actions by observing the joint state and actions of all agents. In this sense, MADDPG follows the centralised learning with decentralised execution paradigm [641,642,643], which assumes unrestricted communication bandwidth during training, as well as the central controller's ability to receive and process all agents' information. To relax these assumptions, Flexible Fully-decentralised Approximate Actor-critic (F2A2) algorithm [644] was proposed as a variant of multi-agent reinforcement learning based on decentralised training with decentralised execution.…”
Section: AI Agents For Promoting Cooperation
confidence: 99%

Social physics. Jusup, Holme, Kanazawa, et al. (2021). Preprint.
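The MADDPG excerpt above rests on the asymmetry between training and execution: each actor sees only its own observation, while the critic used during training conditions on the joint observations and actions of all agents. The sketch below illustrates that split under assumed MLP architectures and continuous actions; the module names and layer sizes are illustrative, not the architecture of the cited work.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Decentralised actor: maps one agent's local observation to its action."""
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                 nn.Linear(64, act_dim), nn.Tanh())

    def forward(self, obs):
        return self.net(obs)

class CentralCritic(nn.Module):
    """Centralised critic: scores the joint observations and actions of all agents."""
    def __init__(self, n_agents, obs_dim, act_dim):
        super().__init__()
        joint_dim = n_agents * (obs_dim + act_dim)
        self.net = nn.Sequential(nn.Linear(joint_dim, 128), nn.ReLU(),
                                 nn.Linear(128, 1))

    def forward(self, all_obs, all_acts):
        # all_obs: (batch, n_agents, obs_dim); all_acts: (batch, n_agents, act_dim)
        joint = torch.cat([all_obs.flatten(1), all_acts.flatten(1)], dim=-1)
        return self.net(joint)
```

Only the Actor modules are needed at execution time; the CentralCritic, and the communication it implies, is used during training only, which is precisely the assumption that decentralised-training approaches such as F2A2 aim to relax.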
“…To prevent high variance in Q-values and ensure expedited convergence of the policy network, we adopt the actor-critic algorithm [18], which introduces an advantage function in place of the Q-value for the policy-gradient calculation, i.e., $\delta = r + \gamma V^{\pi_\theta}(s'; \omega) - V^{\pi_\theta}(s; \omega)$, where $V^{\pi_\theta}(s; \omega)$ is the expected cumulative reward obtained by following the policy $\pi_\theta$ from state $s$, over all possible actions; we use a value network (the critic) with parameters $\omega$ to estimate $V^{\pi_\theta}(s; \omega)$. Specifically, the actor is a policy network with input $s$ and output $\pi_\theta(s)$; the critic also takes $s$ as input, but its output layer is a single linear neuron without any activation function.…”
Section: Sample
confidence: 99%
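The excerpt above uses the TD error $\delta = r + \gamma V^{\pi_\theta}(s'; \omega) - V^{\pi_\theta}(s; \omega)$ as the advantage in the policy-gradient update. Below is a minimal sketch of one such update, assuming a PyTorch critic whose output head is a single linear neuron and an optimiser covering both actor and critic parameters; the function and argument names are placeholders, not taken from the cited work.

```python
import torch
import torch.nn.functional as F

def actor_critic_update(critic, optimiser, log_prob, s, r, s_next, done, gamma=0.99):
    """One advantage actor-critic step: delta = r + gamma * V(s') - V(s).

    Assumes `log_prob` is log pi_theta(a|s) for the sampled action, still
    attached to the actor's computation graph, and `optimiser` holds both
    actor and critic parameters (hypothetical set-up).
    """
    with torch.no_grad():
        target = r + gamma * (1.0 - done) * critic(s_next).squeeze(-1)  # r + gamma * V(s')
    value = critic(s).squeeze(-1)                                       # V(s)
    delta = target - value                                              # advantage estimate
    critic_loss = F.mse_loss(value, target)            # move V(s) towards the TD target
    actor_loss = -(delta.detach() * log_prob).mean()   # policy gradient weighted by delta
    optimiser.zero_grad()
    (actor_loss + critic_loss).backward()
    optimiser.step()
    return delta.detach()
```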
“…Multi-agent Reinforcement Learning. Recently, MARL has achieved promising results in various application domains, e.g., traffic engineering and video games [18]. We treat the DL cluster schedulers as a multi-agent system.…”
Section: Related Work
confidence: 99%