2019
DOI: 10.48550/arxiv.1911.10635
Preprint

Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms

Abstract: Recent years have witnessed significant advances in reinforcement learning (RL), which has registered great success in solving various sequential decision-making problems in machine learning. Most of the successful RL applications, e.g., the games of Go and Poker, robotics, and autonomous driving, involve the participation of more than one single agent, which naturally fall into the realm of multi-agent RL (MARL), a domain with a relatively long history, and has recently re-emerged due to advances in single-agent…

Cited by 98 publications (160 citation statements)
References 187 publications

“…We consider a fully observable world where one agent can access the states of all cities and all agents. Although partial observation is more common in decentralized MARL [15], a global observation is necessary to make our model comparable to baseline algorithms, and partial observability will be considered in future work. The observation of each agent consists of three parts: the cities' state, the agents' state, and a global mask.…”
Section: B. Observation
confidence: 99%
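The excerpt above describes each agent's observation as the concatenation of the cities' state, the agents' state, and a global mask. Below is a minimal sketch of how such an observation might be assembled; the array shapes, feature choices, and the reading of the mask as "cities not yet visited" are assumptions for illustration, not details taken from the cited paper.

```python
import numpy as np

def build_observation(city_states, agent_states, visited):
    """Assemble one agent's observation from the three parts in the excerpt.

    city_states : (num_cities, city_features) array, e.g. coordinates and demand (assumed)
    agent_states: (num_agents, agent_features) array, e.g. position and capacity (assumed)
    visited     : (num_cities,) boolean array; the global mask marks cities still available
    """
    global_mask = (~visited).astype(np.float32)  # 1.0 = city still available
    # Flatten and concatenate; a real model might instead keep the parts
    # separate and pass them through different encoders.
    return np.concatenate([
        city_states.reshape(-1).astype(np.float32),
        agent_states.reshape(-1).astype(np.float32),
        global_mask,
    ])

# Hypothetical usage: 20 cities with 3 features, 4 agents with 2 features.
obs = build_observation(np.random.rand(20, 3), np.random.rand(4, 2),
                        np.zeros(20, dtype=bool))
```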
“…In MADDPG, the actor is used to select actions, while a central critic evaluates those actions by observing the joint state and actions of all agents. In this sense, MADDPG follows the centralised learning with decentralised execution paradigm [641,642,643], which assumes unrestricted communication bandwidth during training, as well as the central controller's ability to receive and process all agents' information. To relax these assumptions, Flexible Fully-decentralised Approximate Actor-critic (F2A2) algorithm [644] was proposed as a variant of multi-agent reinforcement learning based on decentralised training with decentralised execution.…”
Section: AI Agents For Promoting Cooperation
confidence: 99%

Social physics. Jusup, Holme, Kanazawa, et al. (2021). Preprint.
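The MADDPG excerpt above rests on the asymmetry between training and execution: each actor sees only its own observation, while the critic used during training conditions on the joint observations and actions of all agents. The sketch below illustrates that split under assumed MLP architectures and continuous actions; the module names and layer sizes are illustrative, not the architecture of the cited work.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Decentralised actor: maps one agent's local observation to its action."""
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                 nn.Linear(64, act_dim), nn.Tanh())

    def forward(self, obs):
        return self.net(obs)

class CentralCritic(nn.Module):
    """Centralised critic: scores the joint observations and actions of all agents."""
    def __init__(self, n_agents, obs_dim, act_dim):
        super().__init__()
        joint_dim = n_agents * (obs_dim + act_dim)
        self.net = nn.Sequential(nn.Linear(joint_dim, 128), nn.ReLU(),
                                 nn.Linear(128, 1))

    def forward(self, all_obs, all_acts):
        # all_obs: (batch, n_agents, obs_dim); all_acts: (batch, n_agents, act_dim)
        joint = torch.cat([all_obs.flatten(1), all_acts.flatten(1)], dim=-1)
        return self.net(joint)
```

Only the Actor modules are needed at execution time; the CentralCritic, and the communication it implies, is used during training only, which is precisely the assumption that decentralised-training approaches such as F2A2 aim to relax.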
“…To prevent high variance in Q-values and ensure expedited convergence of the policy network, we adopt the actor-critic algorithm [18], which introduces an advantage function in place of the Q-value for the policy-gradient calculation, i.e., $\delta = r + \gamma V^{\pi_\theta}(s'; \omega) - V^{\pi_\theta}(s; \omega)$, where $V^{\pi_\theta}(s; \omega)$ is the expected cumulative reward obtained by following the policy $\pi_\theta$ from state $s$, over all possible actions; we use a value network (the critic) with parameters $\omega$ to estimate $V^{\pi_\theta}(s; \omega)$. Specifically, the actor is a policy network with input $s$ and output $\pi_\theta(s)$; the critic also takes $s$ as input, but its output layer is a single linear neuron without any activation function.…”
Section: Sample
confidence: 99%
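The excerpt above uses the TD error $\delta = r + \gamma V^{\pi_\theta}(s'; \omega) - V^{\pi_\theta}(s; \omega)$ as the advantage in the policy-gradient update. Below is a minimal sketch of one such update, assuming a PyTorch critic whose output head is a single linear neuron and an optimiser covering both actor and critic parameters; the function and argument names are placeholders, not taken from the cited work.

```python
import torch
import torch.nn.functional as F

def actor_critic_update(critic, optimiser, log_prob, s, r, s_next, done, gamma=0.99):
    """One advantage actor-critic step: delta = r + gamma * V(s') - V(s).

    Assumes `log_prob` is log pi_theta(a|s) for the sampled action, still
    attached to the actor's computation graph, and `optimiser` holds both
    actor and critic parameters (hypothetical set-up).
    """
    with torch.no_grad():
        target = r + gamma * (1.0 - done) * critic(s_next).squeeze(-1)  # r + gamma * V(s')
    value = critic(s).squeeze(-1)                                       # V(s)
    delta = target - value                                              # advantage estimate
    critic_loss = F.mse_loss(value, target)            # move V(s) towards the TD target
    actor_loss = -(delta.detach() * log_prob).mean()   # policy gradient weighted by delta
    optimiser.zero_grad()
    (actor_loss + critic_loss).backward()
    optimiser.step()
    return delta.detach()
```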
“…Multi-agent Reinforcement Learning. Recently, MARL has achieved promising results in various application domains, e.g., traffic engineering and video games [18]. We treat the DL cluster schedulers as a multi-agent system.…”
Section: Related Work
confidence: 99%