Robust Multi-Agent Reinforcement Learning via Minimax Deep Deterministic Policy Gradient

Li, Shihui; Wu, Yi; Cui, Xinyue; Dong, Honghua; Fang, Fei; Russell, Stuart

doi:10.1609/aaai.v33i01.33014213

Cited by 202 publications (135 citation statements)

References 13 publications

Citation types: Supporting, Mentioning (114), Contrasting, Unclassified


“…In single-agent RL, agents can overfit to the environment [296]. A similar problem can occur in multiagent settings [254], agents can overfit, i.e., an agent's policy can easily get stuck in a local optima and the learned policy may be only locally optimal to other agents' current policies [183]. This has the effect of limiting the generalization of the learned policies [172].…”

Section: Lessons Learned (mentioning)

confidence: 99%


A survey and critique of multiagent deep reinforcement learning

Hernández-Leal, Kartal, Taylor. 2019. Auton Agent Multi-Agent Syst

Deep reinforcement learning (RL) has achieved outstanding results in recent years. This has led to a dramatic increase in the number of applications and methods. Recent works have explored learning beyond single-agent scenarios and have considered multiagent learning (MAL) scenarios. Initial results report successes in complex multiagent domains, although there are several challenges to be addressed. The primary goal of this article is to provide a clear overview of current multiagent deep reinforcement learning (MDRL) literature. Additionally, we complement the overview with a broader analysis: (i) we revisit previous key components, originally presented in MAL and RL, and highlight how they have been adapted to multiagent deep reinforcement learning settings. (ii) We provide general guidelines to new practitioners in the area: describing lessons learned from MDRL works, pointing to recent benchmarks, and outlining open avenues of research. (iii) We take a more critical tone raising practical challenges of MDRL (e.g., implementation and computational demands). We expect this article will help unify and motivate future research to take advantage of the abundant literature that exists (e.g., RL and MAL) in a joint effort to promote fruitful research in the multiagent community.

Earlier versions of this work had the title: "Is multiagent deep reinforcement learning the answer or the question? A brief survey". arXiv:1810.05587v3 [cs.MA] 30 Aug 2019.

Go [14,15], poker [16,17], and games of two competing teams, e.g., DOTA 2 [18] and StarCraft II [19]. While different techniques and algorithms were used in the above scenarios, in general, they are all a combination of techniques from two main areas: reinforcement learning (RL) [20] and deep learning [21,22]. RL is an area of machine learning where an agent learns by interacting (i.e., taking actions) within a dynamic environment.
However, one of the main challenges to RL, and traditional machine learning in general, is the need for manually designing quality features on which to learn. Deep learning enables efficient representation learning, thus allowing the automatic discovery of features [21,22]. In recent years, deep learning has had successes in different areas such as computer vision and natural language processing [21,22]. One of the key aspects of deep learning is the use of neural networks (NNs) that can find compact representations in high-dimensional data [23]. In deep reinforcement learning (DRL) [23,24] deep neural networks are trained to approximate the optimal policy and/or the value function. In this way the deep NN, serving as function approximator, enables powerful generalization. One of the key advantages of DRL is that it enables RL to scale to problems with high-dimensional state and action spaces. However, most existing successful DRL applications so far have been on visual domains (e.g., Atari games), and there is still a lot of work to be done for more realistic applications [25,26] with complex dynamics, which are not necessarily vision-based. DRL h...


“…To reduce this problem, a solution is to have a set of policies (an ensemble) and learn from them or best respond to the mixture of them [172,63,169]. Another solution has been to robustify algorithms -a robust policy should be able to behave well even with strategies different from its training (better generalization) [183].…”

Section: Lessons Learned (mentioning)

confidence: 99%

A survey and critique of multiagent deep reinforcement learning

Hernández-Leal, Kartal, Taylor. 2019. Auton Agent Multi-Agent Syst

“…. From here, it is straight forward to rewrite the optimization problem in (6) as $\min_{U_i} f_i(U)$. The LQ game is a potential game if and only if $Q_i = Q_j$ and…”

Section: Subclasses Of Games Within Quadratic Games (mentioning)

confidence: 99%

Disturbance Decoupling for Gradient-Based Multi-Agent Learning With Quadratic Costs

Ratliff, Açıkmeşe. 2021. IEEE Control Syst. Lett.

Motivated by applications of multi-agent learning in noisy environments, this paper studies the robustness of gradient-based learning dynamics with respect to disturbances. While disturbances injected along a coordinate corresponding to any individual player's actions can always affect the overall learning dynamics, a subset of players can be disturbance decoupled, i.e., such players' actions are completely unaffected by the injected disturbance. We provide necessary and sufficient conditions to guarantee this property for games with quadratic cost functions, which encompass quadratic one-shot continuous games, finite-horizon linear quadratic (LQ) dynamic games, and bilinear games. Specifically, disturbance decoupling is characterized by both algebraic and graph-theoretic conditions on the learning dynamics; the latter is obtained by constructing a game graph based on gradients of players' costs. For LQ games, we show that disturbance decoupling imposes constraints on the controllable and unobservable subspaces of players. For two-player bilinear games, we show that disturbance decoupling within a player's action coordinates imposes constraints on the payoff matrices. Illustrative numerical examples are provided.


“…With the fame of AlphaGo, reinforcement learning has become more and more popular in academic communities. The methods of reinforcement learning to solve problems have springing up (Lample & Chaplot, 2017;Henderson et al, 2018;Conti et al, 2018;Li et al, 2019). Due to the embeddings of words in the NLP field are mostly discrete, there is relatively little research to combine reinforcement learning with NLP issues until the SeqGAN (Yu et al, 2017) appears.…”

Section: Reinforcement Learning (RL) (mentioning)

confidence: 99%

Discovering differential features: Adversarial learning for information credibility evaluation

Rao, Nazir, et al. 2020. Information Sciences

A series of deep learning approaches extract a large number of credibility features to detect fake news on the Internet. However, these extracted features still suffer from many irrelevant and noisy features that severely restrict the performance of the approaches. In this paper, we propose a novel model based on Adversarial Networks and inspired by the Shared-Private model (ANSP), which aims at reducing common, irrelevant features from the extracted features for information credibility evaluation. Specifically, ANSP involves two tasks: one is to prevent the binary classification of true and false information for capturing common features relying on adversarial networks guided by reinforcement learning. Another extracts credibility features (henceforth, private features) from multiple types of credibility information and compares them with the common features through two strategies, i.e., orthogonality constraints and KL-divergence, for making the private features more differential. Experiments first on the two six-label datasets LIAR and Weibo demonstrate that ANSP achieves state-of-the-art performance, boosting the accuracy by 2.1% and 3.1%, respectively, and then on the four-label Twitter16 validate the robustness of the model with 1.8% performance improvements.

