2019
DOI: 10.1609/aaai.v33i01.33014213

Robust Multi-Agent Reinforcement Learning via Minimax Deep Deterministic Policy Gradient

Abstract: Despite the recent advances of deep reinforcement learning (DRL), agents trained by DRL tend to be brittle and sensitive to the training environment, especially in the multi-agent scenarios. In the multi-agent setting, a DRL agent’s policy can easily get stuck in a poor local optima w.r.t. its training partners – the learned policy may be only locally optimal to other agents’ current policies. In this paper, we focus on the problem of training robust DRL agents with continuous actions in the multi-agent learni…
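
As a rough illustration of the brittleness the abstract describes, the sketch below contrasts an action that is optimal only against one fixed training partner with an action chosen for its worst-case return. Everything in it is an assumption made for illustration: the payoff table, the function names, and the discrete action set are hypothetical, and this toy is not the paper's algorithm, which targets continuous actions and deep policies.

```python
# Hypothetical payoff table, for illustration only (not taken from the paper).
# PAYOFF[a][b] is the agent's return when it plays action a and its partner plays b.
PAYOFF = [
    [10.0, -8.0, -9.0],  # action 0: excellent against partner action 0, poor otherwise
    [4.0, 3.0, 2.0],     # action 1: decent against every partner action
    [6.0, -1.0, 0.0],    # action 2: in between
]


def best_response(payoff, partner_action):
    """Return the action with the highest return against one fixed partner action."""
    return max(range(len(payoff)), key=lambda a: payoff[a][partner_action])


def maximin_action(payoff):
    """Return the action whose worst-case return over all partner actions is highest."""
    return max(range(len(payoff)), key=lambda a: min(payoff[a]))


def worst_case(payoff, action):
    """Return the lowest return the given action can receive."""
    return min(payoff[action])


# Trained only against a partner that always plays action 0.
brittle = best_response(PAYOFF, partner_action=0)
# Chosen to do as well as possible against the least favourable partner.
robust = maximin_action(PAYOFF)

print("best response to training partner:", brittle, "worst case:", worst_case(PAYOFF, brittle))
print("maximin action:", robust, "worst case:", worst_case(PAYOFF, robust))
```

In this toy table the best response to the training partner scores highest during training but collapses if the partner changes, while the maximin action gives up some training-time return for a better guarantee.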

Cited by 202 publications (135 citation statements)
References 13 publications
“…In single-agent RL, agents can overfit to the environment [296]. A similar problem can occur in multiagent settings [254], agents can overfit, i.e., an agent's policy can easily get stuck in a local optima and the learned policy may be only locally optimal to other agents' current policies [183]. This has the effect of limiting the generalization of the learned policies [172].…”
Section: Lessons Learned; citation type: mentioning
confidence: 99%
“…To reduce this problem, a solution is to have a set of policies (an ensemble) and learn from them or best respond to the mixture of them [172,63,169]. Another solution has been to robustify algorithms - a robust policy should be able to behave well even with strategies different from its training (better generalization) [183].…”
Section: Lessons Learned; citation type: mentioning
confidence: 99%
“…. From here, it is straight forward to rewrite the optimization problem in (6) as $\min_{U_i} f_i(U)$. The LQ game is a potential game if and only if $Q_i = Q_j$ and…”
Section: Subclasses Of Games Within Quadratic Games; citation type: mentioning
confidence: 99%
“…With the fame of AlphaGo, reinforcement learning has become more and more popular in academic communities. The methods of reinforcement learning to solve problems have springing up (Lample & Chaplot, 2017; Henderson et al, 2018; Conti et al, 2018; Li et al, 2019). Due to the embeddings of words in the NLP field are mostly discrete, there is relatively little research to combine reinforcement learning with NLP issues until the SeqGAN (Yu et al, 2017) appears.…”
Section: Reinforcement Learning (Rl); citation type: mentioning
confidence: 99%