2021
DOI: 10.48550/arxiv.2106.03400
Preprint

Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning

Abstract: Learning from datasets without interaction with environments (Offline Learning) is an essential step toward applying Reinforcement Learning (RL) algorithms in real-world scenarios. However, compared with its single-agent counterpart, offline multi-agent RL introduces more agents with larger state and action spaces, which is more challenging but has attracted little attention. We demonstrate that current offline RL algorithms are ineffective in multi-agent systems due to the accumulated extrapolation error. In this paper, we …
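As a rough aid to the abstract's claim about accumulated extrapolation error (generic offline-RL notation, not taken from the paper): the bootstrapped target evaluates actions that may never appear in the dataset, and in the multi-agent case it ranges over a joint action space that grows multiplicatively with the number of agents.

```latex
% Generic Q-learning target (illustrative; symbols are standard, not the paper's).
% If the maximizing joint action a' is absent from the offline dataset D,
% Q_theta(s_{t+1}, a') is an extrapolated estimate, and its error is compounded
% through repeated bootstrapping.
y_t = r_t + \gamma \max_{\mathbf{a}' \in \mathcal{A}_1 \times \cdots \times \mathcal{A}_n} Q_\theta(s_{t+1}, \mathbf{a}'),
\qquad
|\mathcal{A}_1 \times \cdots \times \mathcal{A}_n| = \prod_{i=1}^{n} |\mathcal{A}_i| .
```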

Cited by 3 publications (7 citation statements) · References: 33 publications
“…We compare OMAR against state-of-the-art offline RL algorithms including CQL [27] and TD3+BC [11]. We also compare with a recent offline MARL algorithm, MA-ICQ [57]. We build all methods on independent TD3 based on decentralized critics following de Witt et al. [8], while we also consider centralized critics based on MATD3 following Yu et al. [58] in Section 4.1.4.…”
Section: Methods (citation type: mentioning)
confidence: 99%
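For context on the decentralized vs. centralized critics mentioned in this statement, here is a minimal PyTorch-style sketch (class names and layer sizes are illustrative assumptions, not code from the cited works): an independent-TD3 critic conditions only on an agent's local observation and action, while an MATD3-style centralized critic conditions on the global state and the joint action.

```python
import torch
import torch.nn as nn

class DecentralizedCritic(nn.Module):
    """Q_i(o_i, a_i): conditions only on agent i's local observation and action."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs_i: torch.Tensor, act_i: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs_i, act_i], dim=-1))

class CentralizedCritic(nn.Module):
    """Q_i(s, a_1, ..., a_n): conditions on the global state and the joint action."""
    def __init__(self, state_dim: int, act_dim: int, n_agents: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + n_agents * act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state: torch.Tensor, joint_act: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, joint_act], dim=-1))
```

The trade-off illustrated here is the usual one: the centralized critic sees the full joint action during training, but its input dimension grows with the number of agents.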
“…Note that the problem is exacerbated further in the offline multi-agent setting due to the exponentially sized joint action space w.r.t. the number of agents [57]. In addition, it usually requires each of the agents to learn a good policy for coordination to solve the task, and a suboptimal policy by any agent could result in uncoordinated global failure.…”
Section: The Motivating Example (citation type: mentioning)
confidence: 99%
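A short worked example of the "exponentially sized joint action space" point (the numbers are illustrative, not from either paper):

```latex
% n agents, each with |A| = 5 discrete actions:
% |A_joint| = |A|^n = 5^n, e.g. 5^2 = 25, 5^5 = 3125, 5^{10} \approx 9.8 \times 10^6,
% so a fixed-size offline dataset covers a vanishing fraction of joint actions as n grows.
|\mathcal{A}_{\mathrm{joint}}| = |\mathcal{A}|^{\,n}
```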