2021
DOI: 10.48550/arxiv.2108.01832
Preprint
Offline Decentralized Multi-Agent Reinforcement Learning

Abstract: In many real-world multi-agent cooperative tasks, due to high cost and risk, agents cannot interact with the environment and collect experiences during learning, but have to learn from offline datasets. However, the transition probabilities calculated from the dataset can differ greatly from the transition probabilities induced by the learned policies of the other agents, creating large errors in value estimates. Moreover, the experience distributions of the agents' datasets may vary wildly due to diverse behavior …
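To make the failure mode described in the abstract concrete, the following is a minimal toy sketch, assuming a single state, a fixed action for agent 1, and made-up policies for agent 2; it is not the paper's method. It shows how the transition probability agent 1 estimates from its own offline dataset (collected under agent 2's behavior policy) can differ from the one induced by agent 2's learned policy at execution time, which directly biases the value estimate.

```python
# A minimal, hypothetical sketch (not the paper's algorithm) of the failure mode
# described in the abstract: the transition probability an agent estimates from
# its offline dataset differs from the one induced by the other agent's learned
# policy, so the resulting value estimate is biased. All numbers are made up.
import numpy as np

rng = np.random.default_rng(0)

# Toy setting: a single state, and agent 1 always takes the same action.
# The next state depends on agent 2's (unrecorded) action a2:
#   a2 = 0 -> "good" (reward 1), a2 = 1 -> "bad" (reward 0).
next_state = {0: "good", 1: "bad"}
reward = {"good": 1.0, "bad": 0.0}

behaviour_pi2 = np.array([0.9, 0.1])  # agent 2 during data collection
learned_pi2 = np.array([0.2, 0.8])    # agent 2's learned policy at execution

# Agent 1's decentralized dataset stores only (s, a1, s', r); a2 is never logged.
dataset_next_states = [next_state[int(a2)]
                       for a2 in rng.choice(2, size=10_000, p=behaviour_pi2)]

# Transition probability estimated from the dataset vs. the one actually
# induced by agent 2's learned policy.
p_good_dataset = np.mean([s == "good" for s in dataset_next_states])
p_good_execution = learned_pi2[0]

# One-step (terminal) value estimates are just the expected reward.
v_dataset = p_good_dataset * reward["good"]
v_execution = p_good_execution * reward["good"]
print(f"value estimated from the offline dataset: {v_dataset:.2f}")
print(f"value under the learned joint policy:     {v_execution:.2f}")
print(f"value estimation error:                   {abs(v_dataset - v_execution):.2f}")
```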

Cited by 9 publications (16 citation statements) | References 9 publications
“…In many practical scenarios, we only have access to the offline data or it is too expensive to frequently change the policy [Zhang et al, 2021a]. While there are plenty of empirical works on offline MARL [Pan et al, 2021, Jiang and Lu, 2021], the theoretical understanding about offline MARL is still very limited. In this work, we take an initial step towards understanding when offline MARL is provably solvable.…”
Section: Introduction
confidence: 99%
“…Substantial work in offline RL aims at resolving the distribution shift between static offline datasets and online environment interactions (Fujimoto et al, 2019; Kumar et al, 2019). In particular, Jiang & Lu (2021) constrain off-policy algorithms in offline MARL. Related to our work on improving sample efficiency, Nair et al (2020) derive the KKT conditions of the online objective, yielding an advantage weight that avoids the OOD problem.…”
Section: Related Work
confidence: 99%
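As a reference point for the advantage weighting mentioned in the statement above, here is a minimal numpy sketch in the spirit of advantage-weighted policy extraction; the function name, temperature, and toy batch are assumptions, not the cited implementation.

```python
# A minimal, hypothetical numpy sketch of advantage-weighted policy extraction
# in the spirit of the advantage weight mentioned above; the function name,
# temperature, and toy batch are assumptions, not the cited implementation.
import numpy as np

def advantage_weighted_loss(log_pi_a, q_values, v_values, temperature=1.0):
    """Negative log-likelihood of dataset actions, weighted by exp(A / temperature).

    log_pi_a : log pi(a_i | s_i) for the actions stored in the offline dataset
    q_values : Q(s_i, a_i) for those same dataset actions
    v_values : V(s_i), e.g. the expectation of Q under the current policy
    """
    advantages = q_values - v_values
    # Only actions that actually appear in the dataset are weighted, so
    # out-of-distribution actions receive no gradient at all.
    weights = np.exp(advantages / temperature)
    return -np.mean(weights * log_pi_a)

# Toy batch of four transitions (all numbers made up).
log_pi_a = np.log(np.array([0.5, 0.2, 0.7, 0.1]))
q_values = np.array([1.0, 0.2, 0.8, -0.3])
v_values = np.full(4, 0.5)
print(advantage_weighted_loss(log_pi_a, q_values, v_values))
```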
“…In addition, it presents a critical challenge in the decentralized setting, where each agent's dataset contains only its own actions instead of the joint actions [19]. Jiang and Lu [19] address these challenges with the behavior-regularization algorithm BCQ [13], while Yang et al [57] propose to estimate the target value based on the next action from the dataset. As a result, both methods largely depend on the quality of the dataset.…”
Section: Multi-agent Reinforcement Learning
confidence: 99%
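The two target constructions contrasted in the statement above can be sketched in a tabular toy example; the thresholding rule, numbers, and function names below are assumptions meant only to illustrate the idea, not the cited algorithms.

```python
# Hypothetical tabular sketch of the two target constructions contrasted above:
# (a) a behaviour-constrained target in the spirit of BCQ, maximizing only over
#     next actions the behaviour policy plausibly took, and
# (b) a SARSA-style target that bootstraps from the next action actually stored
#     in the offline trajectory. Thresholds and numbers are made up.
import numpy as np

def behaviour_constrained_target(r, q_next, behaviour_probs, gamma=0.99, threshold=0.3):
    # Mask out next actions the behaviour policy rarely took, then maximize.
    allowed = behaviour_probs >= threshold * behaviour_probs.max()
    return r + gamma * np.max(np.where(allowed, q_next, -np.inf))

def dataset_action_target(r, q_next, next_action_in_dataset, gamma=0.99):
    # Bootstrap from the next action that follows in the offline trajectory.
    return r + gamma * q_next[next_action_in_dataset]

q_next = np.array([0.2, 1.5, 0.4])              # action 1 is an over-estimated OOD action
behaviour_probs = np.array([0.70, 0.05, 0.25])  # the data almost never contains action 1
print(behaviour_constrained_target(r=1.0, q_next=q_next, behaviour_probs=behaviour_probs))
print(dataset_action_target(r=1.0, q_next=q_next, next_action_in_dataset=2))
```

Both targets sidestep the over-estimated out-of-distribution action, but both can only bootstrap from what the behaviour data covers, which is the sense in which the statement says they depend on dataset quality.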
“…However, many practical scenarios involve multiple agents, e.g., multi-robot control [4] and autonomous driving [41,45]. Therefore, offline multi-agent reinforcement learning (MARL) [19,57] is crucial for solving real-world tasks. Observing the recent success of Independent PPO [8] and Multi-Agent PPO [58], both of which are based on the PPO [49] algorithm, we find that online RL algorithms can be transferred to multi-agent scenarios through either decentralized training or a centralized value function, without bells and whistles.…”
Section: Introduction
confidence: 99%
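A rough sketch of the two transfer routes mentioned in the statement above, independent critics versus one centralized value function; the linear "critics", shapes, and names are placeholders for illustration only, not the cited architectures.

```python
# A rough, hypothetical sketch of the two transfer routes mentioned above:
# independent (decentralized) critics versus one centralized value function over
# the joint observation. Linear "critics", shapes, and names are placeholders.
import numpy as np

n_agents, obs_dim = 3, 4
local_obs = np.random.randn(n_agents, obs_dim)

def decentralized_values(local_obs, per_agent_weights):
    # Independent-PPO style: each agent's critic sees only its own observation.
    return np.array([obs @ w for obs, w in zip(local_obs, per_agent_weights)])

def centralized_value(local_obs, joint_weights):
    # MAPPO style: a single critic sees all agents' observations concatenated.
    return local_obs.reshape(-1) @ joint_weights

per_agent_weights = [np.random.randn(obs_dim) for _ in range(n_agents)]
joint_weights = np.random.randn(n_agents * obs_dim)
print(decentralized_values(local_obs, per_agent_weights))  # one value per agent
print(centralized_value(local_obs, joint_weights))         # one shared value
```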