2021
DOI: 10.48550/arxiv.2112.02845
Preprint

Offline Pre-trained Multi-Agent Decision Transformer: One Big Sequence Model Tackles All SMAC Tasks

Abstract: Offline reinforcement learning leverages previously collected offline datasets to learn optimal policies with no need to access the real environment. Such a paradigm is also desirable for multi-agent reinforcement learning (MARL) tasks, given the increased interactions among agents and with the environment. Yet, in MARL, the paradigm of offline pre-training with online fine-tuning has not been studied, nor are datasets or benchmarks for offline MARL research available. In this paper, we facilitate the rese…

Cited by 7 publications (11 citation statements)
References 19 publications

“…Decision Transformer (DT) (Chen et al, 2021), and the closely related work by Janner et al (2021), provide an alternate perspective by framing offline RL as a sequence modeling problem and solving it via techniques from supervised learning. This provides a simple and scalable framework, including extensions to multi-agent RL (Meng et al, 2021), transfer learning (Boustati et al, 2021), and richer forms of conditioning (Putterman et al; Furuta et al, 2021). We proposed ODT, a simple and robust algorithm for finetuning a pretrained DT in an online setting, thus further expanding its scope to practical scenarios with a mixture of offline and online interaction data.…”
Section: Discussion
confidence: 99%
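The sequence-modelling framing described in the statement above is easy to make concrete. The sketch below is a minimal, illustrative Decision-Transformer-style model, not the implementation of Chen et al., Meng et al., or Zheng et al.: each timestep contributes a (return-to-go, state, action) token triple, a causal Transformer encodes the interleaved sequence, and the action at every step is predicted from its state token with a plain supervised (behaviour-cloning) loss. Continuous actions, the MSE loss, and all layer sizes are assumptions made for brevity.

```python
# Minimal sketch of the Decision Transformer framing (illustrative only, not the
# cited authors' code): offline RL as causal sequence modelling over
# (return-to-go, state, action) tokens, trained with supervised learning.
import torch
import torch.nn as nn

class TinyDecisionTransformer(nn.Module):
    def __init__(self, state_dim, act_dim, d_model=64, n_layer=2, n_head=2, max_len=100):
        super().__init__()
        self.embed_rtg = nn.Linear(1, d_model)             # return-to-go token
        self.embed_state = nn.Linear(state_dim, d_model)
        self.embed_action = nn.Linear(act_dim, d_model)
        self.embed_time = nn.Embedding(max_len, d_model)   # per-timestep position
        layer = nn.TransformerEncoderLayer(d_model, n_head, 4 * d_model, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layer)
        self.predict_action = nn.Linear(d_model, act_dim)

    def forward(self, rtg, states, actions, timesteps):
        # rtg: (B, T, 1), states: (B, T, state_dim), actions: (B, T, act_dim),
        # timesteps: (B, T) long tensor of step indices.
        B, T = states.shape[:2]
        pos = self.embed_time(timesteps)
        # Interleave tokens as (R_1, s_1, a_1, R_2, s_2, a_2, ...).
        tokens = torch.stack(
            [self.embed_rtg(rtg) + pos,
             self.embed_state(states) + pos,
             self.embed_action(actions) + pos],
            dim=2,
        ).reshape(B, 3 * T, -1)
        # Causal mask: each token may only attend to earlier tokens.
        mask = torch.triu(
            torch.full((3 * T, 3 * T), float("-inf"), device=tokens.device), diagonal=1)
        h = self.encoder(tokens, mask=mask)
        # Predict a_t from the state token at step t (positions 1, 4, 7, ...).
        return self.predict_action(h[:, 1::3])

def supervised_loss(model, batch):
    """Behaviour-cloning loss on an offline batch (continuous actions, MSE)."""
    pred = model(batch["rtg"], batch["states"], batch["actions"], batch["timesteps"])
    return ((pred - batch["actions"]) ** 2).mean()
```

At evaluation time such a model is rolled out by conditioning on a high target return-to-go and feeding back its own predicted actions; this is the setting that ODT extends by continuing training during online interaction.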
“…Step Transformer
ConDT [Konan et al, 2022]: Offline; learned representation conditioning; return-dependent transformation
SPLT [Villaflor et al, 2022]: Offline; no conditioning; min-max search; separate models for world and policy
ODT [Zheng et al, 2022]: Online finetune; return-to-go conditioning; trajectory-based entropy
MADT [Meng et al, 2021]: Online finetune (multi-agent); no conditioning; separate models for actor and critic
Table 1: A summary of Transformers for sequential decision-making.…”
Section: Different Choices Of Conditioning
confidence: 99%
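Among the conditioning choices listed in the table above, the return-to-go signal used by DT and ODT is simply the (optionally discounted) suffix sum of rewards along a trajectory. A minimal numpy sketch follows; the function name and the discount argument are illustrative, not taken from any of the cited papers.

```python
# Return-to-go: R_t = sum_{k >= t} gamma^(k - t) * r_k, computed right to left.
# Illustrative helper, not from the cited papers.
import numpy as np

def returns_to_go(rewards, gamma=1.0):
    rtg = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg

print(returns_to_go([1.0, 0.0, 2.0]))  # [3. 2. 2.]
```

Each element is then paired with the corresponding state and action as in the earlier sketch; the "no conditioning" entries in the table (SPLT, MADT) simply omit this token.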
“…Deep reinforcement learning has been successfully applied to address complex decision problems [13][14][15][16][17]. Due to the widespread existence of multi-agent tasks, MARL has attracted increasing attention, and learning appropriate control policies is important to obtain the maximum cumulative discounted return.…”
Section: Related Work
confidence: 99%
“…Let π_i represent the fixed policy for agent i trained by the graph-based coordinated policy. φ is the parameter to be solved by the graph generator ρ. g(·) and d(·) are the constraint functions as shown in Equation (7) and Equation (17). Q^{π_i}(·) denotes the state-action value function.…”
Section: A Detailed Proofs
confidence: 99%