2021
DOI: 10.48550/arxiv.2112.02845
Preprint

Offline Pre-trained Multi-Agent Decision Transformer: One Big Sequence Model Tackles All SMAC Tasks

Abstract: Offline reinforcement learning leverages previously collected offline datasets to learn optimal policies with no need to access the real environment. Such a paradigm is also desirable for multi-agent reinforcement learning (MARL) tasks, given the increased interactions among agents and with the environment. Yet, in MARL, the paradigm of offline pre-training with online fine-tuning has not been studied, nor are datasets or benchmarks for offline MARL research available. In this paper, we facilitate the rese…

Cited by 7 publications (11 citation statements)
References 19 publications

“…Decision Transformer (DT) (Chen et al, 2021), and the closely related work by Janner et al (2021), provide an alternate perspective by framing offline RL as a sequence modeling problem and solving it via techniques from supervised learning. This provides a simple and scalable framework, including extensions to multi-agent RL (Meng et al, 2021), transfer learning (Boustati et al, 2021), and richer forms of conditioning (Putterman et al; Furuta et al, 2021). We proposed ODT, a simple and robust algorithm for finetuning a pretrained DT in an online setting, thus further expanding its scope to practical scenarios with a mixture of offline and online interaction data.…”
Section: Discussion
confidence: 99%
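The sequence-modelling framing described in the statement above is easy to make concrete. The sketch below is a minimal, illustrative Decision-Transformer-style model, not the implementation of Chen et al., Meng et al., or Zheng et al.: each timestep contributes a (return-to-go, state, action) token triple, a causal Transformer encodes the interleaved sequence, and the action at every step is predicted from its state token with a plain supervised (behaviour-cloning) loss. Continuous actions, the MSE loss, and all layer sizes are assumptions made for brevity.

```python
# Minimal sketch of the Decision Transformer framing (illustrative only, not the
# cited authors' code): offline RL as causal sequence modelling over
# (return-to-go, state, action) tokens, trained with supervised learning.
import torch
import torch.nn as nn

class TinyDecisionTransformer(nn.Module):
    def __init__(self, state_dim, act_dim, d_model=64, n_layer=2, n_head=2, max_len=100):
        super().__init__()
        self.embed_rtg = nn.Linear(1, d_model)             # return-to-go token
        self.embed_state = nn.Linear(state_dim, d_model)
        self.embed_action = nn.Linear(act_dim, d_model)
        self.embed_time = nn.Embedding(max_len, d_model)   # per-timestep position
        layer = nn.TransformerEncoderLayer(d_model, n_head, 4 * d_model, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layer)
        self.predict_action = nn.Linear(d_model, act_dim)

    def forward(self, rtg, states, actions, timesteps):
        # rtg: (B, T, 1), states: (B, T, state_dim), actions: (B, T, act_dim),
        # timesteps: (B, T) long tensor of step indices.
        B, T = states.shape[:2]
        pos = self.embed_time(timesteps)
        # Interleave tokens as (R_1, s_1, a_1, R_2, s_2, a_2, ...).
        tokens = torch.stack(
            [self.embed_rtg(rtg) + pos,
             self.embed_state(states) + pos,
             self.embed_action(actions) + pos],
            dim=2,
        ).reshape(B, 3 * T, -1)
        # Causal mask: each token may only attend to earlier tokens.
        mask = torch.triu(
            torch.full((3 * T, 3 * T), float("-inf"), device=tokens.device), diagonal=1)
        h = self.encoder(tokens, mask=mask)
        # Predict a_t from the state token at step t (positions 1, 4, 7, ...).
        return self.predict_action(h[:, 1::3])

def supervised_loss(model, batch):
    """Behaviour-cloning loss on an offline batch (continuous actions, MSE)."""
    pred = model(batch["rtg"], batch["states"], batch["actions"], batch["timesteps"])
    return ((pred - batch["actions"]) ** 2).mean()
```

At evaluation time such a model is rolled out by conditioning on a high target return-to-go and feeding back its own predicted actions; this is the setting that ODT extends by continuing training during online interaction.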
“…Step Transformer
ConDT [Konan et al, 2022]: Offline; learned representation conditioning; return-dependent transformation
SPLT [Villaflor et al, 2022]: Offline; no conditioning; min-max search; separate models for world and policy
ODT [Zheng et al, 2022]: Online finetune; return-to-go conditioning; trajectory-based entropy
MADT [Meng et al, 2021]: Online finetune (multi-agent); no conditioning; separate models for actor and critic
Table 1: A summary of Transformers for sequential decision-making.…”
Section: Different Choices Of Conditioning
confidence: 99%
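Among the conditioning choices listed in the table above, the return-to-go signal used by DT and ODT is simply the (optionally discounted) suffix sum of rewards along a trajectory. A minimal numpy sketch follows; the function name and the discount argument are illustrative, not taken from any of the cited papers.

```python
# Return-to-go: R_t = sum_{k >= t} gamma^(k - t) * r_k, computed right to left.
# Illustrative helper, not from the cited papers.
import numpy as np

def returns_to_go(rewards, gamma=1.0):
    rtg = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg

print(returns_to_go([1.0, 0.0, 2.0]))  # [3. 2. 2.]
```

Each element is then paired with the corresponding state and action as in the earlier sketch; the "no conditioning" entries in the table (SPLT, MADT) simply omit this token.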
“…Deep reinforcement learning has been successfully applied to address complex decision problems [13][14][15][16][17]. Due to the widespread existence of multi-agent tasks, MARL has attracted increasing attention, and learning appropriate control policies is important to obtain the maximum cumulative discounted return.…”
Section: Related Work
confidence: 99%
“…Let π_i represent the fixed policy for agent i trained by the graph-based coordinated policy. φ is the parameter to be solved by the graph generator ρ. g(·) and d(·) are the constraint functions as shown in Equation (7) and Equation (17). Q^{π_i}(·) denotes the state-action value function.…”
Section: A Detailed Proofs
confidence: 99%