2022
DOI: 10.48550/arxiv.2202.05607
Preprint

Online Decision Transformer

Abstract: Recent work has shown that offline reinforcement learning (RL) can be formulated as a sequence modeling problem (Chen et al., 2021; Janner et al., 2021) and solved via approaches similar to large-scale language modeling. However, any practical instantiation of RL also involves an online component, where policies pretrained on passive offline datasets are finetuned via task-specific interactions with the environment. We propose Online Decision Transformers (ODT), an RL algorithm based on sequence modeling that bl…
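
Only the abstract is shown here, but the sequence-modeling formulation it builds on (Chen et al., 2021) is concrete enough to sketch: a trajectory is serialized as interleaved (return-to-go, state, action) tokens, and a causal transformer is trained to autoregressively predict the action at each step. The PyTorch sketch below is an illustration of that formulation only, not the authors' ODT implementation; all module names, dimensions, and hyperparameters are hypothetical, and ODT's online component (entropy-regularized stochastic policies finetuned through environment interaction) is omitted.

```python
# Minimal sketch of decision-transformer-style sequence modeling
# (illustrative assumption, not the ODT reference implementation).
# Trajectories are serialized as (return-to-go, state, action) triples
# and a causal transformer is trained to predict the action at each step.
import torch
import torch.nn as nn

class DecisionTransformerSketch(nn.Module):
    def __init__(self, state_dim, act_dim, d_model=128, n_layers=3,
                 n_heads=4, max_len=64):
        super().__init__()
        # Separate linear embeddings per modality, plus a timestep embedding.
        self.embed_rtg = nn.Linear(1, d_model)
        self.embed_state = nn.Linear(state_dim, d_model)
        self.embed_action = nn.Linear(act_dim, d_model)
        self.embed_t = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, n_layers)
        self.predict_action = nn.Linear(d_model, act_dim)

    def forward(self, rtg, states, actions, timesteps):
        # rtg: (B, T, 1), states: (B, T, state_dim),
        # actions: (B, T, act_dim), timesteps: (B, T) long tensor.
        B, T = timesteps.shape
        t_emb = self.embed_t(timesteps)
        # Interleave tokens as (R_1, s_1, a_1, R_2, s_2, a_2, ...).
        tokens = torch.stack(
            [self.embed_rtg(rtg) + t_emb,
             self.embed_state(states) + t_emb,
             self.embed_action(actions) + t_emb],
            dim=2,
        ).reshape(B, 3 * T, -1)
        # Causal mask so each token attends only to earlier tokens.
        mask = nn.Transformer.generate_square_subsequent_mask(3 * T)
        h = self.transformer(tokens, mask=mask)
        # Predict a_t from the hidden state at the s_t token (offset 1 of 3);
        # training would regress these onto the dataset actions (e.g. MSE).
        return self.predict_action(h.reshape(B, T, 3, -1)[:, :, 1])

if __name__ == "__main__":
    # Smoke test with arbitrary dimensions.
    model = DecisionTransformerSketch(state_dim=17, act_dim=6)
    preds = model(torch.zeros(2, 10, 1), torch.zeros(2, 10, 17),
                  torch.zeros(2, 10, 6), torch.arange(10).repeat(2, 1))
    print(preds.shape)  # torch.Size([2, 10, 6])
```

At evaluation time such a model is conditioned on a desired return-to-go and its predicted actions are fed back into the sequence; ODT's contribution is extending this loop to online finetuning, which the sketch above does not attempt.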

Cited by 12 publications (18 citation statements)
References 20 publications
“…The most closely related architectures to that of Gato are Decision Transformers (Chen et al., 2021b; Furuta et al., 2021; Reid et al., 2022; Zheng et al., 2022) and Trajectory Transformer (Janner et al., 2021), which showed the usefulness of highly generic LM-like architectures for a variety of control problems. Gato also uses an LM-like architecture for control, but with design differences chosen to support multi-modality, multi-embodiment, large scale and general purpose deployment.…”
Section: Related Work
confidence: 99%
“…This is also true of state-of-the-art RL methods applied to board games (Schrittwieser et al., 2020). Moreover, this choice has been adopted by offline RL benchmarks (Fu et al., 2020; …) and recent works on large sequence neural networks for control, including decision transformers (Chen et al., 2021b; Reid et al., 2022; Zheng et al., 2022) and the Trajectory Transformer of Janner et al. (2021). In contrast, in this work we learn a single network with the same weights across a diverse set of tasks.…”
Section: Related Work
confidence: 99%
“…Furthermore, we provide comparisons against existing behavioral cloning, online and offline RL methods, and contrastive representations [77, 54]. Other works that also consider LLM-like sequence modeling for a variety of single control tasks include [63, 81, 34, 22, 55].…”
Section: Related Work
confidence: 99%
“…The chain of thought imitation learning problem we formulate is domain agnostic and applicable to many sequential decision making tasks traditionally solved by imitation learning in a Markovian setting, such as robot locomotion, navigation, manipulation, and strategy games. Unlike language-based tasks, problems in decision making have only recently started being explored by language models [68, 69, 70, 71, 72, 73], as the Markovian nature of these problems brings the value of sequence modeling into question. Our work contributes to bridging the gap between learning memoryless policies in Markovian environments and the intuition that large sequence models should help in reasoning-based decision making.…”
Section: Related Work
confidence: 99%