Generalized Decision Transformer for Offline Hindsight Information Matching
Preprint, 2021
DOI: 10.48550/arxiv.2111.10364

Abstract: How to extract as much learning signal as possible from each trajectory has been a key problem in reinforcement learning (RL), where sample inefficiency poses serious challenges for practical applications. Recent works have shown that using expressive policy function approximators and conditioning on future trajectory information, such as future states in hindsight experience replay (HER) or returns-to-go in Decision Transformer (DT), enables efficient learning of multi-task policies, where at times online RL is f…
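To make the conditioning idea in the abstract concrete, the sketch below relabels a logged trajectory with its returns-to-go and packs it into the (return-to-go, state, action) triples that DT-style models are trained on. It is an illustration under simple assumptions (NumPy, hypothetical helper names such as `returns_to_go` and `to_dt_tokens`), not code from the paper.

```python
import numpy as np

def returns_to_go(rewards, gamma=1.0):
    """Hindsight relabeling: R_t = sum_{t' >= t} gamma^(t'-t) * r_{t'}.

    Each timestep is tagged with the return actually achieved from that
    point onward, so the trajectory can supervise a return-conditioned
    policy (the DT-style analogue of goal relabeling in HER)."""
    rtg = np.zeros_like(rewards, dtype=np.float64)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg

def to_dt_tokens(states, actions, rewards):
    """Interleave (return-to-go, state, action) per timestep,
    the input sequence a DT-style sequence model consumes."""
    rtg = returns_to_go(rewards)
    return [(rtg[t], states[t], actions[t]) for t in range(len(rewards))]

# Toy 4-step trajectory with scalar states and actions.
states = np.arange(4.0)
actions = np.ones(4)
rewards = np.array([0.0, 1.0, 0.0, 1.0])
print(to_dt_tokens(states, actions, rewards))  # returns-to-go: [2, 2, 1, 1]
```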

Cited by 15 publications (26 citation statements). References 30 publications.
“…Numerous algorithmic works ensued (Wu et al., 2019; Jaques et al., 2020; Ghasemipour et al., 2021; Kumar et al., 2020; Fujimoto & Gu, 2021) with various applications (Jaques et al., 2020; Chebotar et al., 2021). Building on reward-conditioned imitation learning (Srivastava et al., 2019; Kumar et al., 2019), the Transformer architecture has recently been adopted to replace offline RL with sequence modeling (Chen et al., 2021; Janner et al., 2021; Furuta et al., 2021). Despite initial successes, many techniques popular in language modeling have yet to be explored in these offline RL benchmarks, and our work constitutes an initial step toward bridging the two communities.…”
Section: Ablation of Proposed Techniques
confidence: 99%
“…Concurrently, offline reinforcement learning (RL) has been seen as analogous to sequence modeling (Chen et al., 2021; Janner et al., 2021; Furuta et al., 2021), framed as simply supervised learning to fit return-augmented trajectories in an offline dataset. This relaxation, doing away with many of the complexities commonly associated with reinforcement learning (Watkins & Dayan, 1992; Kakade, 2001), allows us to take advantage of techniques popularized in sequence modeling tasks for RL.…”
Section: Introduction
confidence: 99%
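The "supervised learning to fit return-augmented trajectories" framing mentioned in this excerpt amounts to regressing the dataset actions from return-conditioned inputs, with no value function or bootstrapping. Below is a minimal PyTorch-style sketch of one such update; the class and batch-key names (`ReturnConditionedPolicy`, `supervised_step`, `"rtg"`, `"states"`, `"actions"`) are hypothetical, and the MLP stands in for the causal Transformer that DT actually uses over the full (return-to-go, state, action) sequence.

```python
import torch
import torch.nn as nn

class ReturnConditionedPolicy(nn.Module):
    """Minimal return-conditioned policy: predicts an action from
    (return-to-go, state). A real Decision Transformer would instead run
    a causal Transformer over the whole interleaved token sequence."""
    def __init__(self, state_dim, act_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, rtg, states):
        # rtg: (batch,), states: (batch, state_dim)
        return self.net(torch.cat([rtg.unsqueeze(-1), states], dim=-1))

def supervised_step(policy, optimizer, batch):
    """One 'offline RL as supervised learning' update: behavioral cloning
    of the logged actions, conditioned on the relabeled returns-to-go."""
    pred = policy(batch["rtg"], batch["states"])
    loss = nn.functional.mse_loss(pred, batch["actions"])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

At evaluation time, a DT-style policy is typically conditioned on a desired target return (decremented by the rewards observed so far) rather than a return computed in hindsight.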
“…Decision Transformer (DT) (Chen et al., 2021), and the closely related work by Janner et al. (2021), provide an alternate perspective by framing offline RL as a sequence modeling problem and solving it via techniques from supervised learning. This provides a simple and scalable framework, including extensions to multi-agent RL (Meng et al., 2021), transfer learning (Boustati et al., 2021), and richer forms of conditioning (Putterman et al.; Furuta et al., 2021). We proposed ODT, a simple and robust algorithm for finetuning a pretrained DT in an online setting, thus further expanding its scope to practical scenarios with a mixture of offline and online interaction data.…”
Section: Discussion
confidence: 99%