“…Numerous algorithmic works ensued (Wu et al., 2019; Jaques et al., 2020; Ghasemipour et al., 2021; Kumar et al., 2020; Fujimoto & Gu, 2021), with various applications (Jaques et al., 2020; Chebotar et al., 2021). Building on reward-conditioned imitation learning (Srivastava et al., 2019; Kumar et al., 2019), the Transformer architecture has recently been adopted to recast offline RL as a sequence modeling problem (Chen et al., 2021; Janner et al., 2021; Furuta et al., 2021). Despite these initial successes, many techniques popular in language modeling have yet to be evaluated on these offline RL benchmarks, and our work constitutes an initial step toward bridging the two communities.…”