“…Numerous algorithmic works ensued (Wu et al., 2019; Jaques et al., 2020; Ghasemipour et al., 2021; Kumar et al., 2020; Fujimoto & Gu, 2021), with various applications (Jaques et al., 2020; Chebotar et al., 2021). Building on reward-conditioned imitation learning (Srivastava et al., 2019; Kumar et al., 2019), the Transformer architecture has recently been adopted to recast offline RL as a sequence modeling problem (Chen et al., 2021; Janner et al., 2021; Furuta et al., 2021). Despite these initial successes, many techniques popular in language modeling have yet to be evaluated on these offline RL benchmarks, and our work constitutes an initial step toward bridging the two communities.…”