2021 · Preprint
DOI: 10.48550/arxiv.2104.01655

Efficient Transformers in Reinforcement Learning using Actor-Learner Distillation

Abstract: Many real-world applications such as robotics provide hard constraints on power and compute that limit the viable model complexity of Reinforcement Learning (RL) agents. Similarly, in many distributed RL settings, acting is done on unaccelerated hardware such as CPUs, which likewise restricts model size to prevent intractable experiment run times. These "actor-latency" constrained settings present a major obstruction to the scaling up of model complexity that has recently been extremely successful in supervised…
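
The truncated abstract centers on the paper's actor-learner distillation idea: train a large, expressive model on accelerated hardware while a small, cheap policy runs on the latency-constrained actors, and continually distill the large policy into the small one. The sketch below is only an illustration of that pattern, assuming a small LSTM actor, a transformer-based learner, and a KL-based policy-distillation loss; the class names, sizes, and loss details are hypothetical, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallActor(nn.Module):
    """Cheap policy intended to run on actor-latency-constrained hardware (e.g. CPU)."""
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, obs_seq):                 # obs_seq: (B, T, obs_dim)
        h, _ = self.lstm(obs_seq)               # (B, T, hidden)
        return self.head(h)                     # action logits (B, T, A)

class LargeLearner(nn.Module):
    """Expensive transformer policy trained only on accelerated hardware."""
    def __init__(self, obs_dim, n_actions, d_model=256, n_layers=4):
        super().__init__()
        self.embed = nn.Linear(obs_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_actions)

    def forward(self, obs_seq):
        return self.head(self.encoder(self.embed(obs_seq)))

def distillation_loss(actor_logits, learner_logits):
    """KL(learner || actor): push the small actor's policy toward the learner's."""
    learner_probs = F.softmax(learner_logits.detach(), dim=-1)
    actor_logp = F.log_softmax(actor_logits, dim=-1)
    return F.kl_div(actor_logp, learner_probs, reduction="batchmean")
```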

Cited by 6 publications (8 citation statements)
References 14 publications
“…Parisotto et al. (2020) address the problem of using transformers in RL and show that adding gating layers on top of the transformer layers can stabilize training. Subsequent works addressed the increased computational load of using a transformer for an agent's policy (Irie et al., 2021; Parisotto & Salakhutdinov, 2021). Chen et al. (2021) and Janner et al. (2021) take a different approach, modeling the RL problem as a sequence modeling problem and using a transformer to predict actions without additional networks for an actor or critic.…”
Section: Related Work
confidence: 99%
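
The gating idea credited to Parisotto et al. (2020) in the statement above can be made concrete with a small sketch: the residual connection around the attention sublayer is replaced with a GRU-style gate. This is an illustrative reconstruction under that assumption, not the exact GTrXL block.

```python
import torch
import torch.nn as nn

class GatedAttentionBlock(nn.Module):
    """Transformer sublayer where the residual add is replaced by a GRU-style gate,
    the kind of gating reported to stabilize RL training (Parisotto et al., 2020)."""
    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.gate = nn.GRUCell(d_model, d_model)   # gate in place of x + attn(x)

    def forward(self, x):                          # x: (B, T, d_model)
        n = self.norm(x)
        y, _ = self.attn(n, n, n, need_weights=False)
        b, t, d = x.shape
        # nn.GRUCell expects 2-D inputs, so fold batch and time together.
        gated = self.gate(y.reshape(b * t, d), x.reshape(b * t, d))
        return gated.reshape(b, t, d)
```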
“…In light of the above works, researchers have been tempted to investigate the benefit of transformer models in improving reinforcement learning performance. The first line of work applies the transformer model to represent components of standard RL algorithms, such as policies, models, and value functions (Parisotto et al., 2020; Parisotto & Salakhutdinov, 2021). Instead of this, the second line of work (Chen et al., 2021; Janner et al., 2021b) abstracts RL as a sequence modelling problem and efficiently utilizes the existing transformer framework widely used in language modelling to solve the RL problem.…”
Section: A. Related Work
confidence: 99%
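
The sequence-modeling view attributed to Chen et al. (2021) and Janner et al. (2021b) treats a trajectory as a single token stream and lets one transformer predict actions directly. A minimal sketch of that framing follows; the interleaving of return-to-go, state, and action tokens and all dimensions are illustrative assumptions, not a specific published architecture.

```python
import torch
import torch.nn as nn

class SequenceModelPolicy(nn.Module):
    """Autoregressive model over (return-to-go, state, action) tokens; actions are
    read off the transformer outputs, with no separate actor or critic network."""
    def __init__(self, state_dim, n_actions, d_model=128, n_layers=3, max_steps=64):
        super().__init__()
        self.embed_rtg = nn.Linear(1, d_model)
        self.embed_state = nn.Linear(state_dim, d_model)
        self.embed_action = nn.Embedding(n_actions, d_model)
        self.pos = nn.Embedding(3 * max_steps, d_model)     # 3 tokens per timestep
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.action_head = nn.Linear(d_model, n_actions)

    def forward(self, rtg, states, actions):
        # rtg: (B, T, 1), states: (B, T, state_dim), actions: (B, T) int64
        tokens = torch.stack(
            [self.embed_rtg(rtg), self.embed_state(states), self.embed_action(actions)],
            dim=2,                                           # (B, T, 3, d_model)
        ).flatten(1, 2)                                      # interleave -> (B, 3T, d_model)
        seq_len = tokens.size(1)
        tokens = tokens + self.pos(torch.arange(seq_len, device=tokens.device))
        # Additive causal mask so each token only attends to the past.
        mask = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
        h = self.backbone(tokens, mask=mask)
        return self.action_head(h[:, 1::3])                  # predict actions from state tokens
```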
“…Learned sparse attention mechanisms combined with feed-forward neural networks represent exciting alternatives for training RNNs. The best way to use attention strategies for partially observable reinforcement learning is still evolving (Parisotto et al., 2020b; Parisotto & Salakhutdinov, 2021; Loynd et al., 2020; Chen et al., 2021; Janner et al., 2021). Chen et al. (2021) and Janner et al. (2021) use transformers in the offline reinforcement learning setting.…”
Section: Related Work
confidence: 99%