Meta-Model-Based Meta-Policy Optimization

Hiraoka, Takuya; Imagawa, Takahisa; Tangkaratt, Voot; Osa, Takayuki; Onishi, Takashi; Tsuruoka, Yoshimasa

doi:10.48550/arxiv.2006.02608

Cited by 3 publications

(4 citation statements)

References 16 publications


“…GSSM is still leading in policy performance with more than 300 average rewards advantage over MLSM-v1. We also notice L2A works not so well in our environments even after trying several hyper-parameters, similar to observations in the work (Hiraoka et al, 2020; Lee et al, 2020), and lower rewards could be due to unstable adaptations in dynamics models.…”

Section: Robotic Simulation Systems (supporting)

confidence: 83%

Model-based Meta Reinforcement Learning using Graph Structured Surrogate Models

Wang, van Hoof

2021

Preprint


Reinforcement learning is a promising paradigm for solving sequential decision-making problems, but low data efficiency and weak generalization across tasks are bottlenecks in real-world applications. Model-based meta reinforcement learning addresses these issues by learning dynamics and leveraging knowledge from prior experience. In this paper, we take a closer look at this framework, and propose a new Thompson-sampling-based approach that consists of a new model to identify task dynamics together with an amortized policy optimization step. We show that our model, called a graph structured surrogate model (GSSM), outperforms state-of-the-art methods in predicting environment dynamics. Additionally, our approach is able to obtain high returns, while allowing fast execution during deployment by avoiding test-time policy gradient optimization.


“…Ensemble Q-functions: Ensembles of Q-functions have been used in RL to consider model uncertainty (Faußer & Schwenker, 2015; Osband et al, 2016; Anschel et al, 2017; Agarwal et al, 2020; Lee et al, 2021; Lan et al, 2020; Chen et al, 2021b). Ensemble transition models: Ensembles of transition (and reward) models have been introduced to model-based RL, e.g., (Chua et al, 2018; Kurutach et al, 2018; Janner et al, 2019; Lee et al, 2020; Hiraoka et al, 2020; Abraham et al, 2020). The methods proposed in the above studies use a large ensemble of Q-functions or transition models, thus are computationally intensive.…”

Section: Related Work (mentioning)

confidence: 99%

“…Dropout approaches generally do not work as well as ensemble approaches (Ovadia et al, 2019; Lakshminarayanan et al, 2017; Durasov et al, 2021). For this reason, instead of dropout approaches, ensemble approaches have been used in RL with high UTD ratio settings (Chen et al, 2021b; Janner et al, 2019; Hiraoka et al, 2020; Lai et al, 2020). In Section 4, we argue that Dr.Q achieves almost the same or better bias reduction ability and sample/computational efficiency compared with ensemble-based RL methods in high UTD ratio settings.…”

Section: Introduction (mentioning)

confidence: 99%

Dropout Q-Functions for Doubly Efficient Reinforcement Learning

Hiraoka, Imagawa, Hashimoto et al.

2021

Preprint

Self Cite


Randomized ensemble double Q-learning (REDQ) (Chen et al., 2021b) has recently achieved state-of-the-art sample efficiency on continuous-action reinforcement learning benchmarks. This superior sample efficiency is made possible by a large Q-function ensemble. However, REDQ is much less computationally efficient than non-ensemble counterparts such as Soft Actor-Critic (SAC) (Haarnoja et al., 2018a). To make REDQ more computationally efficient, we propose Dr.Q, a variant of REDQ that uses a small ensemble of dropout Q-functions. Our dropout Q-functions are simple Q-functions equipped with dropout connections and layer normalization. Despite its simplicity of implementation, our experimental results indicate that Dr.Q is doubly (sample and computationally) efficient: it achieved sample efficiency comparable to REDQ, much better computational efficiency than REDQ, and computational efficiency comparable to that of SAC.


“…Gaussian processes have been used [33] but only for low dimension environments. Meta-RL has been used with model-free RL [34], model-based RL [12] or a mix of both [35]. For model-based RL, gradient based meta-learning was shown to be more data-efficient, resulting in a better and faster adaptation [12].…”

Section: MPC and Meta-learning (mentioning)

confidence: 99%

Meta-Reinforcement Learning for Adaptive Motor Control in Changing Robot Dynamics and Environments

Anne, Wilkinson, Li

2021

Preprint


This work developed a meta-learning approach that adapts the control policy on the fly to different changing conditions for robust locomotion. The proposed method constantly updates the interaction model, samples feasible sequences of actions for the estimated state-action trajectories, and then applies the optimal actions to maximize the reward. To achieve online model adaptation, our proposed method learns different latent vectors for each training condition, which are selected online given the newly collected data. Our work designs appropriate state space and reward functions, and optimizes feasible actions in an MPC fashion; the actions are sampled directly in the joint space considering constraints, hence requiring no prior design of specific walking gaits. We further demonstrate the robot's capability of detecting unexpected changes during interaction and adapting control policies quickly. The extensive validation on the SpotMicro robot in a physics simulation shows adaptive and robust locomotion skills under varying ground friction, external pushes, and different robot models including hardware faults and changes.
