2020
DOI: 10.48550/arxiv.2006.02608
Preprint

Meta-Model-Based Meta-Policy Optimization

Abstract: Model-based reinforcement learning (MBRL) has been applied to meta-learning settings and demonstrated high sample efficiency. However, in previous MBRL for meta-learning, policies are optimized via rollouts that fully rely on a predictive model of the environment, so their performance in the real environment tends to degrade when the predictive model is inaccurate. In this paper, we prove that this performance degradation can be suppressed by using branched meta-rollouts. Based on this theoretical…
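
As a rough illustration of what branched meta-rollouts refer to (short model-based rollouts that branch off states observed in the real environment, in the spirit of MBPO-style branched rollouts), here is a minimal sketch. It is not the paper's implementation; `model_step`, `policy`, and `real_states` are assumed, hypothetical interfaces.

```python
import random

def branched_rollouts(model_step, policy, real_states, horizon=5, num_branches=400):
    """Collect short model-based rollouts branched from real states (sketch).

    model_step(state, action) -> (next_state, reward, done) is a learned
    dynamics/reward model; policy(state) -> action. Branching from states seen
    in the real environment and rolling out only `horizon` steps limits the
    compounding of model error, unlike full-length model rollouts.
    """
    synthetic = []
    for _ in range(num_branches):
        state = random.choice(real_states)  # branch point drawn from real data
        for _ in range(horizon):
            action = policy(state)
            next_state, reward, done = model_step(state, action)
            synthetic.append((state, action, reward, next_state, done))
            if done:
                break
            state = next_state
    return synthetic  # synthetic transitions used to update the (meta-)policy
```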

Cited by 3 publications (4 citation statements). References 16 publications.

“…GSSM still leads in policy performance, with an advantage of more than 300 in average reward over MLSM-v1. We also notice that L2A does not work well in our environments even after trying several hyper-parameters, similar to observations in prior work (Hiraoka et al., 2020; Lee et al., 2020); the lower rewards could be due to unstable adaptations in the dynamics models.…”
Section: Robotic Simulation Systems (supporting)
confidence: 83%
“…Ensemble Q-functions: Ensembles of Q-functions have been used in RL to account for model uncertainty (Faußer & Schwenker, 2015; Osband et al., 2016; Anschel et al., 2017; Agarwal et al., 2020; Lee et al., 2021; Lan et al., 2020; Chen et al., 2021b). Ensemble transition models: Ensembles of transition (and reward) models have been introduced in model-based RL, e.g., (Chua et al., 2018; Kurutach et al., 2018; Janner et al., 2019; Lee et al., 2020; Hiraoka et al., 2020; Abraham et al., 2020). The methods proposed in the above studies use a large ensemble of Q-functions or transition models and are thus computationally intensive.…”
Section: Related Work (mentioning)
confidence: 99%
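
For context on the ensembles mentioned above, the sketch below shows a small ensemble of transition models whose disagreement across members serves as an uncertainty signal. It is an illustrative, assumption-based example (class name, layer sizes, and member count are made up), not the implementation of any cited work.

```python
import torch
import torch.nn as nn

class EnsembleDynamics(nn.Module):
    """Small ensemble of MLP transition models (illustrative sketch).

    Each member predicts the next state from (state, action); the standard
    deviation across members is a cheap epistemic-uncertainty estimate,
    in the spirit of ensemble-based model-based RL.
    """
    def __init__(self, state_dim, action_dim, hidden=64, num_members=5):
        super().__init__()
        self.members = nn.ModuleList([
            nn.Sequential(
                nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, state_dim),
            )
            for _ in range(num_members)
        ])

    def forward(self, state, action):
        x = torch.cat([state, action], dim=-1)
        preds = torch.stack([m(x) for m in self.members])  # (members, batch, state_dim)
        return preds.mean(dim=0), preds.std(dim=0)          # prediction, disagreement

# Usage sketch: 5 members, 3-dim state, 1-dim action
model = EnsembleDynamics(state_dim=3, action_dim=1)
next_state_mean, disagreement = model(torch.randn(8, 3), torch.randn(8, 1))
```
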
“…Dropout approaches generally do not work as well as ensemble approaches (Ovadia et al., 2019; Lakshminarayanan et al., 2017; Durasov et al., 2021). For this reason, ensemble approaches, rather than dropout approaches, have been used in RL under high update-to-data (UTD) ratio settings (Chen et al., 2021b; Janner et al., 2019; Hiraoka et al., 2020; Lai et al., 2020). In Section 4, we argue that Dr.Q achieves almost the same or better bias-reduction ability and sample/computational efficiency compared with ensemble-based RL methods in high UTD ratio settings.…”
Section: Introduction (mentioning)
confidence: 99%
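
The statement above contrasts dropout with ensembles under high update-to-data (UTD) ratios. Below is a minimal sketch of a dropout-regularized Q-network in that spirit; the layer sizes, dropout rate, and use of layer normalization are assumptions for illustration, not the cited Dr.Q configuration.

```python
import torch
import torch.nn as nn

class DropoutQNetwork(nn.Module):
    """Q-network regularized with dropout and layer normalization (sketch).

    Keeping dropout active gives cheap stochastic value estimates, a
    lightweight alternative to a large ensemble of Q-functions when many
    gradient updates are performed per environment step (high UTD ratio).
    """
    def __init__(self, state_dim, action_dim, hidden=256, p_drop=0.01):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden),
            nn.Dropout(p_drop), nn.LayerNorm(hidden), nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.Dropout(p_drop), nn.LayerNorm(hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

# Usage sketch: Q-values for a batch of 3-dim states and 1-dim actions
q = DropoutQNetwork(state_dim=3, action_dim=1)
q_values = q(torch.randn(8, 3), torch.randn(8, 1))
```
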
“…Gaussian processes have been used [33], but only for low-dimensional environments. Meta-RL has been combined with model-free RL [34], model-based RL [12], or a mix of both [35]. For model-based RL, gradient-based meta-learning was shown to be more data-efficient, resulting in better and faster adaptation [12].…”
Section: MPC and Meta-learning (mentioning)
confidence: 99%
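
To illustrate the gradient-based meta-learning mentioned above, the sketch below performs one MAML-style inner adaptation step of a dynamics model on a handful of transitions from a new task; the function name, the single inner step, and the one-step prediction loss are assumptions, not the method of the cited works.

```python
import torch
import torch.nn as nn

def maml_adapt_dynamics(dynamics, support_batch, inner_lr=0.01):
    """One MAML-style inner adaptation step of a dynamics model (sketch).

    support_batch = (states, actions, next_states) holds a few transitions
    from the new task. We take a single gradient step on the one-step
    prediction loss and return the adapted ("fast") parameters, which can be
    applied with a functional forward pass (e.g., torch.func.functional_call)
    for planning/MPC on that task.
    """
    states, actions, next_states = support_batch
    pred = dynamics(torch.cat([states, actions], dim=-1))
    loss = nn.functional.mse_loss(pred, next_states)
    grads = torch.autograd.grad(loss, list(dynamics.parameters()), create_graph=True)
    return [p - inner_lr * g for p, g in zip(dynamics.parameters(), grads)]

# Usage sketch: adapt a small MLP dynamics model to 16 transitions of a new task
dynamics = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 3))
batch = (torch.randn(16, 3), torch.randn(16, 1), torch.randn(16, 3))
fast_weights = maml_adapt_dynamics(dynamics, batch)
```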