2018
DOI: 10.48550/arxiv.1806.04640
Preprint
Unsupervised Meta-Learning for Reinforcement Learning

Abstract: Meta-learning is a powerful tool that builds on multi-task learning to learn how to quickly adapt a model to new tasks. In the context of reinforcement learning, meta-learning algorithms can acquire reinforcement learning procedures to solve new problems more efficiently by meta-learning prior tasks. The performance of meta-learning algorithms critically depends on the tasks available for meta-training: in the same way that supervised learning algorithms generalize best to test points drawn from the same distr…
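The abstract's core idea — meta-train an initialization so that a single adaptation step on a new task already performs well — can be illustrated with a minimal sketch. This is a toy gradient-based meta-learning loop on scalar fitting tasks, not the paper's unsupervised RL setup; the task distribution, step sizes, and analytic meta-gradient are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy task distribution: each "task" asks us to fit a scalar target t,
# with per-task loss L_t(w) = (w - t)^2. This stands in for an RL task
# distribution purely for illustration.
def sample_tasks(n):
    return rng.uniform(-2.0, 2.0, size=n)

ALPHA = 0.1   # inner-loop (adaptation) step size
BETA = 0.05   # outer-loop (meta-training) step size

def inner_update(w, t):
    # One gradient step of adaptation on task t: w' = w - alpha * dL/dw.
    return w - ALPHA * 2.0 * (w - t)

def meta_train(w, steps=500, tasks_per_batch=8):
    for _ in range(steps):
        tasks = sample_tasks(tasks_per_batch)
        # Meta-gradient of the post-adaptation loss, differentiated
        # through the inner update. Here it is analytic:
        # d/dw (w' - t)^2 = 2 (w' - t) * (1 - 2*alpha).
        grad = 0.0
        for t in tasks:
            w_adapted = inner_update(w, t)
            grad += 2.0 * (w_adapted - t) * (1.0 - 2.0 * ALPHA)
        w -= BETA * grad / tasks_per_batch
    return w

w0 = meta_train(w=5.0)

# On a held-out task, one adaptation step from the meta-trained
# initialization should reduce the loss.
t_new = 1.5
loss_before = (w0 - t_new) ** 2
loss_after = (inner_update(w0, t_new) - t_new) ** 2
```

The meta-objective is the loss *after* adaptation, so meta-training pulls the initialization toward a point from which every task in the distribution is one cheap gradient step away — the dependence on the meta-training task distribution that the abstract highlights.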

Cited by 32 publications (45 citation statements)
References 20 publications (37 reference statements)
“…However, reward binds the agent to a certain task for which the reward represents success. Aligned with the recent surge of interest in unsupervised methods in reinforcement learning (Baranes and Oudeyer, 2013; Bellemare et al., 2016; Gregor et al., 2016; Houthooft et al., 2016; Gupta et al., 2018; Hausman et al., 2018; Pong et al., 2019; Laskin et al., 2020, 2021; He et al., 2021) and previously proposed ideas (Schmidhuber, 1991a, 2010), we argue that there exist properties of a dynamical system that are not tied to any particular task yet are highly useful; leveraging them can help solve other tasks more efficiently. This work focuses on the sensitivity of the system's produced trajectories with respect to the policy, so-called Physical Derivatives.…”
Section: Introduction (supporting)
confidence: 54%
“…In goal-based RL, where future states can inform "optimal" reward parameters with respect to the transitions' actions, hindsight methods were applied successfully to enable effective training of goal-based Q-functions for sparse rewards, derive exact connections between Q-learning and classic model-based RL, and achieve data-efficient off-policy hierarchical RL (Nachum et al., 2018), multi-task RL (Li et al., 2020), offline RL (Chebotar et al., 2021), and more (Choi et al., 2021; Ren et al., 2019; Zhao & Tresp, 2018; Ghosh et al., 2021; Nasiriany et al., 2021). Additionally, Lynch et al. (2019) and Gupta et al. (2018) have shown that BC is often sufficient for learning generalizable parameterized policies, due to rich positive examples from future states; most recently, Chen et al. (2021a) and Janner et al. (2021) showed that, combined with powerful transformer architectures (Vaswani et al., 2017), it produces state-of-the-art offline RL and goal-based RL results. Lastly, while motivated by alternative mathematical principles and not by parameterized objectives, future state information was also explored as a way of reducing variance or improving estimation for generic policy gradient methods (Pinto et al., 2017; Guo et al., 2021; Venuto et al., 2021).…”
Section: Related Work (mentioning)
confidence: 99%
“…Meta-learning applications in NLP have yielded improvements on specific tasks (Gu et al., 2018; Han et al., 2018; Dou et al., 2019). Unsupervised meta-learning has been explored in computer vision (Khodadadeh et al., 2019) and reinforcement learning (Gupta et al., 2018). cluster images using pre-trained embeddings to create tasks.…”
Section: Related Work (mentioning)
confidence: 99%