2022
DOI: 10.1609/aaai.v36i7.20681

Introducing Symmetries to Black Box Meta Reinforcement Learning

Abstract: Meta reinforcement learning (RL) attempts to discover new RL algorithms automatically from environment interaction. In so-called black-box approaches, the policy and the learning algorithm are jointly represented by a single neural network. These methods are very flexible, but they tend to underperform compared to human-engineered RL algorithms in terms of generalisation to new, unseen environments. In this paper, we explore the role of symmetries in meta-generalisation. We show that a recent successful meta R…

Cited by 9 publications (8 citation statements)
References 14 publications (26 reference statements)
“…Linear transformers are a type of FWP where information is stored through outer products of keys and values (Schlag et al., 2021; Schmidhuber, 1992). FWPs are used in the context of memory-based meta-learning (Schmidhuber, 1993; Miconi et al., 2018; Gregor, 2020; Kirsch and Schmidhuber, 2021; Irie et al., 2021; Kirsch et al., 2022), predicting parameters for varying architectures (Knyazev et al., 2021), and reinforcement learning (Gomez and Schmidhuber, 2005; Najarro and Risi, 2020; Kirsch et al., 2022). In contrast to all of these approaches, ours uses FWPs to generate policies conditioned on a command (target return).…”
Section: Related Work
confidence: 99%
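
To make the fast-weight-programmer (FWP) mechanism mentioned in this excerpt concrete, here is a minimal NumPy sketch of the outer-product write and key-based read that linear transformers share; the function name, toy dimensions, and random projections are illustrative assumptions, not taken from the cited papers.

import numpy as np

def fwp_step(W_fast, k, v, q):
    # Fast weight programmer step (illustrative): information is written into
    # the fast weight matrix as an outer product of a value and a key, the
    # mechanism shared by linear transformers (Schlag et al., 2021; Schmidhuber, 1992).
    W_fast = W_fast + np.outer(v, k)   # write: store value v under key k
    y = W_fast @ q                     # read: retrieve by querying with q
    return W_fast, y

# Toy usage with random projections standing in for learned key/value maps.
d_key, d_val = 8, 8
rng = np.random.default_rng(0)
W_fast = np.zeros((d_val, d_key))
for x in rng.normal(size=(5, d_key)):
    k = x / (np.linalg.norm(x) + 1e-8)    # simplistic key normalisation
    v = rng.normal(size=d_val)            # stand-in for a learned value projection
    W_fast, y = fwp_step(W_fast, k, v, q=k)
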
“…The reward function of an MDP defines the task we want the agent to solve. However, the task-defining rewards may be challenging to learn from because maximizing …
[Table 5: Many-shot meta-RL methods categorized by the task distribution considered and meta-parametrization; entries include Xu et al. [254] (black-box) and Kirsch et al. [108].]…”
Section: Learning Intrinsic Rewards
confidence: 99%
“…Black-box meta-learning. In few-shot meta-RL, black-box methods that use RNNs or other neural networks instead of stochastic gradient descent (SGD) tend to learn faster than the SGD-based alternatives. Kirsch et al. [108] argue that many black-box meta-RL approaches, e.g., [46, 239], cannot generalize well to unseen environments because they can easily overfit to the training environments. To combat overfitting, they introduce a specialized RNN architecture which reuses the same RNN cell multiple times, making the RNN weights agnostic to the input and output dimensions and permutations.…”
Section: Auxiliary Tasks
confidence: 99%
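
A minimal sketch of the weight-sharing idea described in this excerpt, assuming (hypothetically) a mean over hidden states as the permutation-invariant message between cells and a simple tanh cell; the cited architecture's actual details differ, and all names and sizes here are illustrative.

import numpy as np

def shared_cell(h, x_scalar, msg, W):
    # The same parameters W are applied to every observation dimension, so the
    # learned weights are indifferent to how many dimensions exist and to their order.
    inp = np.concatenate([h, [x_scalar], msg])
    return np.tanh(W @ inp)

hidden, obs_dim = 4, 6                       # hypothetical sizes
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(hidden, hidden + 1 + hidden))
H = np.zeros((obs_dim, hidden))              # one hidden state per input dimension

obs = rng.normal(size=obs_dim)               # one environment observation
msg = H.mean(axis=0)                         # permutation-invariant message between cells
H = np.stack([shared_cell(H[i], obs[i], msg, W) for i in range(obs_dim)])
action_score = H @ rng.normal(size=hidden)   # e.g. one scalar read-out per unit
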
“…There have been a number of recent improvements on this topic of meta-optimization in multi-task RL (Kim, Yoon, Dia, Kim, Bengio, and Ahn, 2018; Rothfuss, Lee, Clavera, Asfour, and Abbeel, 2018; Flennerhag, Moreno, Lawrence, and Damianou, 2018; Nagabandi, Finn, and Levine, 2018; Mendonca, Gupta, Kralev, Abbeel, Levine, and Finn, 2019; Finn, Rajeswaran, Kakade, and Levine, 2019; Lin, Thomas, Yang, and Ma, 2020; Berseth, Zhang, Zhang, Finn, and Levine, 2021; Co-Reyes, Miao, Peng, Real, Levine, Le, Lee, and Faust, 2021b; Kirsch, Flennerhag, van Hasselt, Friesen, Oh, and Chen, 2022; Wan, Peng, and Gangwani, 2022; Melo, 2022; Nam, Sun, Pertsch, Hwang, and Lim, 2022) and multi-agent RL (Foerster et al., 2018a,1; Kim et al., 2021a; Al-Shedivat, Bansal, Burda, Sutskever, Mordatch, and Abbeel, 2017). Moreover, another group of recent approaches focuses on learning a meta-critic, which explicitly guides updates to the agent's policy rather than simply guiding its actions (Harb et al., 2020; Sung, Zhang, Xiang, Hospedales, and Yang, 2017; Xu, Cao, and Chen, 2019).…”
Section: Learning To Adapt
confidence: 99%
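
A rough, hypothetical sketch of the meta-critic distinction drawn in the last sentence of this excerpt: the policy is updated by descending a learned loss produced by a meta-critic, rather than a hand-designed RL objective. The feature choice, function names, and finite-difference update below are illustrative assumptions, not the cited methods.

import numpy as np

def learned_loss(meta_w, action, reward):
    # Hypothetical meta-critic: maps the policy's behaviour and the observed
    # reward to a scalar training loss; meta_w would itself be meta-learned.
    feats = np.array([action, action ** 2, reward])
    return float(meta_w @ feats)

def policy_step(theta, obs, reward, meta_w, lr=1e-2, eps=1e-4):
    # Update a linear policy by descending the *learned* loss rather than a
    # hand-designed objective (finite differences keep the sketch dependency-free).
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        bump = np.zeros_like(theta)
        bump[i] = eps
        l_hi = learned_loss(meta_w, (theta + bump) @ obs, reward)
        l_lo = learned_loss(meta_w, (theta - bump) @ obs, reward)
        grad[i] = (l_hi - l_lo) / (2 * eps)
    return theta - lr * grad

theta = policy_step(np.zeros(3), obs=np.array([0.5, -1.0, 2.0]),
                    reward=1.0, meta_w=np.array([0.1, 0.5, -1.0]))
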