2022
DOI: 10.1609/aaai.v36i7.20681

Introducing Symmetries to Black Box Meta Reinforcement Learning

Abstract: Meta reinforcement learning (RL) attempts to discover new RL algorithms automatically from environment interaction. In so-called black-box approaches, the policy and the learning algorithm are jointly represented by a single neural network. These methods are very flexible, but they tend to underperform compared to human-engineered RL algorithms in terms of generalisation to new, unseen environments. In this paper, we explore the role of symmetries in meta-generalisation. We show that a recent successful meta R…

Cited by 9 publications (8 citation statements)
References 14 publications (26 reference statements)
“…Linear transformers are a type of FWP where information is stored through outer products of keys and values (Schlag et al., 2021; Schmidhuber, 1992). FWPs are used in the context of memory-based meta-learning (Schmidhuber, 1993; Miconi et al., 2018; Gregor, 2020; Kirsch and Schmidhuber, 2021; Irie et al., 2021; Kirsch et al., 2022), predicting parameters for varying architectures (Knyazev et al., 2021), and reinforcement learning (Gomez and Schmidhuber, 2005; Najarro and Risi, 2020; Kirsch et al., 2022). In contrast to all of these approaches, ours uses FWPs to generate policies conditioned on a command (target return).…”
Section: Related Work
confidence: 99%
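
To make the fast-weight-programmer (FWP) mechanism mentioned in this excerpt concrete, here is a minimal NumPy sketch of the outer-product write and key-based read that linear transformers share; the function name, toy dimensions, and random projections are illustrative assumptions, not taken from the cited papers.

import numpy as np

def fwp_step(W_fast, k, v, q):
    # Fast weight programmer step (illustrative): information is written into
    # the fast weight matrix as an outer product of a value and a key, the
    # mechanism shared by linear transformers (Schlag et al., 2021; Schmidhuber, 1992).
    W_fast = W_fast + np.outer(v, k)   # write: store value v under key k
    y = W_fast @ q                     # read: retrieve by querying with q
    return W_fast, y

# Toy usage with random projections standing in for learned key/value maps.
d_key, d_val = 8, 8
rng = np.random.default_rng(0)
W_fast = np.zeros((d_val, d_key))
for x in rng.normal(size=(5, d_key)):
    k = x / (np.linalg.norm(x) + 1e-8)    # simplistic key normalisation
    v = rng.normal(size=d_val)            # stand-in for a learned value projection
    W_fast, y = fwp_step(W_fast, k, v, q=k)
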
“…The reward function of an MDP defines the task we want the agent to solve. However, the task-defining rewards may be challenging to learn from because maximizing …
[Table 5: Many-shot meta-RL methods categorized by the task distribution considered and meta-parametrization; entries include Xu et al. [254] (black-box) and Kirsch et al. [108].]…”
Section: Learning Intrinsic Rewards
confidence: 99%
“…Black-box meta-learning. In few-shot meta-RL, black-box methods that use RNNs or other neural networks instead of stochastic gradient descent (SGD) tend to learn faster than the SGD-based alternatives. Kirsch et al. [108] argue that many black-box meta-RL approaches, e.g., [46, 239], cannot generalize well to unseen environments because they can easily overfit to the training environments. To combat overfitting, they introduce a specialized RNN architecture which reuses the same RNN cell multiple times, making the RNN weights agnostic to the input and output dimensions and permutations.…”
Section: Auxiliary Tasks
confidence: 99%
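
A minimal sketch of the weight-sharing idea described in this excerpt, assuming (hypothetically) a mean over hidden states as the permutation-invariant message between cells and a simple tanh cell; the cited architecture's actual details differ, and all names and sizes here are illustrative.

import numpy as np

def shared_cell(h, x_scalar, msg, W):
    # The same parameters W are applied to every observation dimension, so the
    # learned weights are indifferent to how many dimensions exist and to their order.
    inp = np.concatenate([h, [x_scalar], msg])
    return np.tanh(W @ inp)

hidden, obs_dim = 4, 6                       # hypothetical sizes
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(hidden, hidden + 1 + hidden))
H = np.zeros((obs_dim, hidden))              # one hidden state per input dimension

obs = rng.normal(size=obs_dim)               # one environment observation
msg = H.mean(axis=0)                         # permutation-invariant message between cells
H = np.stack([shared_cell(H[i], obs[i], msg, W) for i in range(obs_dim)])
action_score = H @ rng.normal(size=hidden)   # e.g. one scalar read-out per unit
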
“…There have been a number of recent improvements on this topic of meta-optimization in multi-task RL (Kim, Yoon, Dia, Kim, Bengio, and Ahn, 2018; Rothfuss, Lee, Clavera, Asfour, and Abbeel, 2018; Flennerhag, Moreno, Lawrence, and Damianou, 2018; Nagabandi, Finn, and Levine, 2018; Mendonca, Gupta, Kralev, Abbeel, Levine, and Finn, 2019; Finn, Rajeswaran, Kakade, and Levine, 2019; Lin, Thomas, Yang, and Ma, 2020; Berseth, Zhang, Zhang, Finn, and Levine, 2021; Co-Reyes, Miao, Peng, Real, Levine, Le, Lee, and Faust, 2021b; Kirsch, Flennerhag, van Hasselt, Friesen, Oh, and Chen, 2022; Wan, Peng, and Gangwani, 2022; Melo, 2022; Nam, Sun, Pertsch, Hwang, and Lim, 2022) and multi-agent RL (Foerster et al., 2018a,1; Kim et al., 2021a; Al-Shedivat, Bansal, Burda, Sutskever, Mordatch, and Abbeel, 2017). Moreover, another group of recent approaches focuses on learning a meta-critic, which explicitly guides updates to the agent's policy rather than simply guiding its actions (Harb et al., 2020; Sung, Zhang, Xiang, Hospedales, and Yang, 2017; Xu, Cao, and Chen, 2019).…”
Section: Learning To Adapt
confidence: 99%
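
A rough, hypothetical sketch of the meta-critic distinction drawn in the last sentence of this excerpt: the policy is updated by descending a learned loss produced by a meta-critic, rather than a hand-designed RL objective. The feature choice, function names, and finite-difference update below are illustrative assumptions, not the cited methods.

import numpy as np

def learned_loss(meta_w, action, reward):
    # Hypothetical meta-critic: maps the policy's behaviour and the observed
    # reward to a scalar training loss; meta_w would itself be meta-learned.
    feats = np.array([action, action ** 2, reward])
    return float(meta_w @ feats)

def policy_step(theta, obs, reward, meta_w, lr=1e-2, eps=1e-4):
    # Update a linear policy by descending the *learned* loss rather than a
    # hand-designed objective (finite differences keep the sketch dependency-free).
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        bump = np.zeros_like(theta)
        bump[i] = eps
        l_hi = learned_loss(meta_w, (theta + bump) @ obs, reward)
        l_lo = learned_loss(meta_w, (theta - bump) @ obs, reward)
        grad[i] = (l_hi - l_lo) / (2 * eps)
    return theta - lr * grad

theta = policy_step(np.zeros(3), obs=np.array([0.5, -1.0, 2.0]),
                    reward=1.0, meta_w=np.array([0.1, 0.5, -1.0]))
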