2021
DOI: 10.48550/arxiv.2102.02926
Preprint

Alchemy: A benchmark and analysis toolkit for meta-reinforcement learning agents

Cited by 5 publications (6 citation statements)
References 0 publications
“…For example, Tosch et al [83] present three highly parametrisable versions of Atari games, and use them to perform post hoc analysis of agents trained on a single variant. Some environments are not targeted at zero-shot policy transfer (CausalWorld, RWRL, RLBench, Alchemy, Meta-world [51,81,78,45,68]), but could be adapted to such a scenario with a different evaluation protocol. More generally, all environments provide a context set, and many then propose specific evaluation protocols, but other protocols could be used as long as they were well justified.…”
Section: Discussion
confidence: 99%
“…Protocol C is commonly used among PCG environments that are not explicitly targeted at generalisation (MiniGrid, NLE, MiniHack, Alchemy [70,73,71,45]). The testing context set consists of seeds held out from the training set; otherwise, during training, the full context set is used.…”
Section: Evaluation Protocols For Generalisation
confidence: 99%
“…However, a more robust benchmark could include the aforementioned change points in order to further control the complexity. The CT-graph, Meta-world, and the recently developed Alchemy (Wang et al, 2021) environment are examples of benchmarks with early-stage work in this direction, albeit implicitly. Therefore, the development of a precise measure of task similarity and complexity, as well as robust benchmarks with configurable change points (i.e., reward, state/input, and transition), would be highly beneficial to the meta-RL field.…”
Section: Discussion
confidence: 99%
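A benchmark with a configurable change point, as the statement above calls for, can be illustrated with a minimal toy environment. This is a sketch under stated assumptions, not Alchemy's API: after a configurable number of steps, the reward function switches, so a meta-RL agent must detect the change and adapt.

```python
class ChangePointEnv:
    """Toy environment with a configurable reward change point.

    Before step `change_step`, action +1 is rewarded; afterwards,
    action -1 is. All names here are hypothetical illustrations.
    """

    def __init__(self, change_step=50, episode_len=100):
        self.change_step = change_step
        self.episode_len = episode_len
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0  # dummy observation

    def step(self, action):
        self.t += 1
        # Reward change point: the rewarded action flips sign.
        target = 1 if self.t <= self.change_step else -1
        reward = 1.0 if action == target else 0.0
        done = self.t >= self.episode_len
        return 0.0, reward, done

env = ChangePointEnv(change_step=50)
env.reset()
```

Analogous state/input or transition change points could be added by swapping the observation encoding or the dynamics at `change_step`, giving the configurable complexity control the statement argues for.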
“…Popular crafting games, such as Minecraft and Little Alchemy, have inspired research on autonomous exploration in people (Brändle et al, 2023) and artificial agents (G. Wang et al, 2023). Crafting games are also widely used for designing benchmarks for human-like generalization and reasoning (Hafner, 2022;J. X. Wang et al, 2021).…”
Section: Crafting Games
confidence: 99%