Gotta Learn Fast: A New Benchmark for Generalization in RL

Nichol, Alex; Pfau, Vicki; Hesse, Christopher; Klimov, O. V.; Schulman, John

doi:10.48550/arxiv.1804.03720

Cited by 54 publications

(64 citation statements)

References 16 publications

“…Exploration for Procedurally-Generated Environments. Several recent studies have discussed the generalization of reinforcement learning (Rajeswaran et al, 2017; Zhang et al, 2018a;b; Choi et al, 2018) and designed procedurally-generated environments to test the generalization of reinforcement learning (Beattie et al, 2016; Nichol et al, 2018). More recent papers show that traditional exploration methods fall short in procedurally-generated environments and address this issue with new exploration methods (Campero et al, 2020).…”

Section: Related Work (mentioning)

confidence: 99%

Rank the Episodes: A Simple Approach for Exploration in Procedurally-Generated Environments

Zha, Yuan, et al. 2021

Preprint

Exploration under sparse reward is a long-standing challenge of model-free reinforcement learning. The state-of-the-art methods address this challenge by introducing intrinsic rewards to encourage exploration in novel states or uncertain environment dynamics. Unfortunately, methods based on intrinsic rewards often fall short in procedurally-generated environments, where a different environment is generated in each episode so that the agent is not likely to visit the same state more than once. Motivated by how humans distinguish good exploration behaviors by looking into the entire episode, we introduce RAPID, a simple yet effective episode-level exploration method for procedurally-generated environments. RAPID regards each episode as a whole and gives an episodic exploration score from both per-episode and long-term views. Those highly scored episodes are treated as good exploration behaviors and are stored in a small ranking buffer. The agent then imitates the episodes in the buffer to reproduce the past good exploration behaviors. We demonstrate our method on several procedurally-generated MiniGrid environments, a first-person-view 3D Maze navigation task from MiniWorld, and several sparse MuJoCo tasks. The results show that RAPID significantly outperforms the state-of-the-art intrinsic reward strategies in terms of sample efficiency and final performance. The code is available at https://github.com/daochenzha/rapid.
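The ranking buffer described in this abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the class name `RankingBuffer`, the function `distinct_state_ratio`, and the scoring rule itself are hypothetical and are not taken from the RAPID code. The abstract says only that the score combines per-episode and long-term views; the placeholder below uses a single per-episode quantity, the fraction of distinct states visited.

```python
import heapq
from typing import Any, Hashable, List, Sequence, Tuple


def distinct_state_ratio(states: Sequence[Hashable]) -> float:
    """Hypothetical placeholder score: fraction of distinct states in one episode.

    This is NOT the score from the paper, only a stand-in for illustration.
    """
    if not states:
        return 0.0
    return len(set(states)) / len(states)


class RankingBuffer:
    """Keeps only the `capacity` highest-scoring episodes seen so far."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        # Min-heap of (score, insertion_id, episode); the root is the worst kept episode.
        self._heap: List[Tuple[float, int, Any]] = []
        self._next_id = 0  # unique tie-breaker so episodes are never compared directly

    def add(self, score: float, episode: Any) -> None:
        entry = (score, self._next_id, episode)
        self._next_id += 1
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, entry)
        elif score > self._heap[0][0]:
            # The new episode beats the worst stored one, so it takes its place.
            heapq.heapreplace(self._heap, entry)

    def episodes(self) -> List[Any]:
        """Stored episodes, best score first."""
        ranked = sorted(self._heap, key=lambda e: (e[0], e[1]), reverse=True)
        return [episode for _, _, episode in ranked]
```

In a full agent, the episodes returned by `episodes()` would serve as demonstrations for an imitation step; that step is omitted here because the abstract does not specify it.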

Section: Related Work (mentioning)

confidence: 99%

“…are designed to test the generalization of RL, such as (Beattie et al, 2016; Nichol et al, 2018; Côté et al, 2018; Cobbe et al, 2019), in which the agent aims to solve the same task, but a different environment is generated in each episode.…”

Section: Introduction (mentioning)

confidence: 99%

Rank the Episodes: A Simple Approach for Exploration in Procedurally-Generated Environments

Zha, Yuan, et al. 2021

Preprint

“…Related to our approach is a strand of literature that assumes there exists a distribution of Markov-decision-problems of the scenario of interest, and then trains algorithms on a finite set of samples from this distribution before testing the behavior on the entire distribution (e.g. Zhang et al, 2018a; Nichol et al, 2018; Justesen et al, 2018).…”

Section: Introduction (mentioning)

confidence: 99%

Robust Algorithmic Collusion

Eschenbaum, Mellgren, Zahn 2022

Preprint

This paper develops a formal framework to assess policies of learning algorithms in economic games. We investigate whether reinforcement learning agents with collusive pricing policies can successfully extrapolate collusive behavior from training to the market. We find that in testing environments collusion consistently breaks down. Instead, we observe static Nash play. We then show that restricting algorithms' strategy space can make algorithmic collusion robust, because it limits overfitting to rival strategies. Our findings suggest that policy-makers should focus on firm behavior aimed at coordinating algorithm design in order to make collusive policies robust.

“…Model-free RL, like model-based RL, has also suffered from both the "train=test" paradigm and a lack of standardization around how to measure generalization. In response, recent papers have discussed what generalization in RL means and how to measure it [7,8,36,49,71], and others have proposed new environments such as Procgen [9] and Meta-World [74] as benchmarks focusing on measuring generalization. While popular in the model-free community [e.g.…”

Section: Introduction (mentioning)

confidence: 99%

Procedural Generalization by Planning with Self-Supervised World Models

Anand, Walker, Li, et al. 2021

Preprint

One of the key promises of model-based reinforcement learning is the ability to generalize using an internal model of the world to make predictions in novel environments and tasks. However, the generalization ability of model-based agents is not well understood because existing work has focused on model-free agents when benchmarking generalization. Here, we explicitly measure the generalization ability of model-based agents in comparison to their model-free counterparts. We focus our analysis on MuZero [60], a powerful model-based agent, and evaluate its performance on both procedural and task generalization. We identify three factors of procedural generalization (planning, self-supervised representation learning, and procedural data diversity) and show that by combining these techniques, we achieve state-of-the-art generalization performance and data efficiency on Procgen [9]. However, we find that these factors do not always provide the same benefits for the task generalization benchmarks in Meta-World [74], indicating that transfer remains a challenge and may require different approaches than procedural generalization. Overall, we suggest that building generalizable agents requires moving beyond the single-task, model-free paradigm and towards self-supervised model-based agents that are trained in rich, procedural, multi-task environments.
