2020
DOI: 10.48550/arXiv.2003.08536
Preprint

Enhanced POET: Open-Ended Reinforcement Learning through Unbounded Invention of Learning Challenges and their Solutions

Rui Wang,
Joel Lehman,
Aditya Rawal
et al.

Abstract: Creating open-ended algorithms, which generate their own never-ending stream of novel and appropriately challenging learning opportunities, could help to automate and accelerate progress in machine learning. A recent step in this direction is the Paired Open-Ended Trailblazer (POET), an algorithm that generates and solves its own challenges, and allows solutions to goal-switch between challenges to avoid local optima. However, the original POET was unable to demonstrate its full creative potential because of l…

Cited by 6 publications (8 citation statements) · References 43 publications

Citation statements, ordered by relevance:
“…Previous work from Domain Randomization [31] is not applicable to learned simulators because they often do not have easily configurable parameters. Future direction for this work could be modifying the dynamics model parameters in a targeted manner [28,33,34]. This simple approach to generating different versions of a model could also be useful in committee-based methods [25,26].…”
Section: Discussion
confidence: 99%
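The statement above contrasts domain randomization over hand-designed simulator parameters with perturbing a learned dynamics model directly. A minimal sketch of that contrast in Python (the parameter names, ranges, and perturbation scale are illustrative assumptions, not taken from any of the cited papers):

```python
import numpy as np

rng = np.random.default_rng(0)

# Domain randomization over an analytic simulator: each episode samples
# explicit, human-interpretable physics parameters.
def sample_sim_params():
    return {
        "mass": rng.uniform(0.8, 1.2),      # kg
        "friction": rng.uniform(0.5, 1.5),  # coefficient
    }

# A learned simulator exposes only network weights, so the analogous
# operation is a perturbation of those weights instead.
def perturb_weights(weights, scale=0.01):
    return [w + scale * rng.standard_normal(w.shape) for w in weights]

weights = [rng.standard_normal((4, 4)), rng.standard_normal(4)]
ensemble = [perturb_weights(weights) for _ in range(5)]  # committee of 5 variants
print(sample_sim_params(), len(ensemble))
```

Perturbing copies of the weights yields an ensemble of model variants of the kind that committee-based methods aggregate over, which is the connection the quoted passage draws.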
“…To create a curriculum, the threshold is linearly increased over the course of training. POET [45,46] uses a population of minimax (rather than minimax regret) adversaries to generate the terrain for a 2D walker. However, POET [45] requires generating many new environments, testing all agents within each one, and discarding environments based on a manually chosen reward threshold, which wastes a significant amount of computation.…”
Section: Related Work
confidence: 99%
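The critique above turns on POET's filtering step: every agent is evaluated in every newly generated environment, and environments are kept or discarded against a hand-chosen reward band. A toy sketch of that loop (the agent skill levels, scoring function, and thresholds are all hypothetical stand-ins, not POET's actual implementation):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-in: an agent's score in an environment is its skill
# minus the environment's difficulty, plus noise.
def evaluate(agent_skill, difficulty):
    return agent_skill - difficulty + 0.1 * rng.standard_normal()

# Minimal-criterion filter: keep a new environment only if some agent
# scores inside a manually chosen band (neither trivial nor impossible).
def passes_minimal_criterion(scores, low=0.1, high=0.9):
    return any(low < s < high for s in scores)

agents = [0.3, 0.6, 0.9]                     # population of agent skill levels
candidates = rng.uniform(0.0, 1.5, size=10)  # newly mutated environment difficulties
kept = [d for d in candidates
        if passes_minimal_criterion([evaluate(a, d) for a in agents])]
print(f"kept {len(kept)} of {len(candidates)} candidate environments")
```

The inner comprehension evaluates the whole population in every candidate environment before most candidates are discarded, which is the wasted computation the quoted passage objects to.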
“…Prior work [45,46] focused on demonstrating emergent complexity as the primary goal, arguing that automatically learning complex behaviors is key to improving the sophistication of AI agents. Here, we track the complexity of the generated environments and learned behaviors throughout training.…”
Section: Emergent Complexity
confidence: 99%
“…In part because of their natural ability to deal with those issues, Quality-Diversity [1] (QD) methods have garnered considerable interest in the past few years, and have been applied to many problems, ranging from robotics [2], [3], [4], [5], [6], [7] to video games [8], [9], [10] and open-ended learning [11], [12]. The aim of these methods is to obtain a diverse set of well-performing solutions to a given problem.…”
Section: Introduction
confidence: 99%
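The quoted passage states the aim of Quality-Diversity methods: a diverse set of well-performing solutions rather than a single optimum. A minimal MAP-Elites-style sketch of that idea on a toy 2-D problem (the objective, descriptor binning, and mutation scale are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy objective to maximize, peaked at (0.5, 0.5).
def fitness(x):
    return -float(np.sum((x - 0.5) ** 2))

# Behavior descriptor: the solution itself, binned into a 5x5 grid.
def descriptor(x, bins=5):
    return tuple(int(v) for v in np.clip((x * bins).astype(int), 0, bins - 1))

archive = {}  # behavior cell -> (fitness, solution)
for _ in range(2000):
    if archive and rng.random() < 0.9:
        # Mutate a random elite from the archive.
        _, parent = archive[list(archive)[rng.integers(len(archive))]]
        x = np.clip(parent + 0.1 * rng.standard_normal(2), 0.0, 1.0)
    else:
        x = rng.random(2)  # random bootstrap sample
    cell, f = descriptor(x), fitness(x)
    if cell not in archive or f > archive[cell][0]:
        archive[cell] = (f, x)  # keep the best solution seen per cell

best = max(v[0] for v in archive.values())
print(f"{len(archive)} cells filled; best fitness {best:.3f}")
```

The archive keeps the best solution found in each behavior cell, so the output is a grid of diverse elites rather than one global winner, matching the stated aim of QD methods.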