2021
DOI: 10.48550/arxiv.2110.15191
Preprint

URLB: Unsupervised Reinforcement Learning Benchmark

Abstract: Deep Reinforcement Learning (RL) has emerged as a powerful paradigm to solve a range of complex yet specific control tasks. Yet training generalist agents that can quickly adapt to new tasks remains an outstanding challenge. Recent advances in unsupervised RL have shown that pre-training RL agents with self-supervised intrinsic rewards can result in efficient adaptation. However, these algorithms have been hard to compare and develop due to the lack of a unified benchmark. To this end, we introduce the Unsuper…
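The central claim, that pretraining with a self-supervised intrinsic reward speeds up later adaptation to a downstream task, can be made concrete with a small sketch of the pretrain-then-finetune protocol. The toy chain environment, the count-based novelty bonus, and the tabular Q-learning loop below are illustrative assumptions, not URLB's actual API or algorithms.

```python
# Minimal sketch (not URLB's API): pretrain on an intrinsic reward,
# then fine-tune on the task reward with a smaller budget.
import random
from collections import defaultdict

class ToyGridEnv:
    """1-D chain; the extrinsic reward is only given at the right end."""
    def __init__(self, n=10):
        self.n, self.pos = n, 0
    def reset(self):
        self.pos = 0
        return self.pos
    def step(self, action):                 # action in {-1, +1}
        self.pos = max(0, min(self.n - 1, self.pos + action))
        extrinsic = 1.0 if self.pos == self.n - 1 else 0.0
        return self.pos, extrinsic

def rollout(env, q, reward_fn, episodes=200, eps=0.2, alpha=0.5):
    """Tabular Q-learning against whichever reward signal is supplied."""
    for _ in range(episodes):
        s = env.reset()
        for _ in range(30):
            a = random.choice((-1, 1)) if random.random() < eps \
                else max((-1, 1), key=lambda x: q[(s, x)])
            s2, ext = env.step(a)
            r = reward_fn(s2, ext)
            q[(s, a)] += alpha * (r + 0.9 * max(q[(s2, -1)], q[(s2, 1)]) - q[(s, a)])
            s = s2
    return q

env = ToyGridEnv()
visits = defaultdict(int)

def intrinsic(s, _ext):                     # count-based novelty bonus
    visits[s] += 1
    return 1.0 / visits[s] ** 0.5

q = rollout(env, defaultdict(float), intrinsic)        # unsupervised pretraining
q = rollout(env, q, lambda s, ext: ext, episodes=50)   # supervised fine-tuning
print(max(q.values()))
```

The point of the sketch is the protocol, not the particular intrinsic reward: any self-supervised signal (novelty, curiosity, skill discovery) can drive the pretraining phase before the task reward is introduced.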

Cited by 6 publications (7 citation statements)
References 20 publications (49 reference statements)
“…However, a reward binds the agent to a certain task for which the reward represents success. Aligned with the recent surge of interest in unsupervised methods in reinforcement learning (Baranes and Oudeyer, 2013; Bellemare et al., 2016; Gregor et al., 2016; Houthooft et al., 2016; Gupta et al., 2018; Hausman et al., 2018; Pong et al., 2019; Laskin et al., 2020, 2021; He et al., 2021) and previously proposed ideas (Schmidhuber, 1991a, 2010), we argue that there exist properties of a dynamical system which are not tied to any particular task yet are highly useful, and that leveraging them can help solve other tasks more efficiently. This work focuses on the sensitivity of the system's produced trajectories with respect to the policy, the so-called Physical Derivatives.…”
Section: Introduction (supporting)
Confidence: 55%
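The "sensitivity of produced trajectories with respect to the policy" referenced in this statement can be illustrated with a finite-difference probe. The linear dynamics and the linear state-feedback policy below are illustrative assumptions for the sketch, not the cited paper's actual method.

```python
# Sketch: estimate d(trajectory)/d(policy parameters) with central
# finite differences on a toy linear system under a linear policy.
import numpy as np

def simulate(theta, x0=np.array([1.0, 0.0]), steps=50, dt=0.05):
    """Roll out x_{t+1} = x_t + dt * (A x_t + B * pi_theta(x_t))."""
    A = np.array([[0.0, 1.0], [-1.0, -0.1]])
    B = np.array([0.0, 1.0])
    x, traj = x0.copy(), [x0.copy()]
    for _ in range(steps):
        u = float(theta @ x)                 # linear state-feedback policy
        x = x + dt * (A @ x + B * u)
        traj.append(x.copy())
    return np.stack(traj)

def trajectory_sensitivity(theta, eps=1e-4):
    """Sensitivity of the whole trajectory to each policy parameter."""
    grads = []
    for i in range(len(theta)):
        e = np.zeros_like(theta); e[i] = eps
        grads.append((simulate(theta + e) - simulate(theta - e)) / (2 * eps))
    return np.stack(grads)                   # shape: (n_params, steps+1, state_dim)

theta = np.array([-0.5, -0.3])
print(trajectory_sensitivity(theta).shape)   # (2, 51, 2)
```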
“…Competence-based models like DIAYN (Eysenbach et al., 2019), SMM (Lee et al., 2019), and APS (Liu and Abbeel, 2021a) encourage agents to learn diverse skills by leveraging prior information. However, all of these methods were originally designed for online pretraining and fine-tuning (Laskin et al., 2021) and are not tailored for data collection. In contrast, CUDC is a novel method that gradually expands the feature space by exploiting reachability into more distant future states, rather than using a fixed temporal distance.…”
Section: Related Work (mentioning)
Confidence: 99%
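For readers unfamiliar with the competence-based objective this statement refers to, the sketch below computes a DIAYN-style intrinsic reward, log q(z|s) - log p(z), from a skill discriminator. The tiny softmax discriminator and the synthetic data are illustrative assumptions, not the original implementation.

```python
# Sketch: a discriminator q(z|s) learns to guess which skill z produced
# state s; the agent is rewarded with log q(z|s) - log p(z).
import numpy as np

rng = np.random.default_rng(0)
n_skills, state_dim = 4, 2

# Synthetic "states" clustered by the skill that generated them.
skill_means = rng.normal(size=(n_skills, state_dim)) * 3.0
z = rng.integers(n_skills, size=512)
states = skill_means[z] + rng.normal(size=(512, state_dim))

# Softmax-regression discriminator trained by gradient ascent on log q(z|s).
W = np.zeros((state_dim, n_skills))
for _ in range(300):
    logits = states @ W
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    onehot = np.eye(n_skills)[z]
    W += 0.1 * states.T @ (onehot - p) / len(z)

def intrinsic_reward(state, skill):
    logits = state @ W
    logq = logits[skill] - np.log(np.exp(logits - logits.max()).sum()) - logits.max()
    return logq - np.log(1.0 / n_skills)     # log q(z|s) - log p(z), with p(z) uniform

print(intrinsic_reward(states[0], z[0]))      # high when the skill is identifiable
```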
“…In particular, Yarats et al. [7] create ExoRL, a dataset of pre-collected trajectories on the DeepMind Control Suite [29] generated without any hand-crafted rewards. Similar to URLB [30], ExoRL benchmarks a number of exploration algorithms [3, 6, 31, 5] and evaluates the performance of a policy trained on the corresponding offline datasets relabeled with task-specific rewards.…”
Section: Related Work (mentioning)
Confidence: 99%
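The relabeling step this statement describes, stamping reward-free exploration data with task-specific rewards before offline training, can be sketched as follows. The transition layout and the goal-reaching reward are illustrative assumptions, not ExoRL's actual storage format.

```python
# Sketch: relabel reward-free exploration transitions with a task reward,
# then hand the result to any standard offline RL algorithm.
import numpy as np

rng = np.random.default_rng(1)

# Reward-free exploration data: (state, action, next_state) only.
dataset = {
    "state":      rng.normal(size=(1000, 3)),
    "action":     rng.normal(size=(1000, 1)),
    "next_state": rng.normal(size=(1000, 3)),
}

def task_reward(next_state, goal=np.array([1.0, 0.0, 0.0])):
    """Hypothetical downstream task: dense reward for approaching a goal state."""
    return -np.linalg.norm(next_state - goal, axis=-1)

# Relabel once per downstream task; training then proceeds on full
# (state, action, reward, next_state) tuples as usual.
dataset["reward"] = task_reward(dataset["next_state"])
print(dataset["reward"].shape)   # (1000,)
```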