2021
DOI: 10.48550/arxiv.2107.12931
Preprint

Autonomous Reinforcement Learning via Subgoal Curricula

Abstract: Reinforcement learning (RL) promises to enable autonomous acquisition of complex behaviors for diverse agents. However, the success of current reinforcement learning algorithms is predicated on an often under-emphasized requirement: each trial needs to start from a fixed initial state distribution. Unfortunately, resetting the environment to its initial state after each trial requires a substantial amount of human supervision and extensive instrumentation of the environment, which defeats the purpose of autonomous…

Cited by 2 publications (8 citation statements)
References 31 publications (59 reference statements)
“…We evaluate forward-backward RL (FBRL) (Han et al., 2015; Eysenbach et al., 2017), a perturbation controller (R3L) (Zhu et al., 2020), value-accelerated persistent RL (VaPRL) (Sharma et al., 2021), a comparison to simply running the base RL algorithm with the biased TD update discussed in Section 6.1 (naïve RL), and finally an oracle (oracle RL) where resets are provided every H_E steps (H_T is typically three orders of magnitude larger than H_E). We benchmark VaPRL only when demonstrations are available, in accordance with the proposed algorithm in Sharma et al. (2021). We average the performance of all algorithms across 5 random seeds.…”
Section: Evaluation: Setup, Metrics, Baselines, and Results
confidence: 99%
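
The reset schedule quoted above can be made concrete with a short sketch. The snippet below is illustrative rather than the cited authors' code: env, agent, and their methods are hypothetical stand-ins for a gym-style environment and an off-policy learner, and H_E / H_T follow the quote's notation (the episodic reset interval versus the much longer non-episodic training horizon).

# Minimal sketch (assumed names, not the paper's code) of the two reset
# schedules being compared: a reset-free trial of length H_T versus an
# oracle that resets the environment every H_E steps.

def train_reset_free(env, agent, H_T):
    """Non-episodic training: one long trial with a single initial reset."""
    obs = env.reset()                        # the only reset in the whole run
    for t in range(H_T):
        action = agent.act(obs)
        next_obs, reward, done, info = env.step(action)
        agent.observe(obs, action, reward, next_obs)
        agent.update()
        obs = next_obs                       # keep going even if `done`

def train_oracle_resets(env, agent, H_T, H_E):
    """Oracle baseline: a reset is provided every H_E steps."""
    obs = env.reset()
    for t in range(H_T):
        action = agent.act(obs)
        next_obs, reward, done, info = env.step(action)
        agent.observe(obs, action, reward, next_obs)
        agent.update()
        obs = env.reset() if (t + 1) % H_E == 0 else next_obs

Under this framing, naïve RL and VaPRL train with the first loop while oracle RL trains with the second; the biased TD update mentioned in the quote concerns how the agent handles the absence of terminal resets.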
“…Reset-free RL has been studied by previous works with a focus on safety (Eysenbach et al., 2017), automated and unattended learning in the real world (Han et al., 2015; Zhu et al., 2020), skill discovery (Lu et al., 2020), and providing a curriculum (Sharma et al., 2021). Strategies to learn reset-free behavior include directly learning a backward reset controller (Eysenbach et al., 2017), learning a set of auxiliary tasks that can serve as an approximate reset (Ha et al., 2020), or using a novelty-seeking reset controller (Zhu et al., 2020).…”
Section: Related Work
confidence: 99%
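
The backward-reset-controller strategy mentioned in the quote can be sketched as an alternating training loop. The code below is a minimal illustration under assumed interfaces (forward_agent, backward_agent, a NumPy-array state), not the implementation of Han et al. (2015) or Eysenbach et al. (2017).

# Illustrative forward-backward loop in the spirit of Han et al. (2015) and
# Eysenbach et al. (2017); the agent interfaces and the distance-based reset
# reward are assumptions for this sketch.
import numpy as np

def forward_backward_loop(env, forward_agent, backward_agent,
                          initial_state, phase_len, num_phases, tol=0.1):
    obs = env.reset()                        # single manual reset at the start
    for _ in range(num_phases):
        # Forward phase: pursue the task reward.
        for _ in range(phase_len):
            action = forward_agent.act(obs)
            obs, reward, done, info = env.step(action)
            forward_agent.observe(obs, action, reward)
        # Backward phase: the learned "reset" policy is rewarded for
        # driving the state back toward the initial-state region.
        for _ in range(phase_len):
            action = backward_agent.act(obs)
            obs, _, done, info = env.step(action)
            reset_reward = -float(np.linalg.norm(obs - initial_state))
            backward_agent.observe(obs, action, reset_reward)
            if -reset_reward < tol:          # close enough; hand back to the forward phase
                break
    return forward_agent, backward_agent

The auxiliary-task and novelty-seeking variants cited above replace the distance-based backward reward with other learned reset objectives, but follow the same alternation between task-directed and reset-directed behavior.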