2020
DOI: 10.48550/arxiv.2002.08550
Preprint

Learning to Walk in the Real World with Minimal Human Effort

Sehoon Ha,
Peng Xu,
Zhenyu Tan
et al.

Abstract: Reliable and stable locomotion has been one of the most fundamental challenges for legged robots. Deep reinforcement learning (deep RL) has emerged as a promising method for developing such control policies autonomously. In this paper, we develop a system for learning legged locomotion policies with deep RL in the real world with minimal human effort. The key difficulties for on-robot learning systems are automatic data collection and safety. We overcome these two challenges by developing a multi-task learning …

Citations: cited by 28 publications (39 citation statements)
References: 32 publications (46 reference statements)
“…] is an entropy of the stochastic policy 𝜋 and 𝛼 ≥ 0 is an entropy temperature. Furthermore, we use the Lagrangian relaxation method [22], [23], [24] to solve the 𝜏-CMDP problem.…”
Section: D (mentioning)
confidence: 99%
“…Reset-free RL has been studied by previous works with a focus on safety (Eysenbach et al, 2017), automated and unattended learning in the real world (Han et al, 2015; Zhu et al, 2020;, skill discovery Lu et al, 2020), and providing a curriculum (Sharma et al, 2021). Strategies to learn reset-free behavior include directly learning a backward reset controller (Eysenbach et al, 2017), learning a set of auxiliary tasks that can serve as an approximate reset (Ha et al, 2020;, or using a novelty seeking reset controller (Zhu et al, 2020). Complementary to this literature, we aim to develop a set of benchmarks and a framework that allows for this class of algorithms to be studied in a unified way.…”
Section: Related Work (mentioning)
confidence: 99%
“…The learned policy has been successfully deployed on real robots by using domain randomization (DR) [3], system identification [5], or real-world adaptation [18]. Alternatively, researchers [4], [19] have investigated learning policies directly from real-world experience, which can intrinsically overcome sim-to-real gaps. Sample efficiency is a critical challenge for deep RL approaches, which can be improved by leveraging model-based control strategies [17], [20], [21].…”
Section: Related Work (mentioning)
confidence: 99%