Robotics: Science and Systems XV 2019
DOI: 10.15607/rss.2019.xv.011
Learning to Walk Via Deep Reinforcement Learning

Abstract: Deep reinforcement learning (deep RL) holds the promise of automating the acquisition of complex controllers that can map sensory inputs directly to low-level actions. In the domain of robotic locomotion, deep RL could enable learning locomotion skills with minimal engineering and without an explicit model of the robot dynamics. Unfortunately, applying deep RL to real-world robotic tasks is exceptionally difficult, primarily due to poor sample complexity and sensitivity to hyperparameters. While hyperparameter…

Cited by 270 publications (238 citation statements)
References 45 publications
“…While SAC also uses a weight on the entropy loss to encourage exploration, the implementation also includes automatic entropy scaling [24]. However, we found that this automatic tuning very quickly set the entropy weight very low, and the agent was not able to improve after that point.…”
Section: Hyper-parameters
confidence: 96%
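The automatic entropy scaling discussed in this statement refers to SAC's temperature adjustment, which tunes the entropy weight α by gradient descent on J(α) = E[−α(log π(a|s) + H̄)], where H̄ is a target entropy. A minimal sketch of one such update step, assuming a plain gradient step on log α (real implementations use an optimizer such as Adam; the function name is illustrative):

```python
import numpy as np

def update_entropy_weight(log_alpha, log_pis, target_entropy, lr=3e-4):
    """One gradient step on the SAC temperature objective
    J(alpha) = E[-alpha * (log_pi + target_entropy)].

    Minimal sketch with a plain gradient step; assumes log_pis are
    log-probabilities of actions sampled from the current policy.
    """
    alpha = np.exp(log_alpha)
    # dJ/d(log_alpha) = -alpha * mean(log_pi + target_entropy)
    grad = -alpha * np.mean(log_pis + target_entropy)
    return log_alpha - lr * grad

# Example: a common heuristic sets target_entropy = -dim(action space).
log_alpha = 0.0
target_entropy = -6.0                       # e.g. 6-dimensional actions
log_pis = np.array([-2.0, -3.0, -2.5])      # sampled action log-probs
new_log_alpha = update_entropy_weight(log_alpha, log_pis, target_entropy)
```

When the policy's entropy already exceeds the target (as in this example), the gradient pushes log α down, shrinking the entropy weight; the citation statement reports that in practice this mechanism drove the weight so low, so quickly, that exploration collapsed.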
“…[9] demonstrated that blind locomotion controllers could be transferred to real systems by incorporating actuator dynamics into offline training in simulation. Moreover, [10] showed that training was also possible directly on hardware. However, the aforementioned systems, crafted and trained end to end, are limited to operating blind and on flat terrain.…”
Section: Related Work
confidence: 99%
“…Despite concerns about safety and sample complexity of DRL methods, there has been success in directly training locomotion controllers on the real robot [32], [33]. In Ha et al [33], a policy was directly trained on a multi-legged robot.…”
Section: Related Work
confidence: 99%
“…The training was automated by a novel resetting device which was able to re-initialize the robot during training after each rollout. In Haarnoja et al [32], a policy was trained for a real quadruped robot in under two hours from scratch using the soft actor-critic algorithm [34]. Despite these successes in learning legged locomotion tasks, directly training policies on a biped robot is still challenging due to the frequent manual resetting required during training and the potential safety concerns from the inherent instability.…”
Section: Related Work
confidence: 99%