Robotics: Science and Systems XV 2019
DOI: 10.15607/rss.2019.xv.011
Learning to Walk Via Deep Reinforcement Learning

Abstract: Deep reinforcement learning (deep RL) holds the promise of automating the acquisition of complex controllers that can map sensory inputs directly to low-level actions. In the domain of robotic locomotion, deep RL could enable learning locomotion skills with minimal engineering and without an explicit model of the robot dynamics. Unfortunately, applying deep RL to real-world robotic tasks is exceptionally difficult, primarily due to poor sample complexity and sensitivity to hyperparameters. While hyperparameter…

Cited by 270 publications (238 citation statements)
References 45 publications
“…While SAC also uses a weight on the entropy loss to encourage exploration, the implementation also includes automatic entropy scaling [24]. However, we found that this automatic tuning very quickly set the entropy weight very low, and the agent was not able to improve after that point.…”
Section: Hyper-parameters
confidence: 96%
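The automatic entropy scaling discussed in this statement refers to SAC's temperature adjustment, which tunes the entropy weight α by gradient descent on J(α) = E[−α(log π(a|s) + H̄)], where H̄ is a target entropy. A minimal sketch of one such update step, assuming a plain gradient step on log α (real implementations use an optimizer such as Adam; the function name is illustrative):

```python
import numpy as np

def update_entropy_weight(log_alpha, log_pis, target_entropy, lr=3e-4):
    """One gradient step on the SAC temperature objective
    J(alpha) = E[-alpha * (log_pi + target_entropy)].

    Minimal sketch with a plain gradient step; assumes log_pis are
    log-probabilities of actions sampled from the current policy.
    """
    alpha = np.exp(log_alpha)
    # dJ/d(log_alpha) = -alpha * mean(log_pi + target_entropy)
    grad = -alpha * np.mean(log_pis + target_entropy)
    return log_alpha - lr * grad

# Example: a common heuristic sets target_entropy = -dim(action space).
log_alpha = 0.0
target_entropy = -6.0                       # e.g. 6-dimensional actions
log_pis = np.array([-2.0, -3.0, -2.5])      # sampled action log-probs
new_log_alpha = update_entropy_weight(log_alpha, log_pis, target_entropy)
```

When the policy's entropy already exceeds the target (as in this example), the gradient pushes log α down, shrinking the entropy weight; the citation statement reports that in practice this mechanism drove the weight so low, so quickly, that exploration collapsed.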
“…[9] demonstrated that blind locomotion controllers could be transferred to real systems by incorporating actuator dynamics into offline training in simulation. Moreover, [10] showed that training was also possible directly on hardware. However, the aforementioned systems, crafted and trained end to end, are limited to operating blind and on flat terrain.…”
Section: Related Work
confidence: 99%
“…Despite concerns about safety and sample complexity of DRL methods, there has been success in directly training locomotion controllers on the real robot [32], [33]. In Ha et al [33], a policy was directly trained on a multi-legged robot.…”
Section: Related Work
confidence: 99%
“…The training was automated by a novel resetting device which was able to re-initialize the robot during training after each rollout. In Haarnoja et al [32], a policy was trained for a real quadruped robot in under two hours from scratch using the soft actor-critic algorithm [34]. Despite these successes in learning legged locomotion tasks, directly training policies on a biped robot is still challenging due to the frequent manual resetting required during training and the potential safety concerns from the inherent instability.…”
Section: Related Work
confidence: 99%