Robotics: Science and Systems XIV 2018
DOI: 10.15607/rss.2018.xiv.010

Sim-to-Real: Learning Agile Locomotion For Quadruped Robots

Abstract: Designing agile locomotion for quadruped robots often requires extensive expertise and tedious manual tuning. In this paper, we present a system to automate this process by leveraging deep reinforcement learning techniques. Our system can learn quadruped locomotion from scratch using simple reward signals. In addition, users can provide an open loop reference to guide the learning process when more control over the learned gait is needed. The control policies are learned in a physics simulator and then deployed on real robots. …
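The abstract describes learning control policies in a physics simulator from simple reward signals and then deploying them on hardware. Below is a minimal sketch of that train-in-simulation workflow, assuming PyBullet's MinitaurBulletEnv-v0 and the classic Gym API; the linear policy, the random-search update, and all hyperparameters are illustrative stand-ins for the paper's PPO-based training, not the authors' implementation.

# Minimal sketch: train a linear policy for quadruped locomotion in simulation.
# Assumptions: PyBullet's MinitaurBulletEnv-v0, classic Gym API, and a basic
# random-search update in place of the paper's PPO training.
import gym
import numpy as np
import pybullet_envs  # noqa: F401  (registers the Minitaur environments)


def rollout(env, weights, horizon=1000):
    """Run one episode with a linear policy a = W @ s and return its return."""
    obs = env.reset()
    total = 0.0
    for _ in range(horizon):
        action = np.clip(weights @ obs, env.action_space.low, env.action_space.high)
        obs, reward, done, _ = env.step(action)
        total += reward  # Minitaur reward ~ forward progress minus energy cost
        if done:
            break
    return total


env = gym.make("MinitaurBulletEnv-v0")
W = np.zeros((env.action_space.shape[0], env.observation_space.shape[0]))

# Hill climbing over policy weights; the learned W would then be deployed
# on the real robot (the sim-to-real step).
best = rollout(env, W)
for step in range(200):
    candidate = W + 0.02 * np.random.randn(*W.shape)
    score = rollout(env, candidate)
    if score > best:
        W, best = candidate, score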

Cited by 520 publications (406 citation statements)
References 34 publications

Citation statements:
“…Recent successes in Reinforcement Learning (RL) demonstrate sophisticated walking robot control [1]-[5], yet a large number of policy rollouts need to be collected to reach the required performance level. It is, therefore, common practice to use physics simulators during training and subsequently attempt a sim-to-real transfer [1], [4].…”
Section: Introduction (mentioning)
confidence: 99%
“…We compare MSO to two baselines: domain randomization (DR) [10] and strategy optimization with projected universal policy (SO-PUP) [11]. We run ARS for 1500 iterations for all methods and we use a two-dimensional latent space for MSO and SO-PUP.…”
Section: A. Experiments Setup (mentioning)
confidence: 99%
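The ARS optimizer mentioned here (Augmented Random Search, Mania et al. 2018) updates a linear policy by probing random perturbation directions. A minimal sketch of one ARS update is given below; the step size, noise scale, and number of directions are placeholder values, and rollout(env, W) is assumed to return an episode return as in the earlier sketch.

# Sketch of one Augmented Random Search (ARS) update for a linear policy W.
# Step size, noise scale, and direction count are illustrative placeholders.
import numpy as np


def ars_update(env, W, rollout, step_size=0.02, noise=0.03, num_dirs=8):
    """Move W along the reward-weighted average of random perturbations."""
    deltas = [np.random.randn(*W.shape) for _ in range(num_dirs)]
    r_plus = np.array([rollout(env, W + noise * d) for d in deltas])
    r_minus = np.array([rollout(env, W - noise * d) for d in deltas])
    sigma_r = np.concatenate([r_plus, r_minus]).std() + 1e-8  # reward std
    step = sum((rp - rm) * d for rp, rm, d in zip(r_plus, r_minus, deltas))
    return W + (step_size / (num_dirs * sigma_r)) * step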
“…The first task is to transfer the policy trained in simulation to the real Minitaur robot. Although we use the nonlinear actuator model from Tan et al [10], the reality gap in our case is still large as we use a different version of Minitaur and we do not perform additional system identification.…”
Section: B. Adaptation Tasks (mentioning)
confidence: 99%
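The nonlinear actuator model from Tan et al. referenced here maps desired motor angles to torques so that the simulated motors behave like the real ones. A common first-order approximation is PD position control with torque saturation, sketched below; the gains and torque limit are illustrative assumptions, not the identified parameters from that work.

# Illustrative actuator model: PD position control with torque saturation.
# kp, kd, and the torque limit are placeholders, not identified values.
import numpy as np


def motor_torque(q_des, q, q_dot, kp=1.2, kd=0.02, torque_limit=3.5):
    """Torque from desired angle, measured angle, and measured velocity."""
    return np.clip(kp * (q_des - q) - kd * q_dot, -torque_limit, torque_limit)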
“…They can also significantly reduce the time and cost required for repair and maintenance. These features become especially important when testing learning algorithms directly on real hardware [33], [34] where it is essential to have a safe platform to explore various control patterns.…”
Section: Introduction (mentioning)
confidence: 99%