Advanced Skills by Learning Locomotion and Local Navigation End-to-End

Rudin, Nikita; Hoeller, David; Bjelonic, Marko; Hutter, Marco

doi:10.1109/iros47612.2022.9981198

Cited by 35 publications

(27 citation statements)

References 25 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…By using a specialized policy, ANYmal crossed a 0.6-m-wide gap within a premapped environment (14). Most notably, our locomotion controller, not being specialized or fine-tuned for this terrain type, crossed a sequence of four gaps with the same width while relying on online generated maps only.…”

Section: Benchmark Against Rl Controlmentioning

confidence: 99%

“…The locomotion policy π(a | o) is a stochastic distribution of actions a ∈  that are conditioned on observations o ∈ , parametrized by an MLP. The action space comprises target joint positions that are tracked using a proportional-derivative (PD) controller, following the approach in (10) and related works (12)(13)(14).…”

Section: Overview Of the Training Environmentmentioning

confidence: 99%

“…Although locomotion across gaps and stepping stones is theoretically possible, good exploration strategies are required to learn from the emerging sparse reward signals. So far, these terrains could only be handled by specialized policies, which intentionally overfit to one particular scenario (14) or a selection of similar terrain types (15)(16)(17)(18). Despite promising results, distilling a unifying locomotion policy may be difficult and has only been shown with limited success (19).…”

Section: Introductionmentioning

confidence: 99%

See 2 more Smart Citations

DTC: Deep Tracking Control

Jenelten,

He,

Farshidian

et al. 2024

Sci. Robot.

Self Cite

View full text Add to dashboard Cite

Legged locomotion is a complex control problem that requires both accuracy and robustness to cope with real-world challenges. Legged systems have traditionally been controlled using trajectory optimization with inverse dynamics. Such hierarchical model-based methods are appealing because of intuitive cost function tuning, accurate planning, generalization, and, most importantly, the insightful understanding gained from more than one decade of extensive research. However, model mismatch and violation of assumptions are common sources of faulty operation. Simulation-based reinforcement learning, on the other hand, results in locomotion policies with unprecedented robustness and recovery skills. Yet, all learning algorithms struggle with sparse rewards emerging from environments where valid footholds are rare, such as gaps or stepping stones. In this work, we propose a hybrid control architecture that combines the advantages of both worlds to simultaneously achieve greater robustness, foot-placement accuracy, and terrain generalization. Our approach uses a model-based planner to roll out a reference motion during training. A deep neural network policy is trained in simulation, aiming to track the optimized footholds. We evaluated the accuracy of our locomotion pipeline on sparse terrains, where pure data-driven methods are prone to fail. Furthermore, we demonstrate superior robustness in the presence of slippery or deformable ground when compared with model-based counterparts. Last, we show that our proposed tracking controller generalizes across different trajectory optimization methods not seen during training. In conclusion, our work unites the predictive capabilities and optimality guarantees of online planning with the inherent robustness attributed to offline learning.

show abstract

Section: Benchmark Against Rl Controlmentioning

confidence: 99%

Section: Overview Of the Training Environmentmentioning

confidence: 99%

Section: Introductionmentioning

confidence: 99%

See 1 more Smart Citation

DTC: Deep Tracking Control

Jenelten,

He,

Farshidian

et al. 2024

Sci. Robot.

Self Cite

View full text Add to dashboard Cite

show abstract

“…In addition, the agent is rewarded at the end of the episode for standing up in a configuration close to ALMA's default stance pose. We define the fall and recovery problem as a finite-horizon MDP with time-based rewards similar to Rudin et al [20], where time-variant task rewards are used to train efficient and adaptive locomotion skills on diverse terrains. The rewards that regularize the robot's undesirable behaviors, such as joint acceleration penalty and high impact, are timeinvariant and active throughout the episode.…”

Section: Reward Functionmentioning

confidence: 99%

Learning Arm-Assisted Fall Damage Reduction and Recovery for Legged Mobile Manipulators

Ma¹,

Farshidian²,

Hutter³

2023

Preprint

View full text Add to dashboard Cite

Adaptive falling and recovery skills greatly extend the applicability of robot deployments. In the case of legged mobile manipulators, the robot arm could adaptively stop the fall and assist the recovery. Prior works on falling and recovery strategies for legged mobile manipulators usually rely on assumptions such as inelastic collisions and falling in defined directions to enable real-time computation. This paper presents a learning-based approach to reducing fall damage and recovery. An asymmetric actor-critic training structure is used to train a time-invariant policy with time-varying reward functions. In simulated experiments, the policy recovers from 98.9% of initial falling configurations. It reduces base contact impulse, peak joint internal forces, and base acceleration during the fall compared to the baseline methods. The trained control policy is deployed and extensively tested on the ALMA robot hardware. A video summarizing the proposed method and the hardware tests is available at https://youtu.be/avwg2HqGi8s.

show abstract

“…However, one disadvantage of model-free RL is that it typically involves an inefficient trial-and-error process, which leads to long training times before attaining satisfactory performance. So rather than relying on real-world experience, simulators are often used to generate realistic training data efficiently (34). When combined with strategies that mitigate the sim-to-real gap (35), this can enable reliable transfer to hardware.…”

Section: Introductionmentioning

confidence: 99%

Versatile multicontact planning and control for legged loco-manipulation

2023

View full text Add to dashboard Cite

Loco-manipulation planning skills are pivotal for expanding the utility of robots in everyday environments. These skills can be assessed on the basis of a system’s ability to coordinate complex holistic movements and multiple contact interactions when solving different tasks. However, existing approaches have been merely able to shape such behaviors with hand-crafted state machines, densely engineered rewards, or prerecorded expert demonstrations. Here, we propose a minimally guided framework that automatically discovers whole-body trajectories jointly with contact schedules for solving general loco-manipulation tasks in premodeled environments. The key insight is that multimodal problems of this nature can be formulated and treated within the context of integrated task and motion planning (TAMP). An effective bilevel search strategy was achieved by incorporating domain-specific rules and adequately combining the strengths of different planning techniques: trajectory optimization and informed graph search coupled with sampling-based planning. We showcase emergent behaviors for a quadrupedal mobile manipulator exploiting both prehensile and nonprehensile interactions to perform real-world tasks such as opening/closing heavy dishwashers and traversing spring-loaded doors. These behaviors were also deployed on the real system using a two-layer whole-body tracking controller.

show abstract

Advanced Skills by Learning Locomotion and Local Navigation End-to-End

Cited by 35 publications

References 25 publications

DTC: Deep Tracking Control

DTC: Deep Tracking Control

Learning Arm-Assisted Fall Damage Reduction and Recovery for Legged Mobile Manipulators

Versatile multicontact planning and control for legged loco-manipulation

Contact Info

Product

Resources

About