2020
DOI: 10.48550/arxiv.2010.14834
Preprint

DeepQ Stepper: A framework for reactive dynamic walking on uneven terrain

Abstract: Reactive stepping and push recovery for biped robots is often restricted to flat terrains because of the difficulty in computing capture regions for nonlinear dynamic models. In this paper, we address this limitation by using reinforcement learning to approximately learn the 3D capture region for such systems. We propose a novel 3D reactive stepper, the DeepQ stepper, that computes optimal step locations for walking at different velocities using the 3D capture regions approximated by the action-value function.…
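For intuition, here is a minimal sketch of the step-selection idea the abstract describes: candidate 3D footstep locations are scored by a learned action-value function and the highest-scoring one is taken. The q_value stub, the state encoding, and the candidate grid below are hypothetical stand-ins for illustration, not the authors' trained network or implementation.

```python
import numpy as np

# Hypothetical stand-in for a trained action-value network Q(state, step):
# it scores a candidate 3D step location given the current robot state.
# Here it simply prefers steps near a nominal offset, purely for illustration.
def q_value(state, step):
    nominal = np.array([0.2, 0.1, 0.0])  # assumed nominal step offset (m)
    return -np.linalg.norm(step - nominal)

def select_step(state, candidate_steps):
    """Return the candidate step with the highest Q-value, i.e. the step the
    learned value function judges best for keeping the robot capturable."""
    scores = [q_value(state, s) for s in candidate_steps]
    return candidate_steps[int(np.argmax(scores))]

# Usage: score a small grid of candidate footholds around the stance foot.
state = np.zeros(6)  # hypothetical encoding, e.g. CoM position and velocity
candidates = [np.array([dx, dy, 0.0])
              for dx in np.linspace(0.0, 0.4, 5)
              for dy in np.linspace(-0.2, 0.2, 5)]
print("chosen step:", select_step(state, candidates))
```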

Cited by 2 publications (3 citation statements)
References 24 publications
“…This limits our ability to deal with more challenging terrain, e.g., more challenging variations of the stepping stone task. Incorporating systems [36,44] that directly learn foot placement will be crucial for further improving the capability and robustness of our system.…”
Section: Discussion and Future Work (citation type: mentioning, confidence: 99%)
“…Heuristics or dynamics can be learned to enable faster optimization or planning, e.g., [18,33]. Hierarchical control structures have been proposed to allow model-based control and model-free RL policies to operate at different time scales to leverage their respective advantages, e.g., [4,34,35,36].…”
Section: Combination Of Model-based Control and Learning (citation type: mentioning, confidence: 99%)
“…The penalty is then seen as a design choice controlling the risk of optimal controllers [21], or as a parameter that must be optimized over to allow for the least conservative safe behaviour [22], [23]. In practice, the penalty is often treated as a heuristic [24]- [27]. From a theoretical point of view, it is unclear whether optimal controllers of penalized problems enjoy safety guarantees, and whether they are optimal with respect to the original constrained task [2], [28].…”
Section: A Related Work (citation type: mentioning, confidence: 99%)