2020
DOI: 10.48550/arxiv.2010.14834
Preprint

DeepQ Stepper: A framework for reactive dynamic walking on uneven terrain

Abstract: Reactive stepping and push recovery for biped robots is often restricted to flat terrains because of the difficulty in computing capture regions for nonlinear dynamic models. In this paper, we address this limitation by using reinforcement learning to approximately learn the 3D capture region for such systems. We propose a novel 3D reactive stepper, the DeepQ stepper, that computes optimal step locations for walking at different velocities using the 3D capture regions approximated by the action-value function.…
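For intuition, here is a minimal sketch of the step-selection idea the abstract describes: candidate 3D footstep locations are scored by a learned action-value function and the highest-scoring one is taken. The q_value stub, the state encoding, and the candidate grid below are hypothetical stand-ins for illustration, not the authors' trained network or implementation.

```python
import numpy as np

# Hypothetical stand-in for a trained action-value network Q(state, step):
# it scores a candidate 3D step location given the current robot state.
# Here it simply prefers steps near a nominal offset, purely for illustration.
def q_value(state, step):
    nominal = np.array([0.2, 0.1, 0.0])  # assumed nominal step offset (m)
    return -np.linalg.norm(step - nominal)

def select_step(state, candidate_steps):
    """Return the candidate step with the highest Q-value, i.e. the step the
    learned value function judges best for keeping the robot capturable."""
    scores = [q_value(state, s) for s in candidate_steps]
    return candidate_steps[int(np.argmax(scores))]

# Usage: score a small grid of candidate footholds around the stance foot.
state = np.zeros(6)  # hypothetical encoding, e.g. CoM position and velocity
candidates = [np.array([dx, dy, 0.0])
              for dx in np.linspace(0.0, 0.4, 5)
              for dy in np.linspace(-0.2, 0.2, 5)]
print("chosen step:", select_step(state, candidates))
```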

Cited by 2 publications (3 citation statements)
References 24 publications
“…This limits our ability to deal with more challenging terrain, e.g., more challenging variations of the stepping stone task. Incorporating systems [36,44] that directly learn foot placement will be crucial for further improving the capability and robustness of our system.…”
Section: Discussion and Future Work (citation type: mentioning, confidence: 99%)
“…Heuristics or dynamics can be learned to enable faster optimization or planning, e.g., [18,33]. Hierarchical control structures have been proposed to allow model-based control and model-free RL policies to operate at different time scales to leverage their respective advantages, e.g., [4,34,35,36].…”
Section: Combination Of Model-based Control and Learning (citation type: mentioning, confidence: 99%)
“…The penalty is then seen as a design choice controlling the risk of optimal controllers [21], or as a parameter that must be optimized over to allow for the least conservative safe behaviour [22], [23]. In practice, the penalty is often treated as a heuristic [24]- [27]. From a theoretical point of view, it is unclear whether optimal controllers of penalized problems enjoy safety guarantees, and whether they are optimal with respect to the original constrained task [2], [28].…”
Section: A Related Work (citation type: mentioning, confidence: 99%)