2020
DOI: 10.48550/arxiv.2011.02404
Preprint

Dynamics Randomization Revisited: A Case Study for Quadrupedal Locomotion

Cited by 4 publications (6 citation statements). References 21 publications.
“…It is often realized via a comprehensive simulation of the robot, e.g., [9,19,20]. Rewards designed based on reference trajectories [6,7] or carefully tuned reward terms [8,10,21,22] are often necessary to regularize undesirable behaviors as to be feasible for a physical robot. The computation cost to train a policy often requires millions to billions of transition tuples of the full physics simulation.…”
Section: B. Deep Reinforcement Learning for Quadrupedal Robots (mentioning, confidence: 99%)
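
The statement above refers to rewards built from reference trajectories plus hand-tuned penalty terms. As a rough illustration only, not the formulation of any cited paper, here is a minimal sketch of such a tracking reward; every function name and weight below is an assumption:

# Illustrative sketch only: a reference-trajectory tracking reward with a
# hand-tuned torque penalty, the kind of term mix described in the statement.
import numpy as np

def tracking_reward(joint_angles, joint_torques, ref_joint_angles,
                    pose_weight=5.0, torque_weight=0.005):
    # Reward matching the reference pose; exp(...) keeps the term in (0, 1].
    pose_error = np.sum((joint_angles - ref_joint_angles) ** 2)
    pose_term = np.exp(-pose_weight * pose_error)
    # Penalize large joint torques to discourage commands infeasible on hardware.
    torque_term = torque_weight * np.sum(np.square(joint_torques))
    return pose_term - torque_term

In practice each weight of this kind has to be retuned per robot and per gait, which is part of the tuning burden the statement describes.
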
“…The learned policy works directly for the inverted-configuration robot. We also train a trotting policy for the default Laikago, utilizing the full physics simulation and directly generate commands at the joint control level, similar to [6,7]. This end-to-end trained policy works well with the default configuration of Laikago, as expected, but fails to generalize to the inverted configuration due to the learned control policy being very specific to the morphology it was trained on.…”
Section: A. Trotting and Walking on Flat Terrain (mentioning, confidence: 99%)
“…To overcome the sim-to-real gap, previous work also utilizes domain randomization to train a robust policy to adapt to a wide range of dynamic parameter settings [18]. However, recent work argues that it is possible to transfer the simulation controller directly with domain adaptation, by calibrating the dynamic parameters in simulation [28]. In this work, we use domain adaptation to narrow the sim-to-real gap instead.…”
Section: B. Domain Adaptation (mentioning, confidence: 99%)
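
Since the statement above contrasts randomizing dynamics during training with calibrating them afterward, a minimal sketch of the per-episode randomization side may be useful; the parameter names, ranges, and the env.set_dynamics interface are assumptions for illustration, not any simulator's actual API:

# Illustrative sketch only: resample physical parameters at the start of each
# episode so the trained policy must cope with the whole range of dynamics.
import numpy as np

PARAM_RANGES = {
    "base_mass_scale": (0.8, 1.2),       # +/- 20% of the nominal base mass
    "foot_friction": (0.5, 1.25),        # contact friction coefficient
    "motor_strength_scale": (0.8, 1.2),  # scaling applied to commanded torques
    "control_latency_s": (0.0, 0.04),    # actuation delay in seconds
}

def sample_dynamics(rng):
    # One random draw of dynamics parameters for the next episode.
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in PARAM_RANGES.items()}

def run_episode(env, policy, rng):
    env.set_dynamics(sample_dynamics(rng))  # hypothetical setter on the simulator
    obs, done, total_reward = env.reset(), False, 0.0
    while not done:
        obs, reward, done, _ = env.step(policy(obs))
        total_reward += reward
    return total_reward

Domain adaptation as described in [28] would instead fix these parameters to values identified from measurements of the real robot rather than resampling them every episode.
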
“…Legged Locomotion: This has conventionally been accomplished using control theory [2,5,6,22,28,31,33,39,55,63,72,88] over handcrafted dynamics models. Recently, RL has been successfully used to learn such policies in simulation [21,49,56,68] and in the real world with sim2real methods [25,29,59,61,75,77,85]. Alternatively, a policy learnt in simulation can be adapted at test-time to work well in real environments [15,19,45,62,70,71,89,90,91,92,95].…”
Section: Related Work (mentioning, confidence: 99%)