2022
DOI: 10.48550/arxiv.2203.17138
Preprint

Imitate and Repurpose: Learning Reusable Robot Movement Skills From Human and Animal Behaviors

Abstract: We investigate the use of prior knowledge of human and animal movement to learn reusable locomotion skills for real legged robots. Our approach builds upon previous work on imitating human or dog Motion Capture (MoCap) data to learn a movement skill module. Once learned, this skill module can be reused for complex downstream tasks. Importantly, due to the prior imposed by the MoCap data, our approach does not require extensive reward engineering to produce sensible and natural looking behavior at the time of r…
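A minimal sketch of the two-stage recipe the abstract describes, assuming a hypothetical PyTorch-style setup: a low-level skill module is first trained to imitate retargeted MoCap clips, then frozen and driven by a high-level task policy that emits latent skill commands. Names such as SkillModule, TaskPolicy, and the layer sizes are illustrative assumptions, not the paper's actual architecture.

# Hedged sketch: two-stage skill reuse (module names and sizes are illustrative).
import torch
import torch.nn as nn

class SkillModule(nn.Module):
    # Low-level controller: (proprioception, latent skill command) -> joint targets.
    def __init__(self, obs_dim=48, latent_dim=16, act_dim=12):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim + latent_dim, 256), nn.ELU(),
                                 nn.Linear(256, act_dim))
    def forward(self, proprio, latent):
        return self.net(torch.cat([proprio, latent], dim=-1))

class TaskPolicy(nn.Module):
    # High-level policy: task observation -> latent command for the frozen skill module.
    def __init__(self, task_obs_dim=32, latent_dim=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(task_obs_dim, 256), nn.ELU(),
                                 nn.Linear(256, latent_dim))
    def forward(self, task_obs):
        return self.net(task_obs)

# Stage 1: train SkillModule (with a reference-motion encoder, omitted here) to track MoCap clips.
# Stage 2: freeze SkillModule; train TaskPolicy with task rewards only, since the
# MoCap prior already keeps the resulting motion natural-looking.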

Cited by 8 publications (19 citation statements)
References 50 publications
“…However, without a tedious reward-engineering process dedicated to the task at hand, it generally struggles to discover the right behaviors. This can be addressed by leveraging expert demonstrations: Multimodal trajectories can be encoded into motion priors that are then used to guide the training of a robust RL agent (39,81,82) in complex loco-manipulation settings. However, instead of relying on expert trajectories generated by human examples (83,84), one could potentially use our framework as an automatic provider of physically consistent demonstrations for an RL pipeline.…”
Section: Discussion
confidence: 99%
“…One common way of overcoming this issue has been by bootstrapping RL with a form of imitation learning. For example, training can be guided by recorded human demonstrations (28,29), animal motion capture clips (38,39), or an RL-trained teacher with access to privileged information (27,40,41). Nevertheless, generating expert demonstrations for every newly encountered task is time consuming, and motion retargeting is often challenging, especially when the robot's morphology differs from that of the demonstrator.…”
Section: Introduction
confidence: 99%
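One of the options mentioned in the quote above, an RL-trained teacher with access to privileged information, is often distilled into a deployable student policy by supervised imitation. A minimal sketch under that assumption follows; the dimensions and the random placeholder observations are purely illustrative.

# Hedged sketch of privileged-teacher distillation (DAgger-style supervision);
# the teacher stands in for an RL-trained policy on privileged simulator state.
import torch
import torch.nn as nn

teacher = nn.Linear(64, 12)   # placeholder for the privileged-state teacher policy
student = nn.Linear(48, 12)   # sees only onboard (deployable) observations
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

for step in range(1000):
    privileged_obs = torch.randn(256, 64)  # placeholder batch of simulator states
    onboard_obs = privileged_obs[:, :48]   # placeholder for the student's partial view
    with torch.no_grad():
        target_action = teacher(privileged_obs)
    loss = nn.functional.mse_loss(student(onboard_obs), target_action)
    opt.zero_grad(); loss.backward(); opt.step()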
“…[13] revisited the necessity of dynamics randomization in legged locomotion and gave suggestions on where and how to use dynamics randomization. [14] trained a low-level quadruped robot locomotion controller by imitating real animal data and used the low-level controller to accomplish different tasks. [15] trained policies to jump from pixel inputs.…”
Section: Deep Reinforcement Learning For Legged Locomotion
confidence: 99%
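The dynamics randomization referenced in [13] is typically applied by resampling physical parameters at each episode reset. A minimal sketch follows, assuming a hypothetical simulator interface (set_friction, set_base_mass_offset, and the other methods are not a real API) and illustrative parameter ranges.

# Hedged sketch: resample dynamics parameters at episode reset.
# The sim methods and ranges below are hypothetical placeholders.
import random

def randomize_dynamics(sim):
    sim.set_friction(random.uniform(0.4, 1.2))              # ground friction coefficient
    sim.set_base_mass_offset(random.uniform(-0.5, 0.5))     # added/removed base mass, kg
    sim.set_motor_strength_scale(random.uniform(0.8, 1.2))  # actuator strength multiplier
    sim.set_action_latency(random.uniform(0.0, 0.02))       # control latency, seconds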
“…A notable line of research has focused on learning a specific task while imitating expert behavior. The expert provides a direct demonstration for solving the task (22,23) or is used to impose a style while discovering the task (24)(25)(26). These approaches require collecting expert data, commonly done offline, through either retargeted motion capture data (24)(25)(26) or a TO technique (22,23).…”
Section: Introduction
confidence: 99%
“…The expert provides a direct demonstration for solving the task (22,23) or is used to impose a style while discovering the task (24)(25)(26). These approaches require collecting expert data, commonly done offline, through either retargeted motion capture data (24)(25)(26) or a TO technique (22,23). The reward function can then be formulated to be dense, meaning that agents can collect nontrivial rewards even if they do not initially solve the task.…”
Section: Introduction
confidence: 99%
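The densification described in the quoted passage, where imitation of expert data supplies reward signal even before the task itself is solved, can be written as a weighted sum of a sparse task term and a dense style/tracking term. The exponential tracking form and the weight below are illustrative choices, not the cited papers' exact formulations.

# Hedged sketch: sparse task reward densified by an imitation/style tracking term.
import numpy as np

def total_reward(task_success, joint_pos, ref_joint_pos, w_style=0.5):
    task_r = 1.0 if task_success else 0.0                              # sparse task term
    style_r = np.exp(-2.0 * np.sum((joint_pos - ref_joint_pos) ** 2))  # dense tracking term
    return task_r + w_style * style_r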