2021
DOI: 10.48550/arxiv.2106.03911
Preprint

XIRL: Cross-embodiment Inverse Reinforcement Learning

Abstract: We investigate the visual cross-embodiment imitation setting, in which agents learn policies from videos of other agents (such as humans) demonstrating the same task, but with stark differences in their embodiments: shape, actions, end-effector dynamics, etc. In this work, we demonstrate that it is possible to automatically discover and learn vision-based reward functions from cross-embodiment demonstration videos that are robust to these differences. Specifically, we present a self-supervised method for Cross…
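The abstract is cut off, but the reward construction it alludes to can be sketched: embed video frames with a self-supervised encoder, summarize the demonstrations' final frames as a goal embedding, and use negative embedding distance to that goal as a dense reward. The following is a minimal sketch under those assumptions; encoder, goal_embedding, embedding_reward, and demo_videos are illustrative names, not the authors' API.

    import numpy as np

    def goal_embedding(demo_videos, encoder):
        # Average the embedding of each demonstration's final frame;
        # this stands in for "task completed" in embedding space.
        return np.mean([encoder(video[-1]) for video in demo_videos], axis=0)

    def embedding_reward(frame, goal_emb, encoder):
        # Dense reward: the closer the current frame's embedding is to
        # the goal embedding, the higher the reward.
        return -float(np.linalg.norm(encoder(frame) - goal_emb))

Because the reward is computed purely from images, it can be queried for an agent whose body differs from the demonstrators', which is what makes the cross-embodiment setting workable.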

Cited by 5 publications (5 citation statements). References 28 publications (51 reference statements).

Citation statements, ordered by relevance:
“…Peng et al (2020) use an approach based on keypoint matching to learn robotic locomotion behaviors from demonstrations of walking dogs. Zakka et al (2021) learn a visual reward function that allows reinforcement learning agents to learn from demonstrators with different embodiments.…”
Section: Kinematic Retargeting and Visual Teleoperation
Confidence: 99%
“…Peng et al [29] use an approach based on keypoint matching to learn robotic locomotion behaviors from demonstrations of walking dogs. Zakka et al [40] learn a visual reward function that allows reinforcement learning agents to learn from demonstrators with different embodiments.…”
Section: Related Work
Confidence: 99%
“…For example, TCN [39] learns a self-supervised temporal-consistent embedding for imitation learning and reinforcement learning. XIRL [40] learns a self-supervised embedding that estimates task progress for inverse reinforcement learning. [41], [42], [43] directly map the observations such as images to the target domain.…”
Section: A. Learning From Noisy Demonstration
Confidence: 99%
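To make the "self-supervised temporal-consistent embedding" this statement attributes to TCN [39] concrete, here is a hedged sketch of a time-contrastive triplet objective: frames that co-occur in time (e.g. two simultaneous camera views) are pulled together in embedding space, while temporally distant frames from the same video are pushed apart. PyTorch is assumed, and encoder plus the triplet sampling scheme are illustrative, not the cited authors' implementation.

    import torch
    import torch.nn.functional as F

    def time_contrastive_loss(encoder, anchor, positive, negative, margin=0.2):
        # Triplet loss in the spirit of TCN [39]: anchor and positive are
        # frames captured at (roughly) the same timestep, negative is a
        # temporally distant frame from the same video.
        z_a = encoder(anchor)    # embed anchor frames, shape (B, D)
        z_p = encoder(positive)  # embed co-occurring positive frames
        z_n = encoder(negative)  # embed temporally distant negatives
        return F.triplet_margin_loss(z_a, z_p, z_n, margin=margin)

An embedding trained this way varies smoothly with task progress, which is why, as the statement notes, it can double as a progress estimate for inverse reinforcement learning.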