2021
DOI: 10.48550/arxiv.2108.05877
Preprint

DexMV: Imitation Learning for Dexterous Manipulation from Human Videos

Abstract: We propose to perform imitation learning for dexterous manipulation from human demonstration videos. We record human videos of manipulation tasks (1st row) and perform 3D hand-object pose estimation on the videos (2nd row) to construct the demonstrations. A paired simulation system provides the same dexterous manipulation tasks for a multi-finger robot hand (3rd row), including relocate, pour, and place inside, which we solve using imitation learning with the inferred demonstrations.
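Below is a minimal sketch of the three-stage pipeline the abstract describes (video → 3D hand-object pose estimation → robot-hand demonstrations). Every function here is a hypothetical stub, not the authors' API: a real system would substitute learned pose estimators and the paper's paired simulator.

```python
import numpy as np

# Hypothetical stubs standing in for the paper's components.
def estimate_hand_object_poses(frame: np.ndarray):
    """Placeholder for 3D hand-object pose estimation on one RGB frame."""
    hand_pose = np.zeros(51)   # e.g. MANO-style hand parameters
    object_pose = np.zeros(7)  # position (3) + orientation quaternion (4)
    return hand_pose, object_pose

def retarget_to_robot_hand(hand_pose: np.ndarray) -> np.ndarray:
    """Placeholder for mapping a human hand pose to robot joint angles."""
    return np.zeros(30)        # e.g. a 30-DoF multi-finger hand

def build_demonstrations(video_frames):
    """Convert a human manipulation video into per-frame robot demonstrations
    that an imitation-learning algorithm can consume in simulation."""
    demos = []
    for frame in video_frames:
        hand_pose, object_pose = estimate_hand_object_poses(frame)
        robot_qpos = retarget_to_robot_hand(hand_pose)
        demos.append({"robot_qpos": robot_qpos, "object_pose": object_pose})
    return demos

if __name__ == "__main__":
    frames = [np.zeros((480, 640, 3))] * 10   # dummy 10-frame video
    print(len(build_demonstrations(frames)))  # -> 10 demonstration steps
```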

Cited by 8 publications (16 citation statements)
References 55 publications
“…For articulated objects, we additionally provide part annotations similar to PartNet [34]. Worth mentioning, by providing object meshes, HOI4D could facilitate research in instance-level HOI and also makes it possible to transfer the human interaction trajectories to a simulation environment for applications such as robot imitation learning [38]. Label propagation.…”
Section: Category-level Pose Annotation (mentioning)
confidence: 99%
“…Inspired by DexMV [38], we divide the demonstration collection process into three steps named hand joint retargeting, state-only demonstration collection and state-action demonstration collection. We transform the human hand pose represented as 51 DoF MANO model [39] to 30 DoF Adroit Hand pose in the hand joint retargeting step.…”
Section: D2 Demonstration Collection (mentioning)
confidence: 99%
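The retargeting step quoted above (51-DoF MANO hand → 30-DoF Adroit Hand) is commonly posed as a per-frame optimization that matches fingertip positions. The sketch below illustrates that idea under stated assumptions: `robot_fingertips` is a toy linear stand-in for the robot's forward kinematics, and the five fingertip targets are assumed to come from the fitted MANO hand; neither is the cited paper's actual implementation.

```python
import numpy as np
from scipy.optimize import minimize

N_DOF = 30  # Adroit Hand joint count, per the quoted statement

def robot_fingertips(qpos: np.ndarray) -> np.ndarray:
    """Toy forward kinematics: 30 joint angles -> 5 fingertip positions.
    A real implementation would query the simulator's kinematic chain."""
    J = np.random.default_rng(0).standard_normal((15, N_DOF)) * 0.01
    return (J @ qpos).reshape(5, 3)

def retarget_frame(human_tips: np.ndarray, q_prev: np.ndarray) -> np.ndarray:
    """Find joint angles whose fingertips match the human fingertip targets,
    with a small smoothness penalty toward the previous frame's solution."""
    def cost(q):
        tip_err = np.sum((robot_fingertips(q) - human_tips) ** 2)
        smoothness = 1e-2 * np.sum((q - q_prev) ** 2)
        return tip_err + smoothness
    return minimize(cost, q_prev, method="L-BFGS-B").x

human_tips = np.zeros((5, 3))                    # dummy fingertip targets (m)
q = retarget_frame(human_tips, np.zeros(N_DOF))  # warm-start from rest pose
print(q.shape)                                   # (30,)
```

Chaining these per-frame solutions gives a state-only demonstration; recovering the actions that connect consecutive states (e.g., via inverse dynamics or a tracking controller) would then yield the state-action demonstrations the statement mentions.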
“…It was not until recently that data-driven approaches have begun to promote research on learning human manipulation [2,15,20,36,44,61]. Prior work has tried to empower a machine with complex skills such as hand-object localization [46], pose estimation [30], grasp generation [11], and action imitation [42].…”
Section: Introduction (mentioning)
confidence: 99%