Joint Hand-Object Pose Estimation with Differentiably-Learned Physical Contact Point Analysis

Zhuang, Nan; Mu, Yadong

doi:10.1145/3460426.3463648

Cited by 3 publications (3 citation statements)

References 40 publications


“…Similarly, [7] proposes a lighter version of [61] by encoding the 3D shapes into graphs using node embeddings [15]. However, these multi-modal methods are limited as 3D shapes are oftentimes unavailable at testing [43,57,58]; The other category is the image-based methods [13,37,59,61,63,66,68], that is only images are exploited for pose estimation. [13,37] regard corners of the 3D bounding box as generic keypoints, which only focus on cubic objects with simple geometric shape.…”

Section: Related Work (mentioning)

confidence: 99%
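The statement above notes that [13,37] treat the corners of the 3D bounding box as generic keypoints. A minimal sketch of how such keypoints are generated follows, assuming an axis-aligned box centred at the object origin, a known rigid pose (R, t) and pinhole intrinsics (fx, fy, cx, cy). All function names and parameters are hypothetical illustrations, not code from the cited works. In those methods a network would predict the 2D corner locations from the image, and a PnP solver would then recover the pose from the 2D-3D correspondences.

```python
from itertools import product


def box_corners(width, height, depth):
    """Return the 8 corners of an axis-aligned 3D box centred at the object origin (hypothetical convention)."""
    hw, hh, hd = width / 2, height / 2, depth / 2
    return [(sx * hw, sy * hh, sz * hd) for sx, sy, sz in product((-1, 1), repeat=3)]


def transform(point, R, t):
    """Apply the rigid transform X_cam = R @ X_obj + t, with R given as a 3x3 nested list."""
    return tuple(sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3))


def project(point_cam, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame point to pixel coordinates."""
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point lies behind the camera")
    return (fx * x / z + cx, fy * y / z + cy)


def corner_keypoints(size, R, t, intrinsics):
    """2D keypoints: the projections of the 8 bounding-box corners under pose (R, t)."""
    fx, fy, cx, cy = intrinsics
    return [project(transform(c, R, t), fx, fy, cx, cy) for c in box_corners(*size)]


if __name__ == "__main__":
    identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    kps = corner_keypoints((2.0, 2.0, 2.0), identity, (0.0, 0.0, 5.0), (100.0, 100.0, 50.0, 50.0))
    for u, v in kps:
        print(f"({u:.2f}, {v:.2f})")
```

Because the eight corners are defined for any object, this keypoint design is category-agnostic, but the box carries little information about geometry that departs from a cuboid, which is the limitation the citing authors point out.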

“…However, 3D shape is usually unavailable and acquiring 3D shapes is time consuming and labor intensive on-site. Therefore, the pure image-based object pose estimation without any 3D shape information has emerged [13,37,59,63,66,68]. In these works, one common approach [13,63,66,68] is to leverage keypoint features for pose estimation after detecting and re-projection of 2D keypoints, which requires for a suitable design of category-agnostic keypoints on various object geometries.…”

Section: Introduction (mentioning)

confidence: 99%

“…Therefore, the pure image-based object pose estimation without any 3D shape information has emerged [13,37,59,63,66,68]. In these works, one common approach [13,63,66,68] is to leverage keypoint features for pose estimation after detecting and re-projection of 2D keypoints, which requires for a suitable design of category-agnostic keypoints on various object geometries. Another approach [59] applies contrastive learning [2] to exploiting RGB-based geometric similarities shared between various categories, which heavily depends on the feature representation.…”

Section: Introduction (mentioning)

confidence: 99%


3D-Augmented Contrastive Knowledge Distillation for Image-based Object Pose Estimation

Liu, Xing, Zhou, et al. (2022)

Proceedings of the 2022 International Conference on Multimedia Retrieval


Image-based object pose estimation is attractive because, in real applications, the 3D shape of an object is often unavailable or, unlike a photo, not easy to capture. Although this is an advantage to some extent, leaving shape information unexplored in a 3D vision learning problem is like a "flaw in jade". In this paper, we address the problem in a reasonable new setting: 3D shape is exploited during training, while testing remains purely image-based. We improve the performance of image-based methods for category-agnostic object pose estimation by exploiting 3D knowledge learned by a multi-modal method. Specifically, we propose a novel contrastive knowledge distillation framework that effectively transfers 3D-augmented image representations from a multi-modal model to an image-based model. We integrate contrastive learning into the two-stage training procedure of knowledge distillation, yielding a solution that combines the two approaches for cross-modal tasks. Experiments show state-of-the-art results, outperforming existing category-agnostic image-based methods by a large margin (up to +5% on the ObjectNet3D dataset), which demonstrates the effectiveness of our method.

CCS CONCEPTS: • Computing methodologies → Computer vision; Neural networks; Object recognition.
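The abstract does not state the distillation loss. The sketch below shows one common way contrastive learning can be folded into the second (student) stage of knowledge distillation: an InfoNCE-style loss in which each student embedding must match the teacher embedding of the same image more closely than the teacher embeddings of the other images in the batch. It assumes the frozen multi-modal teacher and the image-based student output non-zero embeddings of equal dimension. The function names and the temperature value are hypothetical, and this is a generic illustration rather than the authors' implementation.

```python
import math


def _normalize(v):
    """Scale a (non-zero) vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(_normalize(a), _normalize(b)))


def contrastive_distillation_loss(student, teacher, temperature=0.1):
    """InfoNCE-style loss averaged over the batch.

    student[i] and teacher[i] embed the same image (positive pair);
    teacher[j] for j != i act as negatives for student[i].
    """
    if not student or len(student) != len(teacher):
        raise ValueError("batches must be non-empty and of equal size")
    total = 0.0
    for i, s in enumerate(student):
        logits = [cosine(s, t) / temperature for t in teacher]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denominator - logits[i]  # -log softmax probability of the positive
    return total / len(student)


if __name__ == "__main__":
    teacher = [[1.0, 0.0], [0.0, 1.0]]
    print(contrastive_distillation_loss([[0.9, 0.1], [0.1, 0.9]], teacher))
    print(contrastive_distillation_loss([[0.1, 0.9], [0.9, 0.1]], teacher))
```

In a two-stage scheme of this kind, the teacher would first be trained with both images and 3D shapes, then frozen while the student is trained on images alone; the loss is small when student and teacher embeddings of the same image align and large when they are mismatched.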
