2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr52688.2022.01081
Keypoint Transformer: Solving Joint Identification in Challenging Hands and Object Interactions for Accurate 3D Pose Estimation

Cited by 79 publications (46 citation statements). References 33 publications.
“…Finally, we refine the hand and object poses through graph convolutional blocks equipped with the proposed mutual attention layer. We show that our method does not require iterative optimization as in [48,13], and the dense vertex-level mutual attention can model the hand-object interaction more effectively than sparse keypoints based methods [11,8]. In summary, our contributions are as follows.…”
Section: Introduction
confidence: 95%
“…In [41] a self-attention mechanism is used to capture feature dependencies for either the hand or the object and the interaction between them is modeled by the exchange of global features. Most close to our work is [11] where a cross-attention is used to model the correlation between the hand and the object. However, all above methods only model a sparse interaction between a pre-defined set of keypoints or features from the hand and the object, regardless of the fact that hand-object interaction actually occurs on physical regions of the surfaces.…”
Section: Introduction
confidence: 99%
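The cross-attention between hand and object features described in these citation statements can be illustrated with a minimal sketch. This is not the implementation from any of the cited papers; the feature dimensions, the 21 hand keypoints, and the 8 object points are illustrative assumptions, and the sketch uses plain scaled dot-product attention in both directions (a simple form of "mutual" attention):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all keys."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)   # (Nq, Nk) affinity matrix
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ values                  # (Nq, d) context-mixed features

rng = np.random.default_rng(0)
hand = rng.standard_normal((21, 32))  # e.g. 21 hand keypoint features
obj = rng.standard_normal((8, 32))    # e.g. 8 object corner features

# Hand features updated with object context, and vice versa.
hand_refined = cross_attention(hand, obj, obj)
obj_refined = cross_attention(obj, hand, hand)
print(hand_refined.shape, obj_refined.shape)  # (21, 32) (8, 32)
```

The contrast drawn in the quotes is about where attention is applied: over a sparse set of keypoints as above, versus densely over mesh vertices, where each row of the affinity matrix would cover every vertex of the other surface.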
“…More recent works leverage the increasing capacity of computer vision to collect human hand poses when interacting with the object. HO3D [22,55] computes the ground truth 3D hand pose for images from 2D hand keypoint annotations. The method resolves ambiguities by considering physics constraints in hand-object interactions and hand-hand interactions.…”
Section: Dexterous Grasp Datasets
confidence: 99%