2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition 2018
DOI: 10.1109/cvpr.2018.00017

Cross-Modal Deep Variational Hand Pose Estimation

Abstract: The human hand moves in complex and high-dimensional ways, making estimation of 3D hand pose configurations from images alone a challenging task. In this work we propose a method to learn a statistical hand model represented by a cross-modal trained latent space via a generative deep neural network. We derive an objective function from the variational lower bound of the VAE framework and jointly optimize the resulting cross-modal KL divergence and the posterior reconstruction objective, naturally admitting a tra…
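For orientation, the variational lower bound that the abstract refers to can be written out. The first line is the standard VAE bound. The second line is only an assumed reading of the "cross-modal" variant, in which the encoder sees one modality and the decoder reconstructs another. The symbols x_i, x_t, q_phi and p_theta are notation chosen here and do not appear in the truncated abstract.

```latex
% Standard VAE evidence lower bound (ELBO) for an observation x and latent z
\log p_\theta(x) \;\ge\;
  \mathbb{E}_{q_\phi(z \mid x)}\bigl[\log p_\theta(x \mid z)\bigr]
  - D_{\mathrm{KL}}\bigl(q_\phi(z \mid x) \,\|\, p(z)\bigr)

% Assumed cross-modal form: encode modality x_i (e.g. an image),
% reconstruct modality x_t (e.g. a 3D pose) from the shared latent z
\log p_\theta(x_t) \;\ge\;
  \mathbb{E}_{q_\phi(z \mid x_i)}\bigl[\log p_\theta(x_t \mid z)\bigr]
  - D_{\mathrm{KL}}\bigl(q_\phi(z \mid x_i) \,\|\, p(z)\bigr)
```

Under this reading, the expectation is the reconstruction objective and the KL term is the cross-modal KL divergence that the abstract says are jointly optimized.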

Cited by 287 publications (272 citation statements)
References 38 publications
“…The overall hand pose estimation accuracy is measured in the area under the curve (AUC) and the ratio of correct keypoints (PCK) with varying thresholds for each [68,4,14]. For comparison, we adopt seven hand pose estimation algorithms including five neural networks (CNNs)-based algorithms ( [4,68] for RHD, [14,29] for DO, and [29,68,46] for SHD) and two 3D model fitting-based algorithms [34,19].…”
Section: Methods (mentioning)
confidence: 99%
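The AUC figure mentioned in the statement above is usually obtained by sweeping the PCK threshold and integrating the resulting curve. The following is a minimal sketch of that idea, not the evaluation code of any cited paper. The threshold range, the error values, the trapezoidal integration, and the normalization by the threshold span are all assumptions made for illustration.

```python
import numpy as np

def pck_curve(errors, thresholds):
    """PCK at each threshold: share of per-joint errors at or below it."""
    errors = np.asarray(errors, dtype=float).ravel()
    return np.array([np.mean(errors <= t) for t in thresholds])

def auc(thresholds, values):
    """Trapezoidal area under the curve, normalized by the threshold span."""
    thresholds = np.asarray(thresholds, dtype=float)
    values = np.asarray(values, dtype=float)
    widths = np.diff(thresholds)
    area = np.sum(widths * (values[:-1] + values[1:]) / 2.0)
    return float(area / (thresholds[-1] - thresholds[0]))

# Hypothetical per-joint Euclidean errors and thresholds (same unit, e.g. mm).
errors = np.array([5.0, 15.0, 25.0, 60.0])
thresholds = np.array([20.0, 30.0, 40.0, 50.0])

curve = pck_curve(errors, thresholds)
score = auc(thresholds, curve)
print(curve, score)
```

With the normalization used here, a method whose errors all fall below the smallest threshold would score 1.0, which makes scores comparable across threshold ranges of different width.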
“…1) is important as it helps understand e.g. human-object interactions [7,6,3,1] and perform robotic Discriminative methods based on convolutional neural networks (CNNs) have shown very promising performance in estimating 3D hand poses either from RGB images [43,68,4,14,29,46] or depth maps [65,30,50,58,30,64,28,38,64,2]. However, the predictions are based on coarse skeletal representations, and no explicit kinematics and geometric mesh constraints are often considered.…”
Section: Introduction (mentioning)
confidence: 99%
“…Because, commonly, different datasets share similar feature distribution, especially when their data is sampled from close domains. To leverage such cross-domain shared knowledge, domain adaptation [49,15] has been widely studied on different tasks, such as detection [7,26], classification [19,21,17,16], segmentation [59,54,16] and pose estimation [57,46]. But in previous works about keypoint detection or pose estimation [9,57,53,56], source domain and target domain face much slighter domain shift than when transferring from human dataset to animals or among different animal species.…”
Section: Related Work (mentioning)
confidence: 99%
“…PCK In cases where large errors occur, the value of L pos can be misleading. Hence, following the 3D (hand) pose estimation literature [13,22,28,34], we introduce PCK by computing the percentage of predicted joints lying within a spherical threshold ρ around the target joint position, i.e.…”
Section: Metrics (mentioning)
confidence: 99%
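The PCK definition quoted above is cut off before its formula, so the sketch below only illustrates the verbal description: the percentage of predicted joints lying within a spherical threshold ρ around the target joint. The function name `pck`, the inclusive comparison, and the sample coordinates are assumptions, not taken from the citing paper.

```python
import numpy as np

def pck(pred, target, rho):
    """Fraction of predicted joints within Euclidean distance rho of the target.

    pred, target: arrays of shape (..., 3) holding 3D joint positions.
    rho: radius of the spherical threshold, in the same unit as the joints.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    dist = np.linalg.norm(pred - target, axis=-1)  # per-joint error
    return float(np.mean(dist <= rho))             # inclusive bound (assumed)

# Hypothetical example: four joints with errors of 0, 5, 10 and 30 units.
target = np.zeros((4, 3))
pred = np.array([
    [0.0, 0.0, 0.0],
    [3.0, 4.0, 0.0],
    [0.0, 0.0, 10.0],
    [30.0, 0.0, 0.0],
])
score = pck(pred, target, rho=10.0)
print(score)
```

Because each joint contributes either 0 or 1, a single very large error changes the score no more than a moderate miss does, which is consistent with the statement's motivation for reporting PCK alongside a mean position error.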