2018
DOI: 10.1007/978-3-030-01231-1_41

Weakly-Supervised 3D Hand Pose Estimation from Monocular RGB Images

Abstract: Compared with depth-based 3D hand pose estimation, it is more challenging to infer 3D hand pose from monocular RGB images, due to substantial depth ambiguity and the difficulty of obtaining fully annotated training data. Different from existing learning-based monocular RGB-input approaches that require accurate 3D annotations for training, we propose to leverage the depth images that can be easily obtained from commodity RGB-D cameras during training, while during testing we take only RGB inputs for 3D joint pr…

Cited by 257 publications (345 citation statements)
References 41 publications (90 reference statements)
“…Note that the StereoHands benchmark is close to saturation. In contrast to other methods [4,20,37,65,80] that only predicts sparse skeleton keypoints, our model produces a dense hand mesh. Figure A.1 presents some qualitative results from this dataset.…”
Section: A2 MANO Pose Representation · mentioning · confidence: 97%
“…The earlier work [68] attempts to learn the direct mapping from RGB images to 3D skeletons. Recent methods [4,14] have shown the state-of-the-art accuracy by implicitly reconstructing depth images i.e. 2.5D representations, and estimating the 3D skeletal based on them.…”
Section: Related Work · mentioning · confidence: 99%
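The statement above mentions estimating the 3D skeleton from a 2.5D representation. As a generic illustration only (not the cited methods' own pipeline), the sketch below lifts 2.5D joints, i.e. pixel coordinates (u, v) plus depth z, to 3D camera coordinates with the standard pinhole model. It assumes absolute per-joint depth is already known and that the camera intrinsics fx, fy, cx, cy are given; the intrinsic values and joints in the example are hypothetical.

```python
from typing import List, Tuple

def back_project(joints_uvz: List[Tuple[float, float, float]],
                 fx: float, fy: float, cx: float, cy: float) -> List[Tuple[float, float, float]]:
    """Lift 2.5D joints (pixel u, v plus camera-space depth z) to 3D camera
    coordinates using the pinhole model:
        X = (u - cx) * z / fx,  Y = (v - cy) * z / fy,  Z = z.
    Assumes z is absolute depth, not depth relative to a root joint."""
    points = []
    for u, v, z in joints_uvz:
        x = (u - cx) * z / fx
        y = (v - cy) * z / fy
        points.append((x, y, z))
    return points

if __name__ == "__main__":
    # Hypothetical intrinsics and two example joints (depth in millimetres).
    fx = fy = 500.0
    cx, cy = 160.0, 120.0
    joints = [(160.0, 120.0, 400.0), (210.0, 70.0, 500.0)]
    for p in back_project(joints, fx, fy, cx, cy):
        print(p)  # (0.0, 0.0, 400.0) then (50.0, -50.0, 500.0)
```

A joint at the principal point maps onto the optical axis; methods that predict only root-relative depth need an extra step to recover the absolute scale before this projection applies.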
“…However, even under this setting, the problem still remains challenging as estimating a 3D mesh given an RGB image is a seriously ill-posed problem. Adopting recent human body pose estimation approaches [35,57], we further stratify learning of $f_{\mathrm{DHPE}}: X \to Y$ by decomposing $f_{\mathrm{HME}}: X \to V$ into a 2D evidence estimator $f_{\mathrm{E2D}}: X \to Z$ and a 3D mesh estimator $f_{\mathrm{E3D}}: Z \to V$. Our 2D evidence $z \in Z$ consists of a 42-dimensional 2D skeletal joint position vector $j_{2D}$ (21 positions × 2; as in [4,14]) and a 2,048-dimensional 2D feature vector $F(x)$ (Eq. 2).…”
Section: Proposed Dense Hand Pose Estimator · mentioning · confidence: 99%
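The decomposition quoted above, f_HME = f_E3D ∘ f_E2D, with 2D evidence z formed from 21 joints × 2 coordinates (42 values) plus a 2,048-dimensional feature vector F(x), can be sketched as follows. This is a minimal illustration of the data flow only: the helper names and the toy estimators are hypothetical placeholders, not the cited networks.

```python
from typing import Any, Callable, List, Sequence

NUM_JOINTS = 21      # hand skeleton joints, as in the quoted text
FEATURE_DIM = 2048   # dimension of the 2D feature vector F(x)
EVIDENCE_DIM = NUM_JOINTS * 2 + FEATURE_DIM  # 42 + 2048 = 2090

def make_evidence(joints_2d: Sequence[Sequence[float]],
                  feature: Sequence[float]) -> List[float]:
    """Build the 2D evidence vector z = [j_2D, F(x)] by flattening 21 (u, v)
    joints into 42 values and appending the 2,048-dimensional feature."""
    if len(joints_2d) != NUM_JOINTS or any(len(j) != 2 for j in joints_2d):
        raise ValueError("expected 21 joints with (u, v) coordinates")
    if len(feature) != FEATURE_DIM:
        raise ValueError("expected a 2,048-dimensional feature vector")
    j_flat = [float(c) for joint in joints_2d for c in joint]
    return j_flat + [float(f) for f in feature]

def compose(f_e3d: Callable[[Any], Any],
            f_e2d: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Return f_HME = f_E3D ∘ f_E2D: image -> 2D evidence -> mesh."""
    return lambda x: f_e3d(f_e2d(x))

if __name__ == "__main__":
    # Hypothetical stand-in for the learned 2D evidence estimator.
    def toy_e2d(image: Any) -> List[float]:
        joints = [(float(i), float(i)) for i in range(NUM_JOINTS)]
        return make_evidence(joints, [0.0] * FEATURE_DIM)

    # Hypothetical stand-in for the 3D mesh estimator: reports the input size.
    def toy_e3d(z: List[float]) -> dict:
        return {"evidence_dim": len(z)}

    f_hme = compose(toy_e3d, toy_e2d)
    print(f_hme(None))  # {'evidence_dim': 2090}
```

Keeping the evidence as an explicit intermediate is what lets the 2D stage and the 3D stage be trained or swapped independently, which is the point of the stratification described in the quote.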