2020
DOI: 10.1007/978-3-030-58539-6_8
JGR-P2O: Joint Graph Reasoning Based Pixel-to-Offset Prediction Network for 3D Hand Pose Estimation from a Single Depth Image

Abstract: State-of-the-art single depth image-based 3D hand pose estimation methods are based on dense predictions, including voxel-to-voxel predictions, point-to-point regression, and pixel-wise estimations. Despite the good performance, those methods have a few issues in nature, such as the poor trade-off between accuracy and efficiency, and plain feature representation learning with local convolutions. In this paper, a novel pixel-wise prediction-based method is proposed to address the above issues. The key ideas are…

Cited by 36 publications (14 citation statements)
References 39 publications
“…However, regressing coordinates from images or point clouds is a highly non-linear problem, which can be hard to learn. Thus detection-based [11,14,17,26,40] methods work in a dense local prediction manner via setting a heat-map for each keypoint. Moon et al. [26] propose a Voxel-to-Voxel prediction network (V2V) for both 3D hand and human pose estimation.…”
Section: Related Work, 2.1 Hand Pose Estimation (citation type: mentioning)
confidence: 99%
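The heat-map decoding the excerpt above describes can be sketched as follows. This is an illustrative NumPy sketch of the generic technique (take the argmax of each keypoint's heat-map to recover its pixel location), not code from any of the cited methods; the function name and shapes are assumptions.

```python
import numpy as np

def keypoints_from_heatmaps(heatmaps):
    """Recover (x, y) pixel coordinates from per-keypoint heat-maps.

    heatmaps: array of shape (K, H, W), one dense map per keypoint.
    Returns an array of shape (K, 2) holding the (x, y) maxima.
    """
    K, H, W = heatmaps.shape
    # Flatten each map and take the index of its peak response.
    flat_idx = heatmaps.reshape(K, -1).argmax(axis=1)
    ys, xs = np.unravel_index(flat_idx, (H, W))
    return np.stack([xs, ys], axis=1)

# Toy example: a single 4x4 heat-map peaking at pixel (x=2, y=1).
hm = np.zeros((1, 4, 4))
hm[0, 1, 2] = 1.0
print(keypoints_from_heatmaps(hm))  # [[2 1]]
```

In practice, methods like V2V [26] extend this idea to 3D by predicting per-joint likelihood volumes and taking the argmax over voxels rather than pixels.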
“…Based on the input representations (depth maps, volumes, point sets, etc.), these approaches apply Conv2D [20], [21], Conv3D [22], [17], MLPs [23], [24], Transformers [25], etc., to automatically extract spatial features.…”
Section: B. Depth-Based Pose Estimation (citation type: mentioning)
confidence: 99%
“…We compare HandFoldingNet with other state-of-the-art methods, including methods with 2D (depth image) input: model-based method (DeepModel) [46], DeepPrior [27], improved DeepPrior (DeepPrior++) [26], region ensemble network (REN-9x6x6) [42], Pose-Ren [3], dense regression network (DenseReg) [42], CrossInfoNet [6] and JGR-P2O [8], and methods with 3D (point cloud or voxel) input: 3DCNN [11], SHPR-Net [4], HandPointNet [9], Point-to-Point [12] and V2V [24]. Figure 6 shows the success rate on the ICVL, NYU, and MSRA datasets.…”
Section: Comparison With State-of-the-arts (citation type: mentioning)
confidence: 99%