2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019
DOI: 10.1109/iccv.2019.00243

HEMlets Pose: Learning Part-Centric Heatmap Triplets for Accurate 3D Human Pose Estimation

Abstract: Estimating 3D human pose from a single image is a challenging task. This work attempts to address the uncertainty of lifting the detected 2D joints to the 3D space by introducing an intermediate state, Part-Centric Heatmap Triplets (HEMlets), which shortens the gap between the 2D observation and the 3D interpretation. The HEMlets utilize three joint-heatmaps to represent the relative depth information of the end-joints for each skeletal body part. In our approach, a Convolutional Network (ConvNet) is first tra…
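The abstract says only that three joint-heatmaps encode the relative depth of a part's two end-joints. As a rough illustration of what such an encoding could look like, the sketch below builds one triplet for a single body part. The slot convention (closer / same depth / farther), the placement of the parent joint in the middle map, the Gaussian width `sigma`, and the depth tolerance `eps` are all assumptions made for this example; they are not taken from the abstract, and the function names are hypothetical.

```python
import math


def gaussian_heatmap(size, center, sigma=1.5):
    """Return a size x size grid (rows = v, columns = u) peaking at center=(u, v)."""
    cu, cv = center
    return [
        [math.exp(-((u - cu) ** 2 + (v - cv) ** 2) / (2 * sigma ** 2))
         for u in range(size)]
        for v in range(size)
    ]


def empty_heatmap(size):
    """Return a size x size grid of zeros."""
    return [[0.0] * size for _ in range(size)]


def merge_max(a, b):
    """Combine two heatmaps by taking the element-wise maximum."""
    return [[max(x, y) for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


def part_triplet(size, parent_uv, child_uv, parent_z, child_z, eps=0.05):
    """Encode one body part as three heatmaps: [closer, same, farther].

    Assumed convention (illustrative only): the parent joint always goes in
    the middle map, and the child joint goes in the map selected by the sign
    of its depth relative to the parent.
    """
    diff = child_z - parent_z
    if abs(diff) < eps:
        slot = 1          # child at roughly the same depth as the parent
    elif diff < 0:
        slot = 0          # child closer to the camera than the parent
    else:
        slot = 2          # child farther from the camera than the parent

    maps = [empty_heatmap(size) for _ in range(3)]
    maps[1] = gaussian_heatmap(size, parent_uv)
    maps[slot] = merge_max(maps[slot], gaussian_heatmap(size, child_uv))
    return maps


# Example: a child joint that is farther from the camera than its parent.
triplet = part_triplet(16, parent_uv=(4, 4), child_uv=(10, 6),
                       parent_z=2.0, child_z=2.5)
```

In this sketch the 2D locations stay in image space while the depth ordering is reduced to a three-way choice of map, which is one way to read the abstract's "intermediate state" between 2D observation and 3D interpretation.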

Citations: cited by 101 publications (50 citation statements)
References: 44 publications (161 reference statements)
“…1) directly map an image to 3D pose, and 2) lift 2D pose to 3D pose. (Zhou et al 2019; Moon, Chang, and Lee 2019; Li et al 2020b,a) are based on the first approach. (Li and Chan 2014) employed a shallow network to regress 3D joint coordinates directly with synchronous task of body part detection with sliding windows.…”
Section: D Single-person Pose Estimation
Citation type: mentioning
confidence: 99%
“…As can be seen, our method outperforms all competing single-frame methods and obtains average errors of 48.7 mm and 31.8 mm under two evaluation protocols. In addition, some approaches such as (Sun et al 2018; Zhou et al 2019) use the camera intrinsics and the ground-truth distance of the subject from the camera to convert their (u, v) predictions to (x, y). Therefore, their reported results are not representative of the 3D pose performance.…”
Section: Implementation Details
Citation type: mentioning
confidence: 99%
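The (u, v) to (x, y) conversion mentioned in the statement above is, under a standard pinhole camera model, a back-projection that needs the focal lengths, the principal point, and a known depth. The sketch below shows that relation only; the parameter names and the example values are assumptions for illustration, and it does not reproduce the procedure of any of the cited papers.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) at a known depth into camera coordinates.

    Pinhole model: u = fx * x / z + cx and v = fy * y / z + cy, so with
    z = depth known, x and y follow directly. Units of x and y match depth.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return x, y, depth


# Hypothetical intrinsics and a subject 4000 mm from the camera.
point = backproject(u=700.0, v=500.0, depth=4000.0,
                    fx=1000.0, fy=1000.0, cx=500.0, cy=500.0)
```

Because the depth enters as a known input, any error in (x, y) comes only from the 2D prediction, which is the citing authors' reason for treating such results as not representative of 3D pose performance.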
“…3D human pose and shape estimation is a fundamental yet challenging task in computer vision. There are plenty of approaches proposed to accurately capture 2D pose and even 3D joint locations [9,26,43,46,57,58]. Since sparse joints alone cannot provide enough information for analyzing humans [22], incremental recent works interest in recovering the 3D mesh of a human body, where the 3D surface is defined.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…These make mocaps unsuitable for use in unconstrained home or clinical or sports settings. Recently, many deep learning methods, for example, References [14, 15, 16, 17, 18], have been proposed to extract 3D human pose from RGB images. Such methods (a) either do not deal with view-invariance and are trained from specific views on their respective datasets (for example, References [14, 17] show that their methods fail when they apply them on poses and view angles which are different from their training sets), (b) or if they handle view-invariance, such as References [19, 20], then they need multiple views for training.…”
Section: Introduction
Citation type: mentioning
confidence: 99%