2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition 2018
DOI: 10.1109/cvpr.2018.00762

DensePose: Dense Human Pose Estimation in the Wild

Abstract: Figure 1: Dense pose estimation aims at mapping all human pixels of an RGB image to the 3D surface of the human body. We introduce DensePose-COCO, a large-scale ground-truth dataset with image-to-surface correspondences manually annotated on 50K COCO images, and train DensePose-RCNN to densely regress part-specific UV coordinates within every human region at multiple frames per second. Left: the image and the correspondence regressed by DensePose-RCNN. Middle: DensePose-COCO dataset annotations. Right: Partiti…
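The part-specific UV regression described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not DensePose-RCNN's actual code: it assumes the network emits 25 score maps (background plus the 24 body-surface parts used by DensePose) together with one U map and one V map per part, and the function name fuse_iuv and the array layout are hypothetical. Each pixel is first assigned to its highest-scoring part, and its U and V values are then read from that part's own regression channel.

```python
import numpy as np

NUM_PARTS = 24  # assumed DensePose surface parts; index 0 is reserved for background


def fuse_iuv(part_logits: np.ndarray, u_maps: np.ndarray, v_maps: np.ndarray) -> np.ndarray:
    """Hypothetical helper: collapse per-part predictions into one dense IUV map.

    part_logits: (NUM_PARTS + 1, H, W) scores for background + each part
    u_maps, v_maps: (NUM_PARTS + 1, H, W) per-part UV regressions (channel 0 unused)
    Returns an (H, W, 3) float array holding (I, U, V) for every pixel.
    """
    part_index = part_logits.argmax(axis=0)  # (H, W); 0 means background
    rows, cols = np.indices(part_index.shape)
    # Pick, at every pixel, the U/V channel that belongs to the winning part.
    u = u_maps[part_index, rows, cols]
    v = v_maps[part_index, rows, cols]
    background = part_index == 0
    # UV coordinates live in [0, 1]; background pixels carry no surface coordinate.
    u = np.where(background, 0.0, np.clip(u, 0.0, 1.0))
    v = np.where(background, 0.0, np.clip(v, 0.0, 1.0))
    return np.stack([part_index.astype(np.float64), u, v], axis=-1)


# Usage: a 2x3 "image" where only pixel (0, 1) is assigned to part 7.
H, W = 2, 3
logits = np.zeros((NUM_PARTS + 1, H, W))
logits[0] = 1.0          # background wins by default
logits[7, 0, 1] = 4.0    # pixel (0, 1) belongs to part 7
u_maps = np.full((NUM_PARTS + 1, H, W), 0.4)
v_maps = np.full((NUM_PARTS + 1, H, W), 1.3)  # out of range, gets clipped to 1.0
iuv = fuse_iuv(logits, u_maps, v_maps)
print(iuv[0, 1].tolist())  # [7.0, 0.4, 1.0]
```

Reading U and V only from the winning part's channel is what makes the regression "part-specific": each part has its own chart, so a single shared UV regressor would mix incompatible coordinate systems.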

Cited by 1,022 publications (522 citation statements)
References: 48 publications
“…More specifically, a voxel representation limits spatial resolution, and a template-based approach has difficulty handling varying topology and large deformations. Although the template-based approach [3] retains some distinctive shapes such as wrinkles, the resulting shapes lose the fidelity of the input subject due to the imperfect mapping from image space to the texture parameterization using the off-the-shelf human dense correspondences map [4]. In contrast, our method fully leverages the expressive shape representation for both base and refined shapes and directly predicts 3D geometry at a pixel-level, retaining all the details that are present in the input image.…”
Section: Comparisons (mentioning)
confidence: 99%
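The "mapping from image space to the texture parameterization" mentioned in this statement can be illustrated with a short sketch. It is hypothetical and is not the method of [3] or [4]: it assumes a per-pixel IUV map (part index I, with 0 for background, plus U and V in [0, 1]) and scatters each foreground pixel into a square, nearest-texel texture for its part. The function name image_to_part_textures, the texture size, and the choice that U indexes columns and V indexes rows are all illustrative assumptions. Texels that no pixel reaches stay empty, which is one concrete way such a mapping can end up incomplete.

```python
import numpy as np


def image_to_part_textures(image: np.ndarray, iuv: np.ndarray,
                           num_parts: int = 24, tex_size: int = 64):
    """Hypothetical helper: scatter foreground pixels into per-part UV textures.

    image: (H, W, C) array; iuv: (H, W, 3) array of (I, U, V) with I in
    [0, num_parts] (0 = background) and U, V in [0, 1].
    Returns (textures, filled): textures is (num_parts, tex_size, tex_size, C),
    filled marks texels that received at least one pixel.
    """
    channels = image.shape[2]
    textures = np.zeros((num_parts, tex_size, tex_size, channels), dtype=image.dtype)
    filled = np.zeros((num_parts, tex_size, tex_size), dtype=bool)

    part = iuv[..., 0].astype(int)
    ys, xs = np.nonzero(part > 0)  # foreground pixels only
    p = part[ys, xs] - 1  # parts 1..num_parts -> texture slots 0..num_parts-1
    # Nearest-texel lookup; U indexes columns and V indexes rows (a convention choice).
    cols = np.clip(np.rint(iuv[ys, xs, 1] * (tex_size - 1)).astype(int), 0, tex_size - 1)
    rows = np.clip(np.rint(iuv[ys, xs, 2] * (tex_size - 1)).astype(int), 0, tex_size - 1)

    # If several pixels hit the same texel, NumPy keeps one of them (order unspecified).
    textures[p, rows, cols] = image[ys, xs]
    filled[p, rows, cols] = True
    return textures, filled


# Usage: two foreground pixels land on opposite corners of two different part textures.
image = np.arange(2 * 2 * 3, dtype=np.uint8).reshape(2, 2, 3)
iuv = np.zeros((2, 2, 3))
iuv[0, 0] = (1, 0.0, 0.0)  # part 1, UV corner (0, 0)
iuv[1, 1] = (2, 1.0, 1.0)  # part 2, UV corner (1, 1)
textures, filled = image_to_part_textures(image, iuv, tex_size=4)
print(int(filled.sum()))  # 2: every other texel stays empty
```

A real pipeline would typically fill the empty texels by interpolation or inpainting; that step is omitted here to keep the scatter itself visible.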
“…However, because they require each character to be tracked by a bounding box, they only reconstruct single‐person skeletons at a time, making them unsuitable for closely interacting characters. More recently, an enormous effort has been devoted to deep and convolutional methods that map all human pixels of an RGB image to 3D surface of the human body …”
Section: Related Work (mentioning)
confidence: 99%
“…More recently, an enormous effort has been devoted to deep and convolutional methods that map all human pixels of an RGB image to 3D surface of the human body. [45][46][47][48] Our method overcomes most of the prior work limitations, including the 3D capturing of multiple characters, the skeletal model constraints, and the production of smooth animation for the articulated character.…”
Section: Related Work (mentioning)
confidence: 99%
“…The DensePose-COCO dataset [19] has reannotated dense body surface annotations on the 50k COCO images. These dense body surface annotations can be understood as continuous part labels of each human body.…”
Section: Related Work (mentioning)
confidence: 99%