2020
DOI: 10.1609/aaai.v34i04.6008

Chained Representation Cycling: Learning to Estimate 3D Human Pose and Shape by Cycling Between Representations

Abstract: The goal of many computer vision systems is to transform image pixels into 3D representations. Recent popular models use neural networks to regress directly from pixels to 3D object parameters. Such an approach works well when supervision is available, but in problems like human pose and shape estimation, it is difficult to obtain natural images with 3D ground truth. To go one step further, we propose a new architecture that facilitates unsupervised, or lightly supervised, learning. The idea is to break the pr…

Cited by 25 publications (16 citation statements)
References: 28 publications
“…Kolotouros et al [10] present a self-improving approach for training a neural network through the tight collaboration of a regression method and an optimization method (SMPLify [6]). Considering the difficulty of obtaining natural images with 3D ground truth, Rüegg et al [35] propose a new deep learning architecture that facilitates unsupervised or lightly supervised learning. SMPL-X [30] can express finger motions and facial expressions compared to SMPL [11].…”
Section: A 3D Human Recovery From a Single Image
Citation type: mentioning
confidence: 99%
“…Body reconstruction: For years, the community focused on the prediction of 2D or 3D landmarks for the body [19], face [17] and hands [83,99], with a recent shift towards estimating 3D model parameters [15,48,67,71] or 3D surfaces [54,79,80,96]. One line of work simplifies the problem by using proxy representations like 2D joints [15,34,35,42,64,71,82,93,109], silhouettes [8,42,71], part labels [67,78] or dense correspondences [75,105]. These are then "lifted" to 3D, either as part of an energy term that is minimized [15,42,104] or using a regressor [64,67,71,93].…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…To tackle this problem, Kanazawa et al [21] propose an adversarial framework, utilizing the unpaired 3D annotations, to facilitate the reconstruction. Several researches [49,53,33,40,44] also show that the paired 3D annotation is not necessary, attempting to find more representative temporal features [49,53] or employ more informative input such as RGB-D [33], and part segmentation [40,44] to facilitate human mesh reconstruction. However, there still exists a principled challenge in this task, where neither the unpaired 3D annotation nor the other mentioned intermediate representations could effectively fill the gap between two largely different datasets.…”
Section: Related Work
Citation type: mentioning
confidence: 99%