Unsupervised Human Pose Estimation through Transforming Shape Templates

Schmidtke, Luca; Vlontzos, Athanasios; Ellershaw, Simon; Lukens, Anna; Kainz, Bernhard

doi:10.1109/cvpr46437.2021.00251

Cited by 27 publications

(24 citation statements)

References 48 publications

“…Although 2D joints are cheaper to annotate, the process is still time-consuming and hard to scale to a large number of objects and classes. To address this, some recent methods aim to discover the joints of articulated objects using self-supervised learning [18,21,49]. While these methods show impressive results, they still rely on carefully designed, object-specific templates and/or prior information, which cannot be directly applied to other object classes.…”

Section: Discovery Of 3D Joints Of Articulated Objects (mentioning)

confidence: 99%

“…Moreover existing methods are not scene-specific. The methods by Schmidtke et al [49] and by Kundu et al [21], both of which assume a template and only work for humans, are the closest existing solutions for unsupervised, direct pose manipulation. Although they tackle a more constrained task, we use them as inspiration for baselines that allow for a quantitative evaluation.…”

Section: Baselines Description and Evaluation (mentioning)

confidence: 99%

“…Both Schmidtke et al [49] and Kundu et al [21] employ a CNN-based encoder, which allows them to work on scenes not seen in training. This gives an unfair advantage to our method, which overfits to a specific sequence.…”

Section: Baselines Description and Evaluation (mentioning)

confidence: 99%

“…Schmidtke* et al [49]: Their original model trains the deformation of a 2D template of a person's structure in an unsupervised manner using image reconstruction. They use a CNN-based encoder to estimate the 2D deformation parameters.…”

Section: A7 Pose Manipulation (Section 41) (mentioning)

confidence: 99%

“…Learning to predict the location of joints in 3D is a well-studied task, at least for humans, and it is generally tackled using 2D [16,17,20,45,56,61] or 3D [15,19,23-25,51,69] ground truth information. When not using joints supervision, existing pose manipulation methods rely on a predefined model, that is, a template structure [21,49]. However, annotations are expensive and object-specific, which is why they are only available for limited classes of objects, such as people or faces [14,41,48].…”

Section: Introduction (mentioning)

confidence: 99%

Watch It Move: Unsupervised Discovery of 3D Joints for Re-Posing of Articulated Objects

Noguchi, Iqbal, Tremblay et al. 2021

Preprint

Figure 1: Animated figure (view in Adobe Reader and click on panes (b), (c), and (d)). Our method learns to render novel views of an articulated, moving object by "watching" it move in a multi-view video sequence and associated foreground masks, as shown in the animation in (b). Simultaneously, it discovers the object's parts and joints with no additional supervision. The learned structure can be used to explicitly re-pose the object, by roto-translating each part around its joint. In panes (c) and (d) we re-pose objects from different categories to configurations never seen in training, an operation only possible thanks to the structure we discover from the input videos.
