“…Learning to predict the location of joints in 3D is a well-studied task, at least for humans, and it is generally tackled using 2D [16,17,20,45,56,61] or 3D [15,19,23,24,25,51,69] ground-truth information. When not using joint supervision, existing pose manipulation methods rely on a predefined model, that is, a template structure [21,49]. However, annotations are expensive and object-specific, which is why they are only available for limited classes of objects, such as people or faces [14,41,48].…”