Dario Pavllo scite author profile

3D Human Pose Estimation in Video With Temporal Convolutions and Semi-Supervised Training

In this work, we demonstrate that 3D poses in video can be effectively estimated with a fully convolutional model based on dilated temporal convolutions over 2D keypoints. We also introduce back-projection, a simple and effective semi-supervised training method that leverages unlabeled video data. We start with predicted 2D keypoints for unlabeled video, then estimate 3D poses and finally back-project to the input 2D keypoints. In the supervised setting, our fully convolutional model outperforms the previous best result from the literature by 6 mm mean per-joint position error on Human3.6M, corresponding to an error reduction of 11%, and the model also shows significant improvements on HumanEva-I. Moreover, experiments with back-projection show that it comfortably outperforms previous state-of-the-art results in semi-supervised settings where labeled data is scarce. Code and models are available at https://github.com/facebookresearch/
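
The two quantities named in the abstract above, mean per-joint position error and the back-projection comparison against the input 2D keypoints, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the pinhole camera model, the function names (mean_joint_distance, mpjpe, project, reprojection_error), the focal and center parameters, and all numeric values are assumptions made for illustration only.

```python
import math


def mean_joint_distance(predicted, target):
    """Average Euclidean distance between corresponding joints."""
    if len(predicted) != len(target):
        raise ValueError("both poses must have the same number of joints")
    return sum(math.dist(p, t) for p, t in zip(predicted, target)) / len(predicted)


def mpjpe(predicted_3d, target_3d):
    """Mean per-joint position error, in the units of the inputs (e.g. mm)."""
    return mean_joint_distance(predicted_3d, target_3d)


def project(points_3d, focal=1.0, center=(0.0, 0.0)):
    """Project camera-space 3D joints to 2D with an assumed pinhole camera."""
    projected = []
    for x, y, z in points_3d:
        if z <= 0:
            raise ValueError("joints must lie in front of the camera (z > 0)")
        projected.append((focal * x / z + center[0], focal * y / z + center[1]))
    return projected


def reprojection_error(points_3d, keypoints_2d, focal=1.0, center=(0.0, 0.0)):
    """Mean 2D distance between back-projected 3D joints and the input keypoints.

    On unlabeled video there is no 3D ground truth, so this 2D discrepancy
    is the kind of signal a back-projection objective can penalize.
    """
    return mean_joint_distance(project(points_3d, focal, center), keypoints_2d)


# Hypothetical 3-joint poses in millimetres (illustrative values only).
target_3d = [(0.0, 0.0, 0.0), (0.0, 100.0, 0.0), (0.0, 200.0, 0.0)]
predicted_3d = [(3.0, 4.0, 0.0), (0.0, 100.0, 0.0), (0.0, 200.0, 12.0)]
error_mm = mpjpe(predicted_3d, target_3d)  # per-joint distances are 5, 0 and 12

# Hypothetical camera-space pose and the 2D keypoints it should reproduce.
pose_3d = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
keypoints_2d = [(0.0, 0.0), (0.5, 0.0)]
consistency = reprojection_error(pose_3d, keypoints_2d)
```

In this sketch the supervised metric compares 3D joints directly, while the reprojection term only needs 2D keypoints, which is why it can be evaluated on unlabeled video.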

Modeling Human Motion with Quaternion-Based Neural Networks

Pavllo

et al. 2019

Previous work on predicting or generating 3D human pose sequences regresses either joint rotations or joint positions. The former strategy is prone to error accumulation along the kinematic chain, as well as discontinuities when using Euler angles or exponential maps as parameterizations. The latter requires re-projection onto skeleton constraints to avoid bone stretching and invalid configurations. This work addresses both limitations. QuaterNet represents rotations with quaternions and our loss function performs forward kinematics on a skeleton to penalize absolute position errors instead of angle errors. We investigate both recurrent and convolutional architectures and evaluate on short-term prediction and long-term generation. For the latter, our approach is qualitatively judged as realistic as recent neural strategies from the graphics literature. Our experiments compare quaternions to Euler angles as well as exponential maps and show that only a very short context is required to make reliable future predictions. Finally, we show that the standard evaluation protocol for Human3.6M produces high-variance results and we propose a simple solution.
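
The idea in the abstract above, running forward kinematics on quaternion joint rotations and penalizing position errors rather than angle errors, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not QuaterNet's implementation: the (w, x, y, z) quaternion layout, the toy three-joint chain, and the function names (quat_multiply, quat_rotate, forward_kinematics, position_loss) are all hypothetical.

```python
import math


def quat_multiply(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def quat_rotate(q, v):
    """Rotate 3D vector v by unit quaternion q via q * (0, v) * conj(q)."""
    w, x, y, z = q
    conjugate = (w, -x, -y, -z)
    _, rx, ry, rz = quat_multiply(quat_multiply(q, (0.0, *v)), conjugate)
    return (rx, ry, rz)


def forward_kinematics(rotations, offsets, parents):
    """Compute world-space joint positions from local joint rotations.

    rotations[i]: local unit quaternion of joint i
    offsets[i]:   offset of joint i from its parent, in the parent's frame
    parents[i]:   parent index, or -1 for the root (parents precede children)
    """
    world_rotations, positions = [], []
    for i, parent in enumerate(parents):
        if parent == -1:
            world_rotations.append(rotations[i])
            positions.append(tuple(offsets[i]))
        else:
            rotated = quat_rotate(world_rotations[parent], offsets[i])
            positions.append(
                tuple(p + r for p, r in zip(positions[parent], rotated))
            )
            world_rotations.append(
                quat_multiply(world_rotations[parent], rotations[i])
            )
    return positions


def position_loss(predicted_rotations, target_rotations, offsets, parents):
    """Mean joint position error after forward kinematics on both poses."""
    predicted = forward_kinematics(predicted_rotations, offsets, parents)
    target = forward_kinematics(target_rotations, offsets, parents)
    return sum(math.dist(p, t) for p, t in zip(predicted, target)) / len(parents)


# Hypothetical chain: root at the origin, two bones of unit length along x.
parents = [-1, 0, 1]
offsets = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
identity = (1.0, 0.0, 0.0, 0.0)
half = math.pi / 4  # half of a 90-degree rotation about the z axis
root_turn = (math.cos(half), 0.0, 0.0, math.sin(half))

target = [identity, identity, identity]
predicted = [root_turn, identity, identity]  # only the root rotation is wrong
positions = forward_kinematics(predicted, offsets, parents)
loss = position_loss(predicted, target, offsets, parents)
```

Only the root rotation differs between the two poses here, yet both descendant joints move. A per-joint angle loss would count one wrong joint, while the position loss reflects how the error propagates along the kinematic chain.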

3D human pose estimation in video with temporal convolutions and semi-supervised training

Pavllo

Feichtenhofer

Grangier

et al. 2018

Preprint

Hierarchical Image Classification using Entailment Cone Embeddings

Dhall

Makarova

Ganea

et al. 2020

Controlling Style and Semantics in Weakly-Supervised Image Generation

Pavllo

Lucchi

Hofmann

2020
