Proceedings of the ACM SIGGRAPH / Eurographics Symposium on Computer Animation 2017
DOI: 10.1145/3099564.3099581

Production-level facial performance capture using deep convolutional neural networks

Abstract: We present a real-time deep learning framework for video-based facial performance capture: the dense 3D tracking of an actor's face given a monocular video. Our pipeline begins with accurately capturing a subject using a high-end production facial capture pipeline based on multi-view stereo tracking and artist-enhanced animations. With 5-10 minutes of captured footage, we train a convolutional neural network to produce high-quality output, including self-occluded regions, from a monocular video sequence of that…
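In outline, the abstract describes supervised regression from a single video frame to dense 3D vertex positions, with the multi-view capture pipeline supplying the training targets. The sketch below illustrates that setup; it is not the paper's actual architecture, and the layer sizes, input resolution, and 5,000-vertex mesh are assumptions:

```python
import torch
import torch.nn as nn

class FaceCaptureNet(nn.Module):
    """Minimal sketch: regress dense 3D vertex positions from one frame.

    Layer sizes, the 240x320 grayscale input, and the 5,000-vertex mesh
    are illustrative assumptions, not the paper's architecture.
    """

    def __init__(self, num_vertices=5000):
        super().__init__()
        self.num_vertices = num_vertices
        self.features = nn.Sequential(
            nn.Conv2d(1, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # One fully connected head predicts every vertex jointly, which is
        # what lets the network fill in self-occluded regions.
        self.head = nn.Linear(256, 3 * num_vertices)

    def forward(self, frame):
        x = self.features(frame).flatten(1)
        return self.head(x).view(-1, self.num_vertices, 3)

# Training: frames from the 5-10 minutes of footage, targets from the
# multi-view stereo capture pipeline, plain per-vertex L2 loss.
net = FaceCaptureNet()
frames = torch.randn(4, 1, 240, 320)           # stand-in video crops
targets = torch.randn(4, net.num_vertices, 3)  # stand-in tracked meshes
loss = nn.functional.mse_loss(net(frames), targets)
loss.backward()
```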

Cited by 95 publications (62 citation statements)
References 55 publications

“…However, these methods require dense correspondence of facial points [38] or user-specific adaptations [31,7] to estimate the blendshape weights. Recent CNN-based approaches either require depth input [30,20] or regress character-specific parameters with several constraints [1]. Commercial software products like Faceshift [15], Faceware [16], etc.…”
Section: Performance-based Animation (mentioning)
confidence: 99%
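For context, the "blendshape weights" these methods estimate are the coefficients of a linear expression rig: the animated mesh is the neutral face plus a weighted sum of per-expression vertex offsets. A minimal NumPy sketch, with illustrative shapes and names:

```python
import numpy as np

# Illustrative sizes: a 5,000-vertex mesh and 50 expression blendshapes.
num_vertices, num_shapes = 5000, 50
neutral = np.zeros((num_vertices, 3))             # rest-pose vertices
deltas = np.zeros((num_shapes, num_vertices, 3))  # per-shape vertex offsets

def apply_blendshapes(weights):
    """Synthesize a mesh from blendshape weights: neutral + sum_i w_i * delta_i."""
    return neutral + np.tensordot(weights, deltas, axes=1)

mesh = apply_blendshapes(np.random.rand(num_shapes))  # -> (5000, 3) vertices
```

Estimating these weights from video is what requires the dense correspondences or user-specific adaptations noted above.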
“…A similar recent line of work has explored combining a CNN-based encoder with a generative model as decoder for the problem of 3D face reconstruction from 2D photos and videos [16,24]. Unlike our method, these works use linear models to represent 3D faces, which captures limited expression variation w.r.t.…”
Section: Related Work (mentioning)
confidence: 99%
“…A notable exception is Laine et al [16], in which the linear 3DMM is initialized with principal component analysis, and refined during fine-tuning of the network. The model trained by Laine et al is person-specific and does not generalize to new subjects.…”
Section: Related Work (mentioning)
confidence: 99%
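The PCA initialization this statement describes can be sketched as follows: compute the mean and principal components of the flattened training meshes, load them into the network's final fully connected layer, and then fine-tune that layer jointly with the rest of the network. Array shapes, the 160-component count, and variable names here are assumptions:

```python
import numpy as np
import torch
import torch.nn as nn

# Hedged sketch of PCA-initializing a network's final linear layer.
# train_meshes: (N, 3V) flattened captured vertex positions; the sizes
# and the 160-coefficient count are illustrative assumptions.
train_meshes = np.random.randn(1000, 3 * 5000).astype(np.float32)

mean = train_meshes.mean(axis=0)
# Rows of vt are the principal directions of the captured data.
_, _, vt = np.linalg.svd(train_meshes - mean, full_matrices=False)
basis = vt[:160]  # keep the leading 160 components

decoder = nn.Linear(160, 3 * 5000)
with torch.no_grad():
    decoder.weight.copy_(torch.from_numpy(basis.T.copy()))  # columns = PCA basis
    decoder.bias.copy_(torch.from_numpy(mean))              # bias = data mean

# The layer is then fine-tuned like any other parameter, so the
# initially-linear model can drift away from the PCA subspace.
optimizer = torch.optim.Adam(decoder.parameters(), lr=1e-4)
```

At initialization the network's output spans exactly the PCA subspace of the captured meshes; fine-tuning is what distinguishes this from a fixed linear 3DMM.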
“…Laine et al. [LKA*17] leveraged deep learning to learn a mapping from an actor's image to the corresponding high-quality performance-captured mesh, allowing for the convenient capture of additional high-quality data. Thanks to their machine learning formulation, these methods can infer coherent data during lip contacts if such information was present in the training set.…”
Section: Related Work (mentioning)
confidence: 99%