Proceedings of the ACM SIGGRAPH / Eurographics Symposium on Computer Animation 2017
DOI: 10.1145/3099564.3099581

Production-level facial performance capture using deep convolutional neural networks

Abstract: We present a real-time deep learning framework for video-based facial performance capture: the dense 3D tracking of an actor's face given a monocular video. Our pipeline begins with accurately capturing a subject using a high-end production facial capture pipeline based on multi-view stereo tracking and artist-enhanced animations. With 5-10 minutes of captured footage, we train a convolutional neural network to produce high-quality output, including self-occluded regions, from a monocular video sequence of that…
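In outline, the abstract describes supervised regression from a single video frame to dense 3D vertex positions, with the multi-view capture pipeline supplying the training targets. The sketch below illustrates that setup; it is not the paper's actual architecture, and the layer sizes, input resolution, and 5,000-vertex mesh are assumptions:

```python
import torch
import torch.nn as nn

class FaceCaptureNet(nn.Module):
    """Minimal sketch: regress dense 3D vertex positions from one frame.

    Layer sizes, the 240x320 grayscale input, and the 5,000-vertex mesh
    are illustrative assumptions, not the paper's architecture.
    """

    def __init__(self, num_vertices=5000):
        super().__init__()
        self.num_vertices = num_vertices
        self.features = nn.Sequential(
            nn.Conv2d(1, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # One fully connected head predicts every vertex jointly, which is
        # what lets the network fill in self-occluded regions.
        self.head = nn.Linear(256, 3 * num_vertices)

    def forward(self, frame):
        x = self.features(frame).flatten(1)
        return self.head(x).view(-1, self.num_vertices, 3)

# Training: frames from the 5-10 minutes of footage, targets from the
# multi-view stereo capture pipeline, plain per-vertex L2 loss.
net = FaceCaptureNet()
frames = torch.randn(4, 1, 240, 320)           # stand-in video crops
targets = torch.randn(4, net.num_vertices, 3)  # stand-in tracked meshes
loss = nn.functional.mse_loss(net(frames), targets)
loss.backward()
```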

Cited by 95 publications (62 citation statements)
References 55 publications

“…However, these methods require dense correspondence of facial points [38] or user-specific adaptations [31,7] to estimate the blendshape weights. Recent CNN-based approaches either require depth input [30,20] or regress character-specific parameters with several constraints [1]. Commercial software products like Faceshift [15], Faceware [16], etc.…”
Section: Performance-based Animation (mentioning)
confidence: 99%
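For context, the "blendshape weights" these methods estimate are the coefficients of a linear expression rig: the animated mesh is the neutral face plus a weighted sum of per-expression vertex offsets. A minimal NumPy sketch, with illustrative shapes and names:

```python
import numpy as np

# Illustrative sizes: a 5,000-vertex mesh and 50 expression blendshapes.
num_vertices, num_shapes = 5000, 50
neutral = np.zeros((num_vertices, 3))             # rest-pose vertices
deltas = np.zeros((num_shapes, num_vertices, 3))  # per-shape vertex offsets

def apply_blendshapes(weights):
    """Synthesize a mesh from blendshape weights: neutral + sum_i w_i * delta_i."""
    return neutral + np.tensordot(weights, deltas, axes=1)

mesh = apply_blendshapes(np.random.rand(num_shapes))  # -> (5000, 3) vertices
```

Estimating these weights from video is what requires the dense correspondences or user-specific adaptations noted above.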
“…A similar recent line of work has explored combining a CNN-based encoder with a generative model as decoder for the problem of 3D face reconstruction from 2D photos and videos [16,24]. Unlike our method, these works use linear models to represent 3D faces, which captures limited expression variation w.r.t.…”
Section: Related Work (mentioning)
confidence: 99%
“…A notable exception is Laine et al [16], in which the linear 3DMM is initialized with principal component analysis, and refined during fine-tuning of the network. The model trained by Laine et al is person-specific and does not generalize to new subjects.…”
Section: Related Work (mentioning)
confidence: 99%
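The PCA initialization this statement describes can be sketched as follows: compute the mean and principal components of the flattened training meshes, load them into the network's final fully connected layer, and then fine-tune that layer jointly with the rest of the network. Array shapes, the 160-component count, and variable names here are assumptions:

```python
import numpy as np
import torch
import torch.nn as nn

# Hedged sketch of PCA-initializing a network's final linear layer.
# train_meshes: (N, 3V) flattened captured vertex positions; the sizes
# and the 160-coefficient count are illustrative assumptions.
train_meshes = np.random.randn(1000, 3 * 5000).astype(np.float32)

mean = train_meshes.mean(axis=0)
# Rows of vt are the principal directions of the captured data.
_, _, vt = np.linalg.svd(train_meshes - mean, full_matrices=False)
basis = vt[:160]  # keep the leading 160 components

decoder = nn.Linear(160, 3 * 5000)
with torch.no_grad():
    decoder.weight.copy_(torch.from_numpy(basis.T.copy()))  # columns = PCA basis
    decoder.bias.copy_(torch.from_numpy(mean))              # bias = data mean

# The layer is then fine-tuned like any other parameter, so the
# initially-linear model can drift away from the PCA subspace.
optimizer = torch.optim.Adam(decoder.parameters(), lr=1e-4)
```

At initialization the network's output spans exactly the PCA subspace of the captured meshes; fine-tuning is what distinguishes this from a fixed linear 3DMM.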
“…Laine et al. [LKA*17] leveraged deep learning to learn a mapping from an actor's image to the corresponding high-quality performance-captured mesh, allowing for the convenient capture of additional high-quality data. Thanks to their machine learning formulation, these methods can infer coherent data during lip contacts if such information was present in the training set.…”
Section: Related Work (mentioning)
confidence: 99%