2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020) 2020
DOI: 10.1109/fg47880.2020.00048

Head2Head: Video-based Neural Head Synthesis

Abstract: In this paper, we propose a novel machine learning architecture for facial reenactment. In particular, contrary to the model-based approaches or recent frame-based methods that use Deep Convolutional Neural Networks (DCNNs) to generate individual frames, we propose a novel method that (a) exploits the special structure of facial motion (paying particular attention to mouth motion) and (b) enforces temporal consistency. We demonstrate that the proposed method can transfer facial expressions, pose and gaze of a …

Citations: cited by 52 publications (25 citation statements)
References: 44 publications (81 reference statements)
“…Video-based 3D reconstruction. The video fitting approach followed in Head2Head [15] to estimate the 3D facial geometry is based on a set of sparse landmarks extracted from the entire input sequence. This method has three main drawbacks: 1) the fidelity of the 3D reconstruction relies heavily on the accuracy of extracted landmarks which are also sparse (68 in total), 2) it might require a large number of frames with enough reconstruction cues (various rotations) to produce good accuracy, 3) it makes a quite strong assumption in the initialisation stage about the rigidity of the face to estimate the camera parameters.…”
Section: D Facial Recovery
Citation type: mentioning
confidence: 99%
“…We use the dense 3D vertices (∼5K) estimated by our trained DenseFaceReg on each video frame to: 1) estimate the camera parameters, 2) generate the 3DMM identity and expression coefficients by projecting the dense shape onto the 3DMM bases. For all our experiments in this work, we use the same 3DMMs utilised in [15]. The analysis-by-synthesis approach, which is used by many state-of-the-art approaches [3], [4], estimates a lot of parameters (e.g.…”
Section: D Facial Recovery
Citation type: mentioning
confidence: 99%
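
The projection step mentioned in the statement above (obtaining 3DMM identity and expression coefficients from a dense shape) can be illustrated with a minimal sketch. It assumes a linear 3DMM of the form shape = mean + id_basis · a + exp_basis · b and an unregularised least-squares fit; the citing paper does not spell out its exact formulation here, so the function name `project_onto_3dmm`, the array layout, and the synthetic data are all hypothetical.

```python
import numpy as np


def project_onto_3dmm(shape, mean, id_basis, exp_basis):
    """Fit identity and expression coefficients by linear least squares.

    Assumed (hypothetical) layout: `shape` and `mean` are flattened
    vertex vectors of length 3N; `id_basis` is (3N, K_id) and
    `exp_basis` is (3N, K_exp).
    """
    # Stack both bases so a single solve recovers all coefficients.
    basis = np.hstack([id_basis, exp_basis])
    residual = shape.reshape(-1) - mean.reshape(-1)
    coeffs, *_ = np.linalg.lstsq(basis, residual, rcond=None)
    k_id = id_basis.shape[1]
    return coeffs[:k_id], coeffs[k_id:]


# Synthetic check: build a shape from known coefficients and recover them.
rng = np.random.default_rng(0)
n_vertices, k_id, k_exp = 50, 4, 3
mean = rng.normal(size=3 * n_vertices)
id_basis = rng.normal(size=(3 * n_vertices, k_id))
exp_basis = rng.normal(size=(3 * n_vertices, k_exp))
true_id = rng.normal(size=k_id)
true_exp = rng.normal(size=k_exp)
shape = mean + id_basis @ true_id + exp_basis @ true_exp

est_id, est_exp = project_onto_3dmm(shape, mean, id_basis, exp_basis)
```

In practice the dense vertices would first be aligned to the model frame using the estimated camera parameters, and a regulariser on the coefficients is common; both are omitted from this sketch.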