2018
DOI: 10.1007/978-3-030-01261-8_41

X2Face: A Network for Controlling Face Generation Using Images, Audio, and Pose Codes

Abstract: The objective of this paper is a neural network model that controls the pose and expression of a given face, using another face or modality (e.g. audio). This model can then be used for lightweight, sophisticated video and image editing. We make the following three contributions. First, we introduce a network, X2Face, that can control a source face (specified by one or more frames) using another face in a driving frame to produce a generated frame with the identity of the source frame but the pose and expressi…

Cited by 361 publications (399 citation statements)
References 46 publications
“…Methods. On the VoxCeleb1 dataset we compare our model against two other systems: X2Face [42] and Pix2pixHD [40]. For X2Face, we have used the model, as well as pretrained weights, provided by the authors (in the original paper it was also trained and evaluated on the VoxCeleb1 dataset).…”
Section: Methods | Type: mentioning
confidence: 99%
“…To overcome the challenges, several works have proposed to synthesize articulated head sequences by warping a single or multiple static frames. Both classical warping algorithms [4,30] and warping fields synthesized using machine learning (including deep learning) [11,31,42] can be used for such purposes. While warping-based systems can create talking head sequences from as little as a single image, the amount of motion, head rotation, and disocclusion that they can handle without noticeable artifacts is limited.…”
Section: Introduction | Type: mentioning
confidence: 99%
“…To generate out-of-domain warps, we randomly sampled the latent space of the optical flow generator in the X2face model [37] to generate warps. We note that although the X2face model is trained to generate face-specific warps, the warping field will not necessarily align with the portrait; moreover, since a VAE loss is not included during X2face training, sampling the bottleneck does not guarantee to have realistic warping fields.…”
Section: A4 Generalization | Type: mentioning
confidence: 99%
“…More specifically, they are purely data-driven, leveraging a large collection of training data to learn a latent representation of the visual inputs for synthesis. Noting the significant progress of these techniques, recent research studies have started exploring the use of deep generative models for image animation and video retargeting [50,9,4,47,3]. These works demonstrate that deep models can effectively transfer motion patterns between human subjects in videos [4], or transfer a facial expression from one person to another [50].…”
Section: Introduction | Type: mentioning
confidence: 99%