X2Face: A Network for Controlling Face Generation Using Images, Audio, and Pose Codes

Wiles, Olivia; Koepke, A. Sophia; Zisserman, Andrew

doi:10.1007/978-3-030-01261-8_41

Cited by 361 publications

(399 citation statements)

References 46 publications

Supporting

Mentioning: 385

Contrasting

“…Methods. On the VoxCeleb1 dataset we compare our model against two other systems: X2Face [42] and Pix2pixHD [40]. For X2Face, we have used the model, as well as pretrained weights, provided by the authors (in the original paper it was also trained and evaluated on the VoxCeleb1 dataset).…”

Section: Methods (mentioning)

confidence: 99%

“…To overcome the challenges, several works have proposed to synthesize articulated head sequences by warping a single or multiple static frames. Both classical warping algorithms [4,30] and warping fields synthesized using machine learning (including deep learning) [11,31,42] can be used for such purposes. While warping-based systems can create talking head sequences from as little as a single image, the amount of motion, head rotation, and disocclusion that they can handle without noticeable artifacts is limited.…”

Section: Introduction (mentioning)

confidence: 99%

“…In the experiments, we provide comparisons of talking heads created by our system with alternative neural talking head models [17,42] via quantitative measurements and a user study, where our approach generates images of sufficient realism and personalization fidelity to deceive the study participants. We demonstrate several uses of our talking head models, including video synthesis using landmark tracks extracted from video sequences of the same person, as well as puppeteering (video synthesis of a certain person based on the face landmark tracks of a different person).…”

Section: Introduction (mentioning)

confidence: 99%

Few-Shot Adversarial Learning of Realistic Neural Talking Head Models

Zakharov, Shysheya, Burkov, et al. 2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)

572

477

Figure 1 (panels: Source, Target → Landmarks → Result): The results of talking head image synthesis using face landmark tracks extracted from a different video sequence of the same person (on the left), and using face landmarks of a different person (on the right). The results are conditioned on the landmarks taken from the target frame, while the source frame is an example from the training set. The talking head models on the left were trained using eight frames, while the models on the right were trained in a one-shot manner.

Abstract: Several recent works have shown how highly realistic human head images can be obtained by training convolutional neural networks to generate them. In order to create a personalized talking head model, these works require training on a large dataset of images of a single person. However, in many practical scenarios, such personalized talking head models need to be learned from a few image views of a person, potentially even a single image. Here, we present a system with such few-shot capability. It performs lengthy meta-learning on a large dataset of videos, and after that is able to frame few- and one-shot learning of neural talking head models of previously unseen people as adversarial training problems with high capacity generators and discriminators. Crucially, the system is able to initialize the parameters of both the generator and the discriminator in a person-specific way, so that training can be based on just a few images and done quickly, despite the need to tune tens of millions of parameters. We show that such an approach is able to learn highly realistic and personalized talking head models of new people and even portrait paintings.

“…To generate out-of-domain warps, we randomly sampled the latent space of the optical flow generator in the X2face model [37] to generate warps. We note that although the X2face model is trained to generate face-specific warps, the warping field will not necessarily align with the portrait; moreover, since a VAE loss is not included during X2face training, sampling the bottleneck does not guarantee to have realistic warping fields.…”

Section: A4 Generalization (mentioning)

confidence: 99%

Detecting Photoshopped Faces by Scripting Photoshop

Wang, Zhang, et al. 2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)

100

Figure 1 (panels: (a) Manipulated photo, (b) Detected manipulations, (c) Suggested "undo", (d) Original photo): Given an input face (a), our tool can detect that the face has been warped with the Face-Aware Liquify tool from Photoshop, predict where the face has been warped (b), and attempt to "undo" the warp (c) and recover the original image (d).

Abstract: Most malicious photo manipulations are created using standard image editing tools, such as Adobe® Photoshop®. We present a method for detecting one very popular Photoshop manipulation, image warping applied to human faces, using a model trained entirely using fake images that were automatically generated by scripting Photoshop itself. We show that our model outperforms humans at the task of recognizing manipulated images, can predict the specific location of edits, and in some cases can be used to "undo" a manipulation to reconstruct the original, unedited image. We demonstrate that the system can be successfully applied to real, artist-created image manipulations.

“…More specifically, they are purely data-driven, leveraging a large collection of training data to learn a latent representation of the visual inputs for synthesis. Noting the significant progress of these techniques, recent research studies have started exploring the use of deep generative models for image animation and video retargeting [50,9,4,47,3]. These works demonstrate that deep models can effectively transfer motion patterns between human subjects in videos [4], or transfer a facial expression from one person to another [50].…”

Section: Introduction (mentioning)

confidence: 99%

Animating Arbitrary Objects via Deep Motion Transfer

Siarohin, Lathuilière, Tulyakov, et al. 2019

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

339

379

This paper introduces a novel deep learning framework for image animation. Given an input image with a target object and a driving video sequence depicting a moving object, our framework generates a video in which the target object is animated according to the driving sequence. This is achieved through a deep architecture that decouples appearance and motion information. Our framework consists of three main modules: (i) a Keypoint Detector unsupervisely trained to extract object keypoints, (ii) a Dense Motion prediction network for generating dense heatmaps from sparse keypoints, in order to better encode motion information and (iii) a Motion Transfer Network, which uses the motion heatmaps and appearance information extracted from the input image to synthesize the output frames. We demonstrate the effectiveness of our method on several benchmark datasets, spanning a wide variety of object appearances, and show that our approach outperforms state-of-the-art image animation and video generation methods. Our source code is publicly available.
