In this paper, we propose a novel machine learning architecture for facial reenactment. In contrast to model-based approaches and to recent frame-based methods that use Deep Convolutional Neural Networks (DCNNs) to generate individual frames, our method (a) exploits the special structure of facial motion, paying particular attention to mouth motion, and (b) enforces temporal consistency. We demonstrate that the proposed method transfers the facial expressions, pose and gaze of a source actor to a target video in a photo-realistic fashion and more accurately than state-of-the-art methods.
Monocular 4D face reconstruction is a challenging problem, especially when the input video is captured under unconstrained conditions, i.e. "in the wild". The majority of state-of-the-art approaches build upon 3D Morphable Modelling (3DMM), which has proven more robust than model-free approaches such as Shape from Shading (SfS) or Structure from Motion (SfM). While 3DMMs offer visually plausible shape reconstructions that resemble real faces, they adhere to the model space learned from exemplar faces during the training phase, and so often yield facial reconstructions that are excessively smooth and look too similar even across captured faces with completely different facial characteristics. This is because 3DMMs are typically used as hard constraints on the reconstructed 3D shape. To overcome these limitations, in this paper we propose to combine 3DMMs with Dense Nonrigid Structure from Motion (DNSM), which is much less robust but has the potential to reconstruct fine details and capture the subject-specific facial characteristics of every input. We combine the best of both worlds by introducing a novel dense variational framework, which we solve efficiently by designing a convex optimisation strategy. In contrast to previous methods, we incorporate the 3DMM as a soft constraint, penalising both the departure of reconstructed faces from the 3DMM subspace and the variation of the identity component of the 3DMM over different frames of the input video. As demonstrated in qualitative and quantitative experiments, our method is robust, accurately estimates the 3D facial shape over time and outperforms other state-of-the-art methods of 4D face reconstruction.
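
To make the idea of a soft 3DMM constraint concrete, the following is a minimal illustrative sketch and not the formulation used in the paper. It assumes a linear 3DMM with a mean shape and separate identity and expression bases, measures departure from the subspace by a least-squares residual, and measures identity variation as the spread of per-frame identity coefficients around their temporal mean. The function name `soft_3dmm_penalty`, the weights `lam_sub` and `lam_id`, and the squared-error form of both terms are assumptions made for illustration only.

```python
import numpy as np


def soft_3dmm_penalty(shapes, mean, basis_id, basis_exp, lam_sub=1.0, lam_id=1.0):
    """Hypothetical soft-constraint penalty for a sequence of face shapes.

    shapes    : (F, 3N) array, one flattened 3D shape per video frame
    mean      : (3N,) mean shape of the (assumed linear) 3DMM
    basis_id  : (3N, k_id) identity basis
    basis_exp : (3N, k_exp) expression basis
    """
    basis = np.hstack([basis_id, basis_exp])            # (3N, k_id + k_exp)
    centred = (shapes - mean).T                         # (3N, F)

    # Best-fitting 3DMM coefficients for every frame (least squares).
    coeffs, *_ = np.linalg.lstsq(basis, centred, rcond=None)   # (k, F)

    # Term 1: squared distance of each shape from the 3DMM subspace.
    residual = centred - basis @ coeffs
    subspace_term = np.sum(residual ** 2)

    # Term 2: variation of the identity coefficients across frames.
    id_coeffs = coeffs[: basis_id.shape[1]]             # (k_id, F)
    id_mean = id_coeffs.mean(axis=1, keepdims=True)
    identity_term = np.sum((id_coeffs - id_mean) ** 2)

    return lam_sub * subspace_term + lam_id * identity_term
```

Under these assumptions, a sequence that lies inside the model subspace and keeps the same identity coefficients in every frame incurs a penalty of essentially zero, while fine detail outside the subspace or a drifting identity is discouraged in proportion to the weights rather than forbidden outright, which is what distinguishes a soft constraint from a hard one.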