2018
DOI: 10.1145/3197517.3201283

Deep video portraits

Abstract: We present a novel approach that enables photo-realistic re-animation of portrait videos using only an input video. In contrast to existing approaches that are restricted to manipulations of facial expressions only, we are the first to transfer the full 3D head position, head rotation, face expression, eye gaze, and eye blinking from a source actor to a portrait video of a target actor. The core of our approach is a generative neural network with a novel space-time architecture. The network takes as input synt…
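As context for the abstract, the sketch below illustrates one plausible reading of a space-time conditional generator: a U-Net-style image-to-image network whose input is a short temporal window of synthetic conditioning renderings stacked along the channel axis and whose output is a single photo-realistic frame. The class name, window size, channel counts, and layer widths are illustrative assumptions, not the paper's actual architecture.

```python
# Minimal sketch of a space-time conditional generator (illustrative only).
# Assumes the conditioning input is a temporal window of `window` synthetic
# renderings (cond_channels each) stacked along the channel axis; the output
# is one RGB frame. All sizes are placeholders, not the paper's architecture.
import torch
import torch.nn as nn


def conv_block(in_ch, out_ch):
    # Downsampling block: strided conv + normalization + activation.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
        nn.InstanceNorm2d(out_ch),
        nn.LeakyReLU(0.2, inplace=True),
    )


def deconv_block(in_ch, out_ch):
    # Upsampling block: transposed conv + normalization + activation.
    return nn.Sequential(
        nn.ConvTranspose2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
        nn.InstanceNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )


class SpaceTimeGenerator(nn.Module):
    def __init__(self, window=11, cond_channels=3):
        super().__init__()
        in_ch = window * cond_channels  # temporal window flattened into channels
        self.enc1 = conv_block(in_ch, 64)
        self.enc2 = conv_block(64, 128)
        self.enc3 = conv_block(128, 256)
        self.dec3 = deconv_block(256, 128)
        self.dec2 = deconv_block(128 + 128, 64)  # skip connection from enc2
        self.dec1 = deconv_block(64 + 64, 32)    # skip connection from enc1
        self.out = nn.Sequential(nn.Conv2d(32, 3, kernel_size=3, padding=1), nn.Tanh())

    def forward(self, cond):
        # cond: (batch, window * cond_channels, H, W)
        e1 = self.enc1(cond)
        e2 = self.enc2(e1)
        e3 = self.enc3(e2)
        d3 = self.dec3(e3)
        d2 = self.dec2(torch.cat([d3, e2], dim=1))
        d1 = self.dec1(torch.cat([d2, e1], dim=1))
        return self.out(d1)  # synthesized target frame, values in [-1, 1]


# Usage: one window of 11 rendered conditioning frames at 256x256 resolution.
frames = torch.randn(1, 11 * 3, 256, 256)
print(SpaceTimeGenerator()(frames).shape)  # torch.Size([1, 3, 256, 256])
```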

Citations: cited by 563 publications (482 citation statements)
References: 59 publications
“…Image translation techniques can be used to rerender scenes in a more realistic domain, to enable facial expression synthesis [20], to fix artifacts in captured 3D performances [28], or to add viewpoint-dependent effects [44]. In our paper, we demonstrate an approach for training a neural rerendering framework in the wild, i.e., with uncontrolled data instead of captures under constant lighting conditions.…”
Section: Related Work (mentioning)
confidence: 99%
“…We adapt recent neural rerendering frameworks [20,28] to work with unstructured photo collections. Given a large internet photo collection {I_i} of a scene, we first generate a proxy 3D reconstruction using COLMAP [36,37,38], which applies Structure-from-Motion (SfM) and Multi-View Stereo (MVS) to create a dense colored point cloud.…”
Section: Neural Rerendering Framework (mentioning)
confidence: 99%
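The excerpt above describes building a proxy 3D reconstruction with COLMAP's Structure-from-Motion and Multi-View Stereo stages before neural rerendering. The following is a minimal sketch of such a pipeline driven from Python, assuming COLMAP is installed and on the PATH; the directory layout and the choice of exhaustive matching are illustrative, not the cited authors' exact settings.

```python
# Sketch of a proxy-reconstruction step: COLMAP SfM + MVS producing a dense
# colored point cloud. Paths and options are assumptions for illustration.
import subprocess
from pathlib import Path

images = Path("photo_collection")   # unstructured internet photos of the scene (assumed to exist)
work = Path("colmap_workspace")
db = work / "database.db"
sparse = work / "sparse"
dense = work / "dense"
for d in (work, sparse, dense):
    d.mkdir(parents=True, exist_ok=True)

def run(*args):
    # Thin wrapper that fails loudly if any COLMAP stage errors out.
    subprocess.run(["colmap", *map(str, args)], check=True)

# Structure-from-Motion: feature extraction, matching, sparse reconstruction.
run("feature_extractor", "--database_path", db, "--image_path", images)
run("exhaustive_matcher", "--database_path", db)
run("mapper", "--database_path", db, "--image_path", images, "--output_path", sparse)

# Multi-View Stereo: undistortion, depth estimation, fusion into a point cloud.
run("image_undistorter", "--image_path", images, "--input_path", sparse / "0",
    "--output_path", dense)
run("patch_match_stereo", "--workspace_path", dense)
run("stereo_fusion", "--workspace_path", dense, "--output_path", dense / "fused.ply")
```

The fused point cloud (fused.ply) then serves as the geometric proxy that a neural rerendering network is trained against.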
“…The requirements for manipulating or synthesizing videos were dramatically simplified when it became possible to create forged videos from only a short video of the target person [5,7] and then from a single ID photo [8] following the acting of an actor. Suwajanakorn et al's mapping method [9] enhanced the ability of manipulators to learn the mapping between speech and lip motion.…”
Section: Introduction (mentioning)
confidence: 99%
“…In particular, here we apply neural re-simulation from trajectories with noisy and insufficient data to plausible output. Our solution is thus also related to recent approaches for re-rendering scenes with a neural network [21,27,29,44]; here we seek to re-simulate dynamic trajectory outputs.…”
Section: Introduction (mentioning)
confidence: 99%