2018
DOI: 10.1145/3197517.3201283

Deep video portraits

Abstract: We present a novel approach that enables photo-realistic re-animation of portrait videos using only an input video. In contrast to existing approaches that are restricted to manipulations of facial expressions only, we are the first to transfer the full 3D head position, head rotation, face expression, eye gaze, and eye blinking from a source actor to a portrait video of a target actor. The core of our approach is a generative neural network with a novel space-time architecture. The network takes as input synt…
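As context for the abstract, the sketch below illustrates one plausible reading of a space-time conditional generator: a U-Net-style image-to-image network whose input is a short temporal window of synthetic conditioning renderings stacked along the channel axis and whose output is a single photo-realistic frame. The class name, window size, channel counts, and layer widths are illustrative assumptions, not the paper's actual architecture.

```python
# Minimal sketch of a space-time conditional generator (illustrative only).
# Assumes the conditioning input is a temporal window of `window` synthetic
# renderings (cond_channels each) stacked along the channel axis; the output
# is one RGB frame. All sizes are placeholders, not the paper's architecture.
import torch
import torch.nn as nn


def conv_block(in_ch, out_ch):
    # Downsampling block: strided conv + normalization + activation.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
        nn.InstanceNorm2d(out_ch),
        nn.LeakyReLU(0.2, inplace=True),
    )


def deconv_block(in_ch, out_ch):
    # Upsampling block: transposed conv + normalization + activation.
    return nn.Sequential(
        nn.ConvTranspose2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
        nn.InstanceNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )


class SpaceTimeGenerator(nn.Module):
    def __init__(self, window=11, cond_channels=3):
        super().__init__()
        in_ch = window * cond_channels  # temporal window flattened into channels
        self.enc1 = conv_block(in_ch, 64)
        self.enc2 = conv_block(64, 128)
        self.enc3 = conv_block(128, 256)
        self.dec3 = deconv_block(256, 128)
        self.dec2 = deconv_block(128 + 128, 64)  # skip connection from enc2
        self.dec1 = deconv_block(64 + 64, 32)    # skip connection from enc1
        self.out = nn.Sequential(nn.Conv2d(32, 3, kernel_size=3, padding=1), nn.Tanh())

    def forward(self, cond):
        # cond: (batch, window * cond_channels, H, W)
        e1 = self.enc1(cond)
        e2 = self.enc2(e1)
        e3 = self.enc3(e2)
        d3 = self.dec3(e3)
        d2 = self.dec2(torch.cat([d3, e2], dim=1))
        d1 = self.dec1(torch.cat([d2, e1], dim=1))
        return self.out(d1)  # synthesized target frame, values in [-1, 1]


# Usage: one window of 11 rendered conditioning frames at 256x256 resolution.
frames = torch.randn(1, 11 * 3, 256, 256)
print(SpaceTimeGenerator()(frames).shape)  # torch.Size([1, 3, 256, 256])
```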

Citations: cited by 563 publications (482 citation statements)
References: 59 publications
“…Image translation techniques can be used to rerender scenes in a more realistic domain, to enable facial expression synthesis [20], to fix artifacts in captured 3D performances [28], or to add viewpoint-dependent effects [44]. In our paper, we demonstrate an approach for training a neural rerendering framework in the wild, i.e., with uncontrolled data instead of captures under constant lighting conditions.…”
Section: Related Work (mentioning)
confidence: 99%
“…We adapt recent neural rerendering frameworks [20,28] to work with unstructured photo collections. Given a large internet photo collection {I_i} of a scene, we first generate a proxy 3D reconstruction using COLMAP [36,37,38], which applies Structure-from-Motion (SfM) and Multi-View Stereo (MVS) to create a dense colored point cloud.…”
Section: Neural Rerendering Framework (mentioning)
confidence: 99%
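The excerpt above describes building a proxy 3D reconstruction with COLMAP's Structure-from-Motion and Multi-View Stereo stages before neural rerendering. The following is a minimal sketch of such a pipeline driven from Python, assuming COLMAP is installed and on the PATH; the directory layout and the choice of exhaustive matching are illustrative, not the cited authors' exact settings.

```python
# Sketch of a proxy-reconstruction step: COLMAP SfM + MVS producing a dense
# colored point cloud. Paths and options are assumptions for illustration.
import subprocess
from pathlib import Path

images = Path("photo_collection")   # unstructured internet photos of the scene (assumed to exist)
work = Path("colmap_workspace")
db = work / "database.db"
sparse = work / "sparse"
dense = work / "dense"
for d in (work, sparse, dense):
    d.mkdir(parents=True, exist_ok=True)

def run(*args):
    # Thin wrapper that fails loudly if any COLMAP stage errors out.
    subprocess.run(["colmap", *map(str, args)], check=True)

# Structure-from-Motion: feature extraction, matching, sparse reconstruction.
run("feature_extractor", "--database_path", db, "--image_path", images)
run("exhaustive_matcher", "--database_path", db)
run("mapper", "--database_path", db, "--image_path", images, "--output_path", sparse)

# Multi-View Stereo: undistortion, depth estimation, fusion into a point cloud.
run("image_undistorter", "--image_path", images, "--input_path", sparse / "0",
    "--output_path", dense)
run("patch_match_stereo", "--workspace_path", dense)
run("stereo_fusion", "--workspace_path", dense, "--output_path", dense / "fused.ply")
```

The fused point cloud (fused.ply) then serves as the geometric proxy that a neural rerendering network is trained against.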
“…The requirements for manipulating or synthesizing videos were dramatically simplified when it became possible to create forged videos from only a short video of the target person [5,7] and then from a single ID photo [8] following the acting of an actor. Suwajanakorn et al's mapping method [9] enhanced the ability of manipulators to learn the mapping between speech and lip motion.…”
Section: Introduction (mentioning)
confidence: 99%
“…In particular, here we apply neural re-simulation from trajectories with noisy and insufficient data to plausible output. Our solution is thus also related to recent approaches for re-rendering scenes with a neural network [21,27,29,44]; here we seek to re-simulate dynamic trajectory outputs.…”
Section: Introduction (mentioning)
confidence: 99%