Yudong Guo scite author profile

Powered by convolutional neural networks (CNNs), CNN-based face reconstruction has recently shown promising performance in reconstructing detailed face shape from 2D face images. The success of CNN-based methods relies on a large amount of labeled data. State-of-the-art methods synthesize such data using a coarse morphable face model, which has difficulty generating detailed photo-realistic face images (with wrinkles). This paper presents a novel face data generation method. Specifically, we render a large number of photo-realistic face images with different attributes based on inverse rendering. Furthermore, we construct a fine-detailed face image dataset by transferring different scales of details from one image to another. We also construct a large number of video-type adjacent frame pairs by simulating the distribution of real video data. With these datasets, we propose a coarse-to-fine learning framework consisting of three convolutional networks. The networks are trained for real-time detailed 3D face reconstruction from monocular video as well as from a single image. Extensive experimental results demonstrate that our framework produces high-quality reconstructions with much less computation time than the state of the art. Moreover, our method is robust to pose, expression, and lighting due to the diversity of the data.

AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis

Guo

Chen

Liang

et al. 2021

153

Semi-supervised 3D Face Representation Learning from Unconstrained Photo Collections

Gao

Zhang

Guo

et al. 2020

CNN-based Real-time Dense Face Reconstruction with Inverse-rendered Photo-realistic Face Images

Guo

Zhang

Cai

et al. 2017

Preprint

StereoPIFu: Depth Aware Clothed Human Digitization via Stereo Vision

Hong

Zhang

Jiang

et al. 2021

Landmark Detection and 3D Face Reconstruction for Caricature using a Nonlinear Parametric Model

et al. 2021

Reconstructing Personalized Semantic Facial NeRF Models from Monocular Video

et al. 2022

We present a novel semantic model for the human head defined with a neural radiance field. The 3D-consistent head model consists of a set of disentangled and interpretable bases and can be driven by low-dimensional expression coefficients. Thanks to the powerful representation ability of neural radiance fields, the constructed model can represent complex facial attributes, including hair and worn accessories, which cannot be represented by traditional mesh blendshapes. To construct the personalized semantic facial model, we propose to define the bases as several multi-level voxel fields. With a short monocular RGB video as input, our method can construct the subject's semantic facial NeRF model in only ten to twenty minutes, and can render a photorealistic human head image in tens of milliseconds given an expression coefficient and view direction. We apply this novel representation to many tasks, such as facial retargeting and expression editing. Experimental results demonstrate its strong representation ability and training/inference speed. Demo videos and released code are provided on our project page: https://ustc3dv.github.io/NeRFBlendShape/
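The idea of driving a set of bases with low-dimensional expression coefficients can be illustrated with a minimal sketch. It assumes a plain linear, blendshape-style weighted sum, which the abstract does not spell out; the function name `blend_bases` and the flat feature-vector layout are hypothetical and are not taken from the released code.

```python
from typing import List, Sequence


def blend_bases(bases: Sequence[Sequence[float]], coeffs: Sequence[float]) -> List[float]:
    """Combine basis feature vectors with expression coefficients.

    Assumption: a linear weighted sum (blendshape-style). Each entry of
    `bases` stands in for the features one basis would provide at a query
    point; this layout is illustrative only.
    """
    if len(bases) != len(coeffs):
        raise ValueError("one coefficient per basis is required")
    if not bases:
        return []
    dim = len(bases[0])
    blended = [0.0] * dim
    for basis, weight in zip(bases, coeffs):
        if len(basis) != dim:
            raise ValueError("all bases must have the same feature dimension")
        for i, value in enumerate(basis):
            # Accumulate the contribution of this basis, scaled by its coefficient.
            blended[i] += weight * value
    return blended


if __name__ == "__main__":
    # Two toy bases with two features each, driven by two coefficients.
    print(blend_bases([[1.0, 0.0], [0.0, 2.0]], [0.5, 0.25]))
```

In a full pipeline, the blended features would then be decoded into density and color by a network; that step is omitted here.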

AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis

Guo¹,

Chen²,

Liu³

et al. 2021

Preprint

Generating high-fidelity talking-head video that fits an input audio sequence is a challenging problem that has recently received considerable attention. In this paper, we address this problem with the aid of neural scene representation networks. Our method is completely different from existing methods that rely on intermediate representations, such as 2D landmarks or 3D face models, to bridge the gap between audio input and video output. Specifically, the features of the input audio signal are fed directly into a conditional implicit function to generate a dynamic neural radiance field, from which a high-fidelity talking-head video corresponding to the audio signal is synthesized using volume rendering. Another advantage of our framework is that it synthesizes not only the head (with hair) region, as previous methods did, but also the upper body, via two individual neural radiance fields. Experimental results demonstrate that our novel framework can (1) produce high-fidelity and natural results, and (2) support free adjustment of audio signals, viewing directions, and background images.

* This work was done when Yudong Guo and Keyu Chen were interns at Dilusense.
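The volume rendering step mentioned in the AD-NeRF abstract can be sketched with the standard discrete compositing rule used for neural radiance fields. This is a generic illustration, not the authors' implementation: the function name `composite_ray` and the per-sample inputs are hypothetical, and in the paper's setting the densities and colors would come from the audio-conditioned implicit function.

```python
import math
from typing import Sequence, Tuple


def composite_ray(
    densities: Sequence[float],
    colors: Sequence[Tuple[float, float, float]],
    deltas: Sequence[float],
) -> Tuple[float, float, float]:
    """Composite samples along one ray, ordered front to back.

    For sample i with density sigma_i and spacing delta_i:
      alpha_i = 1 - exp(-sigma_i * delta_i)
      T_i     = product over j < i of (1 - alpha_j)
      color   = sum over i of T_i * alpha_i * c_i
    """
    transmittance = 1.0  # fraction of light not yet absorbed
    out = [0.0, 0.0, 0.0]
    for sigma, color, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)
        weight = transmittance * alpha
        for k in range(3):
            out[k] += weight * color[k]
        transmittance *= 1.0 - alpha
    return (out[0], out[1], out[2])


if __name__ == "__main__":
    # An effectively opaque red sample in front hides the green one behind it.
    print(composite_ray([1e9, 1.0], [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)], [1.0, 1.0]))
```

Samples with zero density contribute nothing, so empty space leaves the pixel at the (here black) background.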
