2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2019.01034

Capture, Learning, and Synthesis of 3D Speaking Styles

Figure 1: Given an arbitrary speech signal and a static 3D face mesh as input (left), our model, VOCA, outputs a realistic 3D character animation (right). Input: speech signal and 3D template; output: 3D character animation. Top: Winston Churchill. Bottom: actor from Karras et al. [33]. See supplementary video.

Abstract: Audio-driven 3D facial animation has been widely explored, but achieving realistic, human-like performance is still unsolved. This is due to the lack of available 3D datasets, models, and standard evaluation metrics.
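To make the input/output contract above concrete, here is a minimal Python sketch of a VOCA-style inference pipeline. All function and class names (extract_speech_features, decode_offsets, animate) are hypothetical, not the authors' released API; the stand-in encoder and decoder only mimic the shapes of the real components (VOCA encodes speech with DeepSpeech features and regresses per-vertex displacements over a FLAME-topology template mesh).

```python
# Minimal sketch of a VOCA-style inference loop (hypothetical names throughout).
# Assumption: speech is encoded into per-frame feature vectors, and a learned
# decoder maps each feature (plus a speaking-style code) to per-vertex
# displacements that are added to a static template mesh.
import numpy as np

N_VERTICES = 5023   # FLAME-topology mesh size used by VOCA
FEATURE_DIM = 64    # per-frame speech feature size (assumed)


def extract_speech_features(audio: np.ndarray, sample_rate: int,
                            fps: int = 60) -> np.ndarray:
    """Placeholder for a speech encoder (VOCA uses DeepSpeech features).

    Returns one feature vector per animation frame.
    """
    n_frames = max(1, int(len(audio) / sample_rate * fps))
    rng = np.random.default_rng(0)
    return rng.standard_normal((n_frames, FEATURE_DIM))  # stand-in features


def decode_offsets(features: np.ndarray, style_id: int) -> np.ndarray:
    """Placeholder for the learned decoder: features -> per-vertex offsets.

    A real model conditions on the speaking style (training subject) and
    regresses (n_frames, N_VERTICES, 3) displacements.
    """
    rng = np.random.default_rng(style_id)
    w = rng.standard_normal((FEATURE_DIM, N_VERTICES * 3)) * 1e-4
    return (features @ w).reshape(len(features), N_VERTICES, 3)


def animate(template_vertices: np.ndarray, audio: np.ndarray,
            sample_rate: int, style_id: int = 0) -> np.ndarray:
    """Speech + static template mesh -> sequence of animated meshes."""
    feats = extract_speech_features(audio, sample_rate)
    offsets = decode_offsets(feats, style_id)
    return template_vertices[None] + offsets  # (n_frames, N_VERTICES, 3)


if __name__ == "__main__":
    template = np.zeros((N_VERTICES, 3))   # static 3D face mesh
    audio = np.zeros(16000)                # 1 s of silence at 16 kHz
    frames = animate(template, audio, sample_rate=16000)
    print(frames.shape)                    # (60, 5023, 3)
```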

Cited by 231 publications (250 citation statements). References 63 publications (78 reference statements).
“…New commercial HMDs, such as the Vive Pro Eye, can enable correct rendering of the avatar motions, which may again improve the reported results. Additionally, lip-sync systems for avatar animation keep evolving and they are currently reaching human perception levels [6]. Hence we hypothesise that lip-sync will become an even more common form of facial animation.…”
Section: Discussion
confidence: 98%
“…Some works aim to synthesize coherent dynamic 3D face videos of a fixed identity with the help of 3DMMs. These include works that synthesize 4D videos from a static 3D mesh paired with semantic label information [Bolkart and Wuhrer 2015a], and from a static 3D mesh and audio information [Cudeiro et al 2019].…”
Section: Correspondence
confidence: 99%
“…VisemeNet [50], a three-stage LSTM network, is proposed to achieve real-time audio-lip synchronization and can be seamlessly integrated into existing animation workflows. VOCA [13], which is trained on a unique 4D face dataset, takes any speech signal as input and realistically animates a wide range of adult faces. Methods based on computer graphics require the collection and manipulation of complex head models.…”
Section: Talking Face Generation
confidence: 99%
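As a rough illustration of the recurrent approach the excerpt above attributes to VisemeNet, the sketch below shows a single-stage LSTM regressor from per-frame audio features to animation parameters (e.g. viseme activations). It is a deliberate simplification, not VisemeNet's actual three-stage design; all dimensions and names are illustrative assumptions.

```python
# Illustrative single-stage LSTM regressor from per-frame audio features to
# animation parameters (e.g. viseme weights). This is NOT VisemeNet's actual
# three-stage architecture; dimensions and names are assumptions.
import torch
import torch.nn as nn


class AudioToAnimLSTM(nn.Module):
    def __init__(self, feat_dim: int = 64, hidden: int = 256,
                 n_params: int = 20):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_params)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, n_frames, feat_dim) -> (batch, n_frames, n_params)
        out, _ = self.lstm(feats)
        return self.head(out)


model = AudioToAnimLSTM()
dummy = torch.randn(1, 100, 64)   # 100 frames of audio features
print(model(dummy).shape)         # torch.Size([1, 100, 20])
```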