2023
DOI: 10.48550/arxiv.2301.03396
Preprint

Diffused Heads: Diffusion Models Beat GANs on Talking-Face Generation

Abstract (Figure 1 caption): Given a single identity frame and an audio clip containing speech, the model uses a diffusion model to sample consecutive frames autoregressively, preserving the identity and modeling lip and head movement to match the audio input. Unlike other methods, no additional guidance is required.
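The autoregressive sampling loop described in the caption can be sketched as follows. This is a toy illustration under stated assumptions only: `denoise_step` is a placeholder standing in for the paper's learned denoising network, the frames are tiny arrays, and all names are hypothetical, not taken from the authors' code.

```python
import numpy as np

FRAME_SHAPE = (8, 8)   # tiny stand-in for an image frame
T = 10                 # diffusion steps (real models use hundreds)

def denoise_step(x_t, t, identity, prev_frame, audio_chunk):
    """Placeholder for the learned denoiser: nudges the noisy frame
    toward a blend of the identity frame, the previous frame, and
    a scalar summary of the audio window."""
    target = 0.7 * identity + 0.3 * prev_frame + 0.01 * audio_chunk.mean()
    return x_t + (target - x_t) / (t + 1)

def sample_frame(identity, prev_frame, audio_chunk, rng):
    """Reverse diffusion: start from Gaussian noise and denoise for T
    steps, conditioned on identity, previous frame, and audio."""
    x = rng.standard_normal(FRAME_SHAPE)
    for t in reversed(range(T)):
        x = denoise_step(x, t, identity, prev_frame, audio_chunk)
    return x

def generate_video(identity, audio, n_frames, rng):
    """Sample frames one after another; each new frame is conditioned
    on the previous one, which is what makes the process autoregressive."""
    frames, prev = [], identity  # first frame conditioned on identity itself
    for i in range(n_frames):
        prev = sample_frame(identity, prev, audio[i], rng)
        frames.append(prev)
    return frames

rng = np.random.default_rng(0)
identity = rng.standard_normal(FRAME_SHAPE)
audio = rng.standard_normal((5, 16))    # 5 frames of audio features
video = generate_video(identity, audio, 5, rng)
print(len(video), video[0].shape)
```

The key design point the caption highlights survives even in this sketch: each sampled frame feeds back in as conditioning for the next, so identity and motion continuity are maintained without any extra guidance signal.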

Cited by 3 publications (3 citation statements)
References 33 publications
“…Related work on data-driven facial animation can be divided into two main categories, vision-based and speech-based; our focus is on the latter. There has been extensive work on neural rendering of talking-head animations in 2D pixel space [Guo et al. 2021; Lu et al. 2021; Stypulkowski et al. 2023; Wu et al. 2021]. However, because rendered videos are not useful in 3D interactive applications, this research addresses speech-driven facial animation in 3D space.…”
Section: Background and Related Work
confidence: 99%
“…Additionally, several methods [33], [34] utilize diffusion models [35] to produce synthetic images. Diffused Heads [36] proposes an autoregressive diffusion model for talking-face generation, which takes an image and an audio sequence as input and produces realistic head movements and facial expressions while preserving the background. DiffTalk [34] employs reference facial images and landmarks to enable personality-aware synthesis, producing high-resolution, audio-driven talking-head videos for previously unseen identities without fine-tuning.…”
Section: Talking Head Synthesis
confidence: 99%
“…Additionally, DDPM adopts a progressive generation approach, allowing it to better maintain the global consistency and structure of images. Existing studies have confirmed that DDPMs exhibit stronger generative capabilities than GANs [26], [27]. Therefore, applying DDPM to face image restoration tasks holds great potential and promising prospects [28]–[31].…”
Section: Introduction
confidence: 99%