Shallow Diffusion Motion Model for Talking Face Generation from Speech

Zhang, Xulong; Wang, Jianzong; Cheng, Ning; Xiao, Edward; Xiao, Jing

doi:10.1007/978-3-031-25198-6_11

Cited by 5 publications

(2 citation statements)

References 41 publications


“…While the above approaches are currently the only end-to-end diffusion based methods, a number of structural based approaches, that leverage diffusion models have also been proposed in recent months. Zhang et al (2022) proposed an approach that used audio to predict landmarks, before using a diffusion based renderer to output the final frame. Zhua et al (2023) also utilised a diffusion model similarly, using it to take the source image and the predicted motion features as input to generate the high-resolution frames.…”

Section: Diffusion-based Generation (mentioning)

confidence: 99%

Multilingual video dubbing—a technology review and current challenges

Bigioi, Corcoran

2023

Front. Signal Process.


The proliferation of multi-lingual content on today’s streaming services has created a need for automated multi-lingual dubbing tools. In this article, current state-of-the-art approaches are discussed with reference to recent works in automatic dubbing and the closely related field of talking head generation. A taxonomy of papers within both fields is presented, and the main challenges of both speech-driven automatic dubbing, and talking head generation are discussed and outlined, together with proposals for future research to tackle these issues.


“…Such talking face generation requires producing realistic facial movements and synchronized speech in response to audio input. With the rapid evolution of deep learning, it becomes easily to handle with a huge amount of audio and visual data and producing satisfying results with techniques like Generative Adversarial Network (GAN) [18], [19] and diffusion model [20], [21]. Recent methods focus on the optimization on the important parts, such as identity preservation [22], face animation [23], pose control [5] and audio-video synchronization [24].…”

Section: A Talking Face Generation (mentioning)

confidence: 99%

Shenzhen Fengshen Industrial Development Co., Ltd v France Eurasian International Technology Development Co., Ltd

Wang, Zhang, Guo

2022

Selected Chinese Cases on the UN Sales Convention (CISG) Vol. 1


Intent is defined for understanding spoken language in existing works. Both textual features and acoustic features involved in medical speech contain intent, which is important for symptomatic diagnosis. In this paper, we propose a medical speech classification model named DRSC that automatically learns to disentangle intent and content representations from textual-acoustic data for classification. The intent representations of the text domain and the Mel-spectrogram domain are extracted via intent encoders, and then the reconstructed text feature and the Mel-spectrogram feature are obtained through two exchanges. After combining the intent from two domains into a joint representation, the integrated intent representation is fed into a decision layer for classification. Experimental results show that our model obtains an average accuracy rate of 95% in detecting 25 different medical symptoms.


EmoTalker: Emotionally Editable Talking Face Generation via Diffusion Model

Zhang, Cheng et al. 2024

ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

