Proceedings of the 28th ACM International Conference on Multimedia 2020
DOI: 10.1145/3394171.3413532
A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild

Abstract: Figure 1: Our novel Wav2Lip model produces significantly more accurate lip synchronization in dynamic, unconstrained talking-face videos. Quantitative metrics indicate that the lip sync in our generated videos is almost as good as in real synced videos. Thus, we believe that our model can enable a wide range of real-world applications where previous speaker-independent lip-syncing approaches [17, 18] struggle to produce satisfactory results.

Cited by 440 publications (391 citation statements)
References 22 publications (37 reference statements)
“…It can be seen that our method achieves the best results under most of the metrics on both datasets. On LRW, though Wav2Lip [44] outperforms our method on two metrics, the reason is…”
Section: Quantitative Evaluation (mentioning)
confidence: 87%
“…Learning Speech Content Space. It has been verified that learning the natural synchronization between visual mouth movements and auditory utterances is valuable for driving images to speak [72, 44]. We thus use an embedding space that contains synchronized audio-visual features as the speech content space.…”
Section: Modularization Of Representations (mentioning)
confidence: 99%
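The quoted passage points at the core training signal Wav2Lip itself builds on: two encoders map a lip-region video window and the corresponding audio window into a shared embedding space, and synchronized pairs are scored higher than off-sync ones. Below is a minimal PyTorch sketch of that SyncNet-style scoring and loss; `video_emb` and `audio_emb` are hypothetical encoder outputs, and the clamp is added here only to keep the binary cross-entropy numerically well-defined, so treat this as an illustration rather than the authors' exact code.

```python
import torch
import torch.nn.functional as F

def sync_probability(video_emb: torch.Tensor,
                     audio_emb: torch.Tensor,
                     eps: float = 1e-7) -> torch.Tensor:
    # Cosine similarity between lip-video and audio embeddings,
    # clamped into (0, 1) so it can be read as a sync probability.
    cos = F.cosine_similarity(video_emb, audio_emb, dim=1)
    return cos.clamp(eps, 1.0 - eps)

def sync_loss(video_emb: torch.Tensor,
              audio_emb: torch.Tensor,
              is_synced: torch.Tensor) -> torch.Tensor:
    # BCE pushes in-sync pairs toward 1 and off-sync pairs toward 0,
    # shaping the shared audio-visual ("speech content") space.
    p = sync_probability(video_emb, audio_emb)
    return F.binary_cross_entropy(p, is_synced)
```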
“…Another category of lip synchronisation methods is non-constrained methods. We have studied two non-constrained methods here: LipGAN [24] and Wav2Lip [5]. Wav2Lip is speaker-independent.…”
Section: Discussion (mentioning)
confidence: 99%
“…The model works quite well for videos that are not very dynamic. Upon investigating [5], the reason was found to be an inadequate discriminator loss function.…”
Section: Unconstrained Talking Face Generation From Speech (mentioning)
confidence: 99%
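This observation matches Wav2Lip's own diagnosis [5]: a sync discriminator trained jointly with the generator becomes progressively easier to fool on dynamic videos. Their remedy is a pre-trained "expert" sync discriminator that stays frozen while only the generator is updated, with its loss mixed into reconstruction and adversarial terms. A sketch of that weighted objective follows, assuming the weights s_w = 0.03 and s_g = 0.07 reported in the paper; the function name and arguments are illustrative, not the authors' API.

```python
def wav2lip_generator_loss(recon_l1, expert_sync_loss, adv_loss,
                           s_w: float = 0.03, s_g: float = 0.07):
    # Total generator loss, in the spirit of Wav2Lip [5]:
    #   L = (1 - s_w - s_g) * L_recon + s_w * E_sync + s_g * L_gen
    # The expert sync discriminator is frozen, so expert_sync_loss
    # cannot be weakened by adversarial training; only the generator
    # must adapt to satisfy it. Weight values are the paper's
    # reported settings (an assumption here).
    return ((1.0 - s_w - s_g) * recon_l1
            + s_w * expert_sync_loss
            + s_g * adv_loss)
```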