ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2019.8682275

Cross-language Speech Dependent Lip-synchronization

Abstract: Understanding videos of people speaking across international borders is hard, as audiences from different demographics do not understand the language. Such speech videos are often supplemented with language subtitles. However, these hamper the viewing experience, as attention is divided between the subtitles and the video. Simple audio dubbing in a different language makes the video appear unnatural due to unsynchronized lip motion. In this paper, we propose a system for automated cross-language lip synchronization for re-dubbed videos. Our mod…
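The abstract describes a two-stage idea: obtain target-language speech, then regenerate lip motion to match it. Below is a minimal sketch of that pipeline; every function is a hypothetical placeholder standing in for the real components (a dubbing/TTS stage and an audio-conditioned lip-sync generator), not the authors' implementation.

```python
# Hypothetical sketch of a cross-language re-dubbing pipeline, as outlined in
# the abstract. The stage functions are placeholders, not the paper's code.
from typing import List, Tuple
import numpy as np

def get_dubbed_audio(source_audio: np.ndarray, target_language: str) -> np.ndarray:
    """Placeholder: a human dub, or speech translation followed by TTS."""
    raise NotImplementedError("hypothetical dubbing stage")

def lip_sync(frames: List[np.ndarray], dubbed_audio: np.ndarray) -> List[np.ndarray]:
    """Placeholder: regenerate the mouth region of each frame, conditioned on
    the dubbed audio, so lip motion matches the new speech."""
    raise NotImplementedError("hypothetical lip-sync stage")

def redub(frames: List[np.ndarray], source_audio: np.ndarray,
          target_language: str = "hi") -> Tuple[List[np.ndarray], np.ndarray]:
    """Dub the audio, then re-synchronize the lips to the dubbed track."""
    dubbed = get_dubbed_audio(source_audio, target_language)
    return lip_sync(frames, dubbed), dubbed
```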

Cited by 7 publications (4 citation statements). References 19 publications (20 reference statements).
“…As reported before in [12], a large amount of online educational content is present in English in the form of video lectures. They are often aided with subtitles in foreign languages.…”
Section: Educational Videos
confidence: 69%
“…Automated dubbing A common approach to automated dubbing is to generate or modify the video frames to match a given clip of audio speech [2,34,35,36,37,38,39,40,41]. This wide and active area of research uses approaches that vary from conditional video generation, to retrieval, to 3D models.…”
Section: Datasets
confidence: 99%
“…Lip synchronization Generating talking mouth videos by conditioning on audio [14,33,34,44] is more applicable to tackling audiovisual dubbing. Further literature uses similar models conditioned on text [45], videos of other speakers [46], and facial landmarks [47]. Recent approaches improve the quality and sharpness of the generated clips using GANs [45,34,44].…”
Section: Related Work
confidence: 99%
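Models of this kind condition frame generation on short audio windows aligned to each video frame. As an illustration only (not any cited paper's code), here is a self-contained sketch of that alignment step, assuming a 25 fps video and a spectrogram computed with a 10 ms hop:

```python
import numpy as np

def audio_windows_per_frame(spec: np.ndarray, fps: float = 25.0,
                            hop_s: float = 0.01, window: int = 16) -> np.ndarray:
    """spec: (T, n_mels) spectrogram, one row per hop_s seconds.
    Returns a (n_frames, window, n_mels) array: one fixed-size audio window
    per video frame, zero-padded where the clip boundary cuts it short."""
    hops_per_frame = (1.0 / fps) / hop_s        # e.g. 4 hops per frame at 25 fps
    n_frames = int(len(spec) / hops_per_frame)
    half = window // 2
    out = np.zeros((n_frames, window, spec.shape[1]), dtype=spec.dtype)
    for f in range(n_frames):
        center = int(round(f * hops_per_frame))  # hop index aligned to frame f
        lo, hi = max(0, center - half), min(len(spec), center + half)
        out[f, : hi - lo] = spec[lo:hi]          # remainder stays zero-padded
    return out
```

Each window then serves as the audio conditioning input for generating the mouth region of its corresponding frame.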
“…The vast majority of the literature focuses on face generation for English content only. Only recent efforts [47,34], which came out contemporaneously with our work, have attempted to tackle the problem of audiovisual dubbing from English to Hindi. In our work, we aim for a more systematic study of the multilingual scenario.…”
Section: Multi-lingual AV Translation
confidence: 99%