2024
DOI: 10.1609/aaai.v38i3.28055

Let There Be Sound: Reconstructing High Quality Speech from Silent Videos

Ji-Hoon Kim, Jaehun Kim, Joon Son Chung

Abstract: The goal of this work is to reconstruct high-quality speech from lip motions alone, a task also known as lip-to-speech. A key challenge for lip-to-speech systems is the one-to-many mapping caused by (1) the existence of homophenes and (2) multiple speech variations, which results in mispronounced and over-smoothed speech. In this paper, we propose a novel lip-to-speech system that significantly improves generation quality by alleviating the one-to-many mapping problem from multiple perspectives. Specifically, …

Cited by 0 publications. References: 34 publications (60 reference statements).