2022
DOI: 10.48550/arxiv.2207.03800
|View full text |Cite
Preprint
|
Sign up to set email alerts
|

FastLTS: Non-Autoregressive End-to-End Unconstrained Lip-to-Speech Synthesis

Yongqi Wang,
Zhou Zhao

Abstract: Unconstrained lip-to-speech synthesis aims to generate corresponding speeches from silent videos of talking faces with no restriction on head poses or vocabulary. Current works mainly use sequenceto-sequence models to solve this problem, either in an autoregressive architecture or a flow-based non-autoregressive architecture. However, these models suffer from several drawbacks: 1) Instead of directly generating audios, they use a two-stage pipeline that first generates mel-spectrograms and then reconstructs au… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...

Citation Types

0
0
0

Publication Types

Select...

Relationship

0
0

Authors

Journals

citations
Cited by 0 publications
references
References 28 publications
(65 reference statements)
0
0
0
Order By: Relevance

No citations

Set email alert for when this publication receives citations?