Interspeech 2023 2023
DOI: 10.21437/interspeech.2023-1978
|View full text |Cite
|
Sign up to set email alerts
|

Evaluating and reducing the distance between synthetic and real speech distributions

Abstract: While modern Text-to-Speech (TTS) systems can produce natural-sounding speech, they remain unable to reproduce the full diversity found in natural speech data. We consider the distribution of all possible real speech samples that could be generated by these speakers alongside the distribution of all synthetic samples that could be generated for the same set of speakers, using a particular TTS system. We set out to quantify the distance between real and synthetic speech via a range of utterance-level statistics… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...

Citation Types

0
0
0

Publication Types

Select...

Relationship

0
0

Authors

Journals

citations
Cited by 0 publications
references
References 29 publications
0
0
0
Order By: Relevance

No citations

Set email alert for when this publication receives citations?