12th ISCA Speech Synthesis Workshop (SSW2023) 2023
DOI: 10.21437/ssw.2023-18

Controllable Emphasis with zero data for text-to-speech

Arnaud Joly,
Marco Nicolis,
Ekaterina Peterova
et al.

Abstract: We present a scalable method to produce high-quality emphasis for text-to-speech (TTS) that does not require recordings or annotations. Many TTS models include a phoneme duration model. A simple but effective way to achieve emphasized speech is to increase the predicted duration of the emphasized word. We show that this is significantly better than spectrogram modification techniques, improving naturalness by 7.3% and testers' correct identification of the emphasized word in a sentence by 40% on a re…
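
The duration-based idea described in the abstract can be illustrated with a minimal sketch. It assumes the TTS model exposes per-phoneme predicted durations together with a phoneme-to-word alignment. The function name `emphasize_word`, the `word_ids` alignment format, and the scale factor of 1.5 are all hypothetical choices for illustration; the abstract does not state the factor or interface the authors use.

```python
from typing import List, Sequence


def emphasize_word(
    durations: Sequence[float],
    word_ids: Sequence[int],
    target_word: int,
    scale: float = 1.5,  # assumed value, not taken from the paper
) -> List[float]:
    """Return a copy of `durations` with the target word's phonemes lengthened.

    durations:   predicted duration of each phoneme (e.g. in frames)
    word_ids:    index of the word each phoneme belongs to
    target_word: index of the word to emphasize
    scale:       multiplicative factor applied to that word's phonemes
    """
    if len(durations) != len(word_ids):
        raise ValueError("durations and word_ids must have the same length")
    if scale <= 0:
        raise ValueError("scale must be positive")
    return [
        d * scale if w == target_word else d
        for d, w in zip(durations, word_ids)
    ]


# Hypothetical predicted durations (in frames) for a three-word sentence.
durations = [4.0, 6.0, 5.0, 8.0, 3.0, 7.0]
word_ids = [0, 0, 1, 1, 1, 2]

# Lengthen every phoneme of the second word (index 1); other words are untouched.
emphasized = emphasize_word(durations, word_ids, target_word=1, scale=1.5)
```

In a real pipeline the modified durations would then be passed on to the acoustic model in place of the original predictions. How that hand-off works depends on the specific TTS architecture and is not covered by this sketch.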
