Interspeech 2022
DOI: 10.21437/interspeech.2022-367
CopyCat2: A Single Model for Multi-Speaker TTS and Many-to-Many Fine-Grained Prosody Transfer

Cited by 2 publications (1 citation statement); references 0 publications.
“…The objective of this paper is to understand how we can provide "narrow focus" word-level emphasis controllability for multiple voices and languages (1) without quality degradation, (2) without annotation, (3) without recordings, and (4), if possible, without model re-training. While the context awareness of TTS systems has vastly improved (see [3], [4] among others), automated output does not always assign the correct intonation to cases like (1e), given the preceding context. Several commercial TTS systems thus allow users to tweak the automated output by manually assigning emphasis (which we use as an umbrella term for narrow or contrastive focus) to a selected word.…”
Section: Introduction
confidence: 99%