2016
DOI: 10.1007/978-3-319-45925-7_9

Optimal Feature Set and Minimal Training Size for Pronunciation Adaptation in TTS

Cited by 2 publications (4 citation statements)
References 14 publications
“…In their previous work [4], [43], the authors have presented the training process of a voice-specific P2P model with the corpus TelecomVo training subcorpus. A first set of 15 features including linguistic, phonological and prosodic features with a W 2 window, was automatically selected.…”
Section: Voice-specific P2P Model; citation type: mentioning
confidence: 99%
“…Except alphabet mapping, four types of phoneme confusions have been reported. A lot of pronunciation variants, related to the pronunciation of the speaker itself, are observed for midvowels /ø/, /@/, /e/, /E/, /O/, /o/ (for example, /e/ ↔ /E/ and /o/ ↔ /O/) [43], [38]. The elision of final liquids /K/ and /l/ is also observed in the target pronunciation.…”
Section: Phoneme Confusions Between Styles; citation type: mentioning
confidence: 99%
“…The voice pronunciation model adapts canonical phonemes to phonemes as realized in the speech corpus. In previous work [19,20], we have presented the training process of a P2P voice-specific model with the corpus Telecom. Table 2 shows the distribution of selected features within groups.…”
Section: P2P Voice-specific Pronunciation Model; citation type: mentioning
confidence: 99%
“…It was also used to predict a corpus-specific pronunciation, i.e. a pronunciation adapted to the TTS voice corpus, thus conducting to a significant improvement of the overall quality of synthesized speech [19,20]. In the work realized in [19], we manage to synthesize good quality speech samples on a neutral voice.…”
Section: Introduction; citation type: mentioning
confidence: 99%