Speech Prosody 2018 2018
DOI: 10.21437/speechprosody.2018-121
|View full text |Cite
|
Sign up to set email alerts
|

Duration modeling using DNN for Arabic speech synthesis

Abstract: Duration modeling is a key task for every parametric speech synthesis system. Though such parametric systems have been adapted to many languages, no special attention was paid to explicitly handling Arabic speech characteristics. Actually, in Arabic phoneme duration has a distinctive role, because of consonant gemination and vowel quantity. Therefore, a precise modeling of sound durations is critical. In this paper we compare several modeling of phoneme durations (including duration modeling by HTS and MERLIN … Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
4
1

Citation Types

0
7
0

Year Published

2018
2018
2023
2023

Publication Types

Select...
3
2
2

Relationship

2
5

Authors

Journals

citations
Cited by 12 publications
(7 citation statements)
references
References 14 publications
0
7
0
Order By: Relevance
“…So far, this work has been confined to American English. We might speculate that duration information will be particularly useful for ASR in languages such as Japanese, Finnish, Estonian and Arabic [21] that have phonemic length.…”
Section: Discussionmentioning
confidence: 99%
“…So far, this work has been confined to American English. We might speculate that duration information will be particularly useful for ASR in languages such as Japanese, Finnish, Estonian and Arabic [21] that have phonemic length.…”
Section: Discussionmentioning
confidence: 99%
“…The prosodic parameters determine speech rhythm and accentuation. For some languages, duration also plays a role in distinguishing the meaning of speech sounds [4]. Therefore, the accurate modeling and prediction of speech-sound duration is important for ensuring that synthetic speech is well perceived.…”
Section: Introductionmentioning
confidence: 99%
“…Recently, the deep neural network technique has grown so fast that it has become the core in most data-driven systems, including TTS systems [3]. Neural network approaches have also been widely adopted to model duration [6] [7] [4]. However, most approaches [8][3] [6] [7] [4] predict phoneme duration using the full context labels that represent phonemes in context, including linguistic features, such as stress, and positional features, such as the relative positions of different segment levels (phoneme, syllable, and word) inside higher-level segments.…”
Section: Introductionmentioning
confidence: 99%
See 1 more Smart Citation
“…To cope with these issues, previous works suggested replacing decision trees by DNN [22] or using external models for duration [21]. Results showed that DNN outperformed HMM in terms of speech quality and naturalness of produced speech for English language [23,19].…”
Section: Introductionmentioning
confidence: 99%