8th European Conference on Speech Communication and Technology (Eurospeech 2003) 2003
DOI: 10.21437/eurospeech.2003-91
Segmental durations predicted with a neural network

Abstract: This paper presents a segmental duration model for European Portuguese, intended for text-to-speech (TTS) synthesis. The model is a feed-forward neural network trained with the back-propagation algorithm. Its input is a set of phonological and contextual features extracted automatically from the text. The relative importance of each feature is reported, both in terms of its correlation with segmental durations and in terms of the improvement it brings to the model's performance. Finally, the model is evaluated objectively and s…
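The abstract describes a feed-forward network trained with back-propagation that maps phonological and contextual features to segmental durations. What follows is a minimal pure-Python sketch of that kind of model. Everything specific in it is a hypothetical illustration, not the paper's setup: the three binary features (stressed, phrase-final, vowel), the single hidden layer of tanh units, the learning rate, and the made-up toy durations. The section does not give the actual features, architecture, or training data.

```python
import math
import random

# Hypothetical sketch: one hidden layer of tanh units, a linear output, trained
# by back-propagation with per-sample SGD on squared error. Sizes, features and
# hyperparameters are illustrative assumptions, not the paper's settings.
class DurationMLP:
    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        # Hidden activations: h_j = tanh(sum_i w1[j][i] * x[i] + b1[j])
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        # Linear output unit: the predicted (scaled) duration
        y = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2
        return h, y

    def predict(self, x):
        return self.forward(x)[1]

    def train_step(self, x, target, lr):
        h, y = self.forward(x)
        err = y - target  # dL/dy for L = 0.5 * (y - target)^2
        for j, hj in enumerate(h):
            # Error signal at hidden unit j, using w2[j] before it is updated
            grad_h = err * self.w2[j] * (1.0 - hj * hj)
            self.w2[j] -= lr * err * hj
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * grad_h * xi
            self.b1[j] -= lr * grad_h
        self.b2 -= lr * err
        return 0.5 * err * err


def train(model, data, epochs, lr, seed=0):
    """Run shuffled per-sample SGD; return the mean loss over the last epoch."""
    rng = random.Random(seed)
    order = list(data)
    epoch_loss = 0.0
    for _ in range(epochs):
        rng.shuffle(order)
        epoch_loss = sum(model.train_step(x, t, lr) for x, t in order) / len(order)
    return epoch_loss


def toy_data():
    # Made-up data (NOT from the paper): features (stressed, phrase-final, vowel),
    # target duration in units of 100 ms from an invented additive rule.
    data = []
    for s in (0, 1):
        for f in (0, 1):
            for v in (0, 1):
                ms = 60 + 30 * s + 40 * f + 20 * v
                data.append(([float(s), float(f), float(v)], ms / 100.0))
    return data


if __name__ == "__main__":
    data = toy_data()
    model = DurationMLP(n_inputs=3, n_hidden=4)
    loss = train(model, data, epochs=3000, lr=0.1)
    print(f"final mean loss: {loss:.5f}")
    print(f"stressed, phrase-final vowel: {model.predict([1.0, 1.0, 1.0]) * 100:.1f} ms")
```

Scaling the targets to units of 100 ms keeps them near the range where a small network trains stably. In a real system, categorical features such as phone identity would typically be one-hot encoded rather than reduced to a few binary flags.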

Cited by 23 publications (5 citation statements)
References: 13 publications (12 reference statements)
“…Preliminary work with 15 subjects gave an average score of 3.2 and 3.1 for the ACs predicted with labeled and predicted FCs, respectively, against 4.6 for the original stimulus. The ensemble usage of the whole prosody system, for durations [8] and F0, achieves a score of 3.0. The general scores achieved of 3 are at the "fair" level in a MOS scale.…”
Section: Discussion (mentioning; confidence: 99%)
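The statement above reports mean opinion scores (MOS) and reads a score of about 3 as "fair". MOS is the arithmetic mean of listener ratings on a five-point absolute category rating (ACR) scale: 5 excellent, 4 good, 3 fair, 2 poor, 1 bad. A small sketch of that computation follows. Mapping a mean to its nearest category label by rounding half up is an assumption made for illustration. The ratings in the usage line are hypothetical and are not the study's data.

```python
from statistics import mean

# Five-point ACR listening-quality categories used for MOS
ACR_LABELS = {5: "excellent", 4: "good", 3: "fair", 2: "poor", 1: "bad"}


def mos(ratings):
    """Mean opinion score: the arithmetic mean of integer ACR ratings (1..5)."""
    if not ratings:
        raise ValueError("need at least one rating")
    if any(r not in ACR_LABELS for r in ratings):
        raise ValueError("ratings must be integers from 1 to 5")
    return mean(ratings)


def mos_label(score):
    # Assumption: round half up to the nearest category, clamped to 1..5
    return ACR_LABELS[min(5, max(1, int(score + 0.5)))]


# Hypothetical ratings from a handful of listeners
print(mos([3, 3, 4, 2]), mos_label(mos([3, 3, 4, 2])))
```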
“…It is the last part of a prosody system that has been developed for text-to-speech synthesis of EP. This system consists of a specific model for prediction of the segmental durations [8] and two other models for prediction of F0 contours based on Fujisaki's FC and AC. The complete prosody system produces contours that modulate the speech that is to be produced from the given text.…”
Section: Discussion (mentioning; confidence: 99%)
“…This training, supervised or unsupervised, is based on the presentation of examples, and simulates a systematic learning process by determining the difference between the response given by the network and the expected behavior. The experience of the network is stored by the synaptic weights between neurons and its performance is evaluated, for example, by the ability to generalize behaviors, recognize patterns, fix errors or execute predictions [13][14][15].…”
Section: Multilayer Perceptron - Artificial Neural Network (mentioning; confidence: 99%)
“…The units do the operations using only the input data received from the connections. The intelligent behavior of the network comes from the iterations between these units [3][4]. Fig.…”
Section: Introduction (mentioning; confidence: 99%)