2014
DOI: 10.1016/j.specom.2013.09.013

Toward invariant functional representations of variable surface fundamental frequency contours: Synthesizing speech melody via model-based stochastic learning

Abstract: Variability has been one of the major challenges for both theoretical understanding and computer synthesis of speech prosody. In this paper we show that economical representation of variability is the key to effective modeling of prosody. Specifically, we report the development of PENTAtrainer: a trainable yet deterministic prosody synthesizer based on an articulatory-functional view of speech. We show with testing results on Thai, Mandarin and English that it is possible to achieve high-accuracy predictive sy…

Citations: cited by 46 publications (61 citation statements)
References: 85 publications (135 reference statements)
“…For the purpose of this analysis, only tokens evaluated on a perceptual basis as being uttered with a "normal" rate were selected, and no normalization was applied. The PRAAT scripts ProsodyPro (Xu & Prom-on 2014), and PENTAtrainer 1 (Prom-on, Xu & Thipakorn 2009) were then used to obtain the measurements of the prosodic correlates that include duration (in milliseconds), intensity (in decibels), mean F0 (average of 10 measurements over the syllable, in Hz), and excursion size (F0 maxima - minima, in semitones).…”
Section: Methods (mentioning)
confidence: 99%
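The two F0 measures named in this statement can be illustrated with a minimal sketch. It assumes the standard semitone conversion, 12 · log2(f_max / f_min), and a plain arithmetic mean over the sampled points. The function names are hypothetical, and the actual ProsodyPro computation may differ in detail.

```python
import math

def mean_f0_hz(f0_samples_hz):
    """Arithmetic mean of the F0 samples taken over one syllable, in Hz."""
    return sum(f0_samples_hz) / len(f0_samples_hz)

def excursion_size_semitones(f0_samples_hz):
    """F0 maximum relative to F0 minimum, expressed in semitones.

    Assumes the standard conversion 12 * log2(f_max / f_min); this is an
    illustration, not the ProsodyPro source code.
    """
    f_max = max(f0_samples_hz)
    f_min = min(f0_samples_hz)
    return 12.0 * math.log2(f_max / f_min)

# Ten hypothetical F0 measurements over one syllable (Hz).
samples = [100.0, 110.0, 120.0, 130.0, 140.0, 150.0, 160.0, 170.0, 180.0, 200.0]
print(mean_f0_hz(samples))
print(excursion_size_semitones(samples))
```

An octave span (for example 100 Hz to 200 Hz) corresponds to 12 semitones under this conversion.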
“…Although the majority of modern Chinese words are disyllabic and the uncertainty of disyllabic tonal realizations has partly been explained by individual backgrounds, how disyllabic JM words are realized in connected speech needs further investigation. Researchers have done fruitful studies on contextual tonal realizations and their interactions with sentential prosodies (Chen, 2010; Chen and Gussenhoven, 2008; Xu, 1997, 1999; Xu and Prom-on, 2014; Xu and Wang, 2001). However, how to transfer this knowledge from SC to the other Chinese dialects and how the predictors we investigated work in context still open questions.…”
Section: Limitations (mentioning)
confidence: 99%
“…PENTA has been implemented to perform both local and global optimization methods [2,9]. The detailed implementation of PENTA with global optimization is given in [9]. Target approximation (TA) in PENTA is mathematically realized as a third-order critically damped linear system driven by pitch targets, as shown in:…”
Section: Target Approximation (TA) in PENTA Model (mentioning)
confidence: 99%
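The equation itself is elided in the quoted statement. As a rough illustration only, the sketch below assumes the commonly cited closed form of a third-order critically damped response to a linear pitch target, f0(t) = (m·t + b) + (c1 + c2·t + c3·t²)·e^(−λt), with the coefficients fixed by the initial F0 value, velocity, and acceleration. The function name, parameter names, and numeric values are hypothetical and are not taken from the citing paper.

```python
import math

def target_approximation(t, m, b, lam, f0_init, vel_init=0.0, acc_init=0.0):
    """F0 at time t (seconds from syllable onset) under target approximation.

    m, b     : slope and height of the linear pitch target m*t + b
    lam      : rate (strength) of target approximation
    f0_init, vel_init, acc_init : F0, its first and its second derivative
                                  at syllable onset (the transferred state)
    Assumed closed form of a third-order critically damped system.
    """
    # Coefficients chosen so that f0(0), f0'(0) and f0''(0) match the
    # initial state.
    c1 = f0_init - b
    c2 = vel_init + c1 * lam - m
    c3 = (acc_init + 2.0 * c2 * lam - c1 * lam ** 2) / 2.0
    transient = (c1 + c2 * t + c3 * t * t) * math.exp(-lam * t)
    return (m * t + b) + transient

# Hypothetical static target at 100 (m = 0), approached from an onset of 90.
for t in (0.0, 0.05, 0.1, 0.2):
    print(t, target_approximation(t, m=0.0, b=100.0, lam=40.0, f0_init=90.0))
```

The transient term decays exponentially, so the contour starts at the onset value and converges on the target line; a larger `lam` gives faster convergence.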
“…To test the CPP program, we took subsets of two corpora used previously in the development of PENTAtrainer2 [9]. The first corpus was collected for a study of tone, focus and sentence modality in Mandarin Chinese and the second one was collected for a study of stress, focus and sentence modality in American English [12].…”
Section: Test Dataset (mentioning)
confidence: 99%