Rules for the generation of ToBI-based American English intonation

Jilka, Matthias; Möhler, Gregor; Dogil, Grzegorz

doi:10.1016/s0167-6393(99)00008-4

Cited by 28 publications

(10 citation statements)

References 20 publications

(13 reference statements)


“…Previously observed interaction between focus and sentence modality in terms of surface F0 contours (Cooper et al., 1986; Pell, 2001; Xu and Xu, 2005) is successfully simulated using only 26 sets of categorical parameters representing four functional layers: stress, focus, syllable position and sentence modality. Compared to previous attempts to model English intonation (Jilka et al., 1999; Grabe et al., 2007; Taylor, 2000), the present results show both accurate F0 contours and high generalizability, as the learned parameters are directly related to communicative functions. Bold-face indicates a focus placement and underline indicates a stress syllable of that word.…”

Section: English (mentioning)

confidence: 44%

“…RMSE indicates the average mismatch of the contours while correlation indicates the mismatch between the shape and the alignment of the contours. These two measurements have been shown to be effective (Hermes, 1998), and have been widely used as computational metrics in previous prosody modeling works (Black and Hunt, 1996; Jilka et al., 1999; Prom-on et al., 2009, 2011, 2012; Ross and Ostendorf, 1999; Taylor, 2000).…”

Section: Testing Methods (mentioning)

confidence: 99%
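
The two measures named in the statement above can be illustrated with a minimal sketch. It assumes both F0 contours are sampled at the same time points and are given as plain lists of numbers; the function names, the sample values, and the choice of Hz as the unit are hypothetical and are not taken from any of the cited papers.

```python
import math


def rmse(reference, synthesized):
    """Root-mean-square error between two equally sampled F0 contours."""
    if not reference or len(reference) != len(synthesized):
        raise ValueError("contours must be non-empty and of equal length")
    squared = [(r - s) ** 2 for r, s in zip(reference, synthesized)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_correlation(reference, synthesized):
    """Pearson correlation between two equally sampled F0 contours."""
    if not reference or len(reference) != len(synthesized):
        raise ValueError("contours must be non-empty and of equal length")
    n = len(reference)
    mean_r = sum(reference) / n
    mean_s = sum(synthesized) / n
    cov = sum((r - mean_r) * (s - mean_s) for r, s in zip(reference, synthesized))
    var_r = sum((r - mean_r) ** 2 for r in reference)
    var_s = sum((s - mean_s) ** 2 for s in synthesized)
    if var_r == 0 or var_s == 0:
        raise ValueError("correlation is undefined for a flat contour")
    return cov / math.sqrt(var_r * var_s)


# Hypothetical contours (assumed unit: Hz), sampled at the same time points.
natural = [120.0, 135.0, 150.0, 140.0, 125.0]
# Same shape as `natural`, shifted upward by a constant 10 Hz.
shifted = [f0 + 10.0 for f0 in natural]

error = rmse(natural, shifted)
similarity = pearson_correlation(natural, shifted)
```

In this example the shifted contour has the same shape as the natural one, so the correlation stays at 1 while the RMSE equals the size of the offset. That is one way to see why the two measures are usually reported together: one tracks distance, the other tracks shape.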

“…Among the various aspects of prosody, fundamental frequency (F0) is by far the most challenging, and has attracted most of the research effort. Many theories and computational models of F0 patterns have been proposed over the years (Anderson et al., 1984; Bailly and Holm, 2005; Black and Hunt, 1996; Fujisaki et al., 2005; Grabe et al., 2007; Hirst, 2005, 2011; Jilka et al., 1999; Kochanski and Shih, 2003; Mixdorff et al., 2003; Pierrehumbert, 1980, 1981; Prom-on et al., 2009; Taylor, 2000; van Santen and Möbius, 2000; Xu and Wang, 2001; Xu, 2005), and a large number of empirical studies have been conducted (as reviewed by Wagner and Watson, 2010; Shattuck-Hufnagel and Turk, 1996; Xu, 2011). Despite the extensive effort, however, most of the critical issues still remain unresolved and some are still under heated debate (Arvaniti and Ladd, 2009; Ladd, 2008; Wagner and Watson, 2010; Wightman, 2002; Xu, 2011).…”

Section: Introduction (mentioning)

confidence: 99%


Toward invariant functional representations of variable surface fundamental frequency contours: Synthesizing speech melody via model-based stochastic learning

Prom-on

2014

Speech Communication


Variability has been one of the major challenges for both theoretical understanding and computer synthesis of speech prosody. In this paper we show that economical representation of variability is the key to effective modeling of prosody. Specifically, we report the development of PENTAtrainer, a trainable yet deterministic prosody synthesizer based on an articulatory-functional view of speech. We show with testing results on Thai, Mandarin and English that it is possible to achieve high-accuracy predictive synthesis of fundamental frequency contours with very small sets of parameters obtained through stochastic learning from real speech data. The first key component of this system is syllable-synchronized sequential target approximation, implemented as the qTA model, which is designed to simulate, for each tonal unit, a wide range of contextual variability with a single invariant target. The second key component is the automatic learning of function-specific targets through stochastic global optimization, guided by a layered pseudo-hierarchical functional annotation scheme, which requires the manual labeling of only the temporal domains of the functional units. The results in terms of synthesis accuracy demonstrate that effective modeling of the contextual variability is the key also to effective modeling of function-related variability.
Additionally, we show that, being both theory-based and trainable (hence data-driven), computational systems like PENTAtrainer can serve as an effective modeling tool in basic research, with which the level of falsifiability in theory testing can be raised, and also a closer link between basic and applied research in speech science can be developed.

Highlights
• High synthetic accuracy of prosody achieved for Thai, Mandarin and English
• Many-to-one mapping from contextually variable surface F0 to invariant functional targets
• Effective handling of both contextual and non-contextual variability
• Combination of deterministic synthesis and data-driven parameter learning
• Large-scale and fully detailed prosody synthesis as a tool for theory testing
• Freely available as Praat scripts and plug-ins to the speech science community at large


“…We add a final traversal of utterance's phonetic representation so that the server can output a series of visemes and animation commands corresponding to a synthesized waveform. For RUTH, we have also reinstrumented Festival (debugging and extending the standard release) to control pitch by annotation; 28,29 we use OGI CSLU synthesis and voices. 30 Animation schedules and speech waveforms output by Festival can be saved, reused and modified directly.…”

Section: Interfacing With Speech (mentioning)

confidence: 99%

Specifying and animating facial signals for discourse in embodied conversational agents

DeCarlo, Stone, Revilla, et al.

2004

Computer Animation & Virtual


People highlight the intended interpretation of their utterances within a larger discourse by a diverse set of non-verbal signals. These signals represent a key challenge for animated conversational agents because they are pervasive, variable, and need to be coordinated judiciously in an effective contribution to conversation. In this paper, we describe a freely available cross-platform real-time facial animation system, RUTH, that animates such high-level signals in synchrony with speech and lip movements. RUTH adopts an open, layered architecture in which fine-grained features of the animation can be derived by rule from inferred linguistic structure, allowing us to use RUTH, in conjunction with annotation of observed discourse, to investigate the meaningful high-level elements of conversational facial movement for American English speakers.


“…In terms of the development method, the models are constructed based on a rule-based approach (Anderson, Pierrehumbert & Liberman, 1984; Allen, Hunnicutt & Klatt, 1987; Jilka, Mohler & Dogil, 1999) or a corpus-based one (Traber, 1992; Manna & Quazza, 1995; Black & Hunt, 1996; Ross & Ostendorf, 1999). The model parameters in the rule-based approach are given by designers through a considerable effort of trial-and-error, while those in the corpus-based one are obtained in a statistical way.…”

Section: Introduction (mentioning)

confidence: 99%

Tree-based modeling of intonation

Lee

2001

Computer Speech & Language

