When annotating a speech signal using an autosegmental-metrical model of intonation, transcribers associate portions of the F0 contour with labels from a finite inventory of tonal categories. In the models we are concerned with here, these categories have the status of phonological units (phonological form), bridging the intrinsic variability of the speech signal (substance) with the intrinsic fuzziness of post-lexical function (meaning). This, together with the relatively small size of the label inventory, precludes a one-to-one relationship between form and substance, and/or between form and function. A Neapolitan Italian corpus of read speech is used to investigate the distributional properties of two pitch accents that have been studied extensively with respect to substance (the alignment of F0 peaks) and meaning (sentence modality). Although there is a general consensus that peaks in this variety are aligned earlier in declaratives than in interrogatives, evidence is provided of contexts in which the converse is true, i.e., in which interrogative peaks are even earlier than their declarative counterparts. In this respect, interrogatives have a richer internal structure than declaratives. We argue that differences in how variably a prosodic category is encoded can be dealt with in an intonation transcription system, as long as this system relates phonological form (the choice of pitch accent in this case) both to phonetic substance and to meaning in a transparent way.
1 This is true of any intonation transcription system, although priorities vary. For instance, within the British school, Crystal prioritizes substance and sees intonation as "the product of the interaction of features from different prosodic systems – tone, pitch-range, loudness, rhythmicality and tempo in particular" (Crystal, 1975, p. 283), whereas Halliday (1967) emphasizes meaning, in accordance with his understanding of intonation as a system within the grammar of English. For recent views on the importance of meaning in intonation transcription, see also Arvaniti (2016) and Cole and Shattuck-Hufnagel (2016).
In this work we propose the use of Functional Data Analysis (FDA) as a powerful methodology for problems in which multiple continuous speech parameters have to be analyzed jointly. A production study on contrastive focus placement in Neapolitan Italian is used as an illustration. Two features are analyzed, viz. f0 and relative speech rate, both expressed as continuous functions of time. The results show that known facts about the prosody of Neapolitan Italian emerge from the data, and that other interesting local or cross-feature relationships between contour traits also appear. Thus, FDA results can be used as guidance in the exploration of speech feature contour shapes, an operation that was carried out manually in previous speech research. The capability of jointly analyzing multiple continuous features provides a valuable improvement not only for speech analysis but also for speech re-synthesis.
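To make the idea of analyzing two continuous features jointly more concrete, the following is a minimal sketch and not the procedure used in the study. It assumes the curves are already time-normalized and sampled on a common grid, and it replaces full FDA (basis smoothing, landmark registration) with a plain discretized principal component analysis over concatenated f0 and speech-rate curves. The function name `joint_fpca` and all data are hypothetical and synthetic.

```python
import numpy as np


def joint_fpca(f0, rate, n_components=2):
    """Discretized joint PCA over two sets of curves on the same time grid.

    f0, rate: arrays of shape (n_curves, n_samples).
    Returns (mean, components, scores, explained_variance_ratio).
    """
    f0 = np.asarray(f0, dtype=float)
    rate = np.asarray(rate, dtype=float)
    if f0.shape != rate.shape:
        raise ValueError("f0 and rate must have the same shape")
    # Scale each feature by its overall standard deviation so that
    # neither feature dominates the joint decomposition.
    stacked = np.hstack([f0 / f0.std(), rate / rate.std()])
    mean = stacked.mean(axis=0)
    centered = stacked - mean
    # Rows of vt are orthonormal joint components (f0 part, then rate part).
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    scores = centered @ components.T
    variances = singular_values ** 2
    ratio = variances[:n_components] / variances.sum()
    return mean, components, scores, ratio


# Synthetic curves (assumption): one f0 peak and one rate modulation,
# both shifted in time by the same per-utterance amount.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
n_curves = 20
shift = rng.normal(0.0, 0.05, size=(n_curves, 1))
f0_curves = np.exp(-((t - 0.5 - shift) ** 2) / 0.02)
f0_curves += rng.normal(0.0, 0.01, size=(n_curves, t.size))
rate_curves = 1.0 + 0.3 * np.sin(2 * np.pi * (t - shift))
rate_curves += rng.normal(0.0, 0.01, size=(n_curves, t.size))

mean, components, scores, ratio = joint_fpca(f0_curves, rate_curves)
```

Each row of `components` spans both features, so one score per utterance describes a coordinated change in the f0 contour and in the speech-rate contour. This is the kind of cross-feature relationship that a joint analysis can surface.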
{fcangemi, swehrle2, stefan.baumann, martine.grice}@uni-koeln.de, dina.elzar
When referring to an object or person, speakers select a referring expression along with an appropriate prosody. This choice is a highly context-dependent, listener-oriented aspect of language that has been reported to be difficult for individuals with Autism Spectrum Disorders (ASD) and associated mentalizing deficits [1, 2]. In a picture-based story-telling task, we investigated the encoding of a referent's givenness, focusing on prosodic choices. When new referents were introduced (or reintroduced) into the discourse, adults with ASD were similar to typically developed adults in their pitch accent placement, but differed in their choice of accent type. On new referents, the ASD group produced accents which are less prominent and which have a non-committal nature (H*), while the control group made greater use of more prominent accents (L+H*, L*+H). Thus, selecting the appropriate pitch accent type to mark a newly introduced referent is problematic for individuals with ASD.
Phonological models of intonation use abstract categories, such as pitch accents, to build a bridge between continuous modulations in F0 contours (on the substantial side) and post-lexical meaning (on the functional side). However, recent research on Romance, Germanic, and non-Indo-European languages shows that sentence modality contrasts (i.e., question vs. statement) are often realized not only with different F0 contours, but also through differences in individual phone duration or global speech rate. If these durational differences were also used as a cue in the perception of sentence modality contrasts, phonological categories in current models of intonation would qualify as excessively underspecified, and they should be expanded in order to include phonetic information on the temporal dimension as well. In this paper we evaluate the role of durational differences as a cue to the perception of sentence modality contrasts in the Neapolitan regional variety of Italian. Read sentences were resynthesized by switching durational and intonational patterns of questions and statements, and used in a forced-choice identification task. The results show that listeners exclusively rely on F0, thus suggesting that, at least for this specific contrast in this specific variety, phonological representations of intonational contrasts do not need to be enriched with phonetic detail at the durational level.
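The abstract does not describe how the resynthesis was carried out. As a rough illustration of what switching durational and intonational patterns involves, the sketch below maps the F0 sample times of one utterance onto the segment durations of another by piecewise-linear time warping. It assumes both utterances share the same sequence of segments. The function name `warp_times` and all numbers are hypothetical.

```python
import numpy as np


def warp_times(times, src_bounds, tgt_bounds):
    """Map time points from a source segmentation onto a target segmentation.

    src_bounds and tgt_bounds hold the boundaries of the same segments in the
    two utterances; time is stretched linearly within each segment.
    """
    times = np.asarray(times, dtype=float)
    src = np.asarray(src_bounds, dtype=float)
    tgt = np.asarray(tgt_bounds, dtype=float)
    if src.shape != tgt.shape:
        raise ValueError("both utterances need the same number of boundaries")
    if np.any(np.diff(src) <= 0) or np.any(np.diff(tgt) <= 0):
        raise ValueError("boundaries must be strictly increasing")
    return np.interp(times, src, tgt)


# Hypothetical question: F0 samples (Hz) and two segments, 0-0.2 s and 0.2-0.3 s.
question_f0_times = np.array([0.0, 0.1, 0.2, 0.3])
question_f0_hz = np.array([180.0, 200.0, 260.0, 220.0])
question_bounds = np.array([0.0, 0.2, 0.3])
# Hypothetical statement with the same segments but different durations.
statement_bounds = np.array([0.0, 0.1, 0.3])

# Question intonation carried over to statement timing: F0 values are kept,
# only their time stamps move.
warped_times = warp_times(question_f0_times, question_bounds, statement_bounds)
```

Pairing `warped_times` with the unchanged `question_f0_hz` values gives a contour with question intonation on statement durations. Producing audio from such a contour would require a separate resynthesis tool, which is outside this sketch.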
This paper aims to strengthen the link between acoustic and perceptual representations of intonation, a link that has been weakened by the over-reliance on the F0 trajectory, which can only be interpreted in relation to landmarks in the segmental string, placed manually or semi-automatically at a separate stage in the analysis. Only then can F0 events be identified as linguistically relevant (e.g. early, medial or late peaks, accentual tones or edge tones etc.). We provide an analysis and visualization of two acoustic dimensions contributing towards the perceived pitch contour, F0 over time and, crucially, periodic energy. Periodic energy reflects the degree to which pitch is intelligible, a higher value representing a stronger F0 signal that is consequently more easily perceived. A representation of F0 that includes periodic energy is thus able to flag portions of the speech signal that are relevant for the analysis of intonation, without the need for a separate segmentation of the signal into phones and syllables.
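One simple way to combine the two acoustic dimensions is sketched below. It is not the representation proposed in the paper. It assumes a frame-wise F0 track and a frame-wise periodic energy track are already available, and it turns periodic energy into a 0 to 1 weight that can scale line width or opacity in a plot, or weight summary statistics. The function names, the `floor` threshold, and the values are all hypothetical.

```python
import numpy as np


def periodic_energy_weights(periodic_energy, floor=0.05):
    """Scale periodic energy to [0, 1] and zero out frames below `floor`."""
    energy = np.asarray(periodic_energy, dtype=float)
    peak = energy.max()
    if peak <= 0:
        return np.zeros_like(energy)
    weights = energy / peak
    weights[weights < floor] = 0.0
    return weights


def weighted_f0_mean(f0, weights):
    """Mean F0 in which frames with stronger periodic energy count more."""
    f0 = np.asarray(f0, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Skip unvoiced frames (NaN F0) and frames with zero weight.
    valid = np.isfinite(f0) & (weights > 0)
    if not valid.any():
        return float("nan")
    return float(np.sum(f0[valid] * weights[valid]) / np.sum(weights[valid]))


# Hypothetical frame-wise tracks; NaN marks a frame without an F0 estimate.
f0_track = np.array([100.0, 120.0, 200.0, 180.0, np.nan])
energy_track = np.array([0.0, 1.0, 4.0, 1.0, 0.0])

weights = periodic_energy_weights(energy_track)
mean_f0 = weighted_f0_mean(f0_track, weights)
```

Frames with negligible periodic energy receive a weight of zero, so they drop out of the plot or statistic without any reference to phone or syllable boundaries.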