In this paper, we provide evidence for rhythmic classifications of speech from duration measurements. Our investigation differs from previous studies in two ways. Firstly, we do not relate speech rhythm to phonological units such as interstress intervals or syllable durations. Instead, we calculate durational variability in successive acoustic-phonetic intervals using Pairwise Variability Indices. Secondly, we compare measurements from languages traditionally classified as stress-, syllable-, or mora-timed with measurements from hitherto unclassified languages. The values obtained agree with the classification of English, Dutch and German as stress-timed and of French and Spanish as syllable-timed: durational variability is greater in stress-timed languages than in syllable-timed languages. Values from Japanese, a mora-timed language, are similar to those from syllable-timed languages. But previously unclassified languages do not fit into any of the three classes. Instead, their values overlap with the margins of the stress-timed and the syllable-timed group.

Low & Grabe, Durational variability and speech rhythm

Roach tested two claims about the difference between stress-timed and syllable-timed rhythm: (i) there is considerable variation in syllable length in a language spoken with stress-timed rhythm, whereas in a language spoken with syllable-timed rhythm, syllables tend to be equal in length, and (ii) in syllable-timed languages, interstress intervals are unevenly spaced. Roach's findings did not support either claim. The syllable-timed languages in his sample exhibited greater variability in syllable durations than the stress-timed languages. Roach also observed a wider range of percent deviations in interstress intervals in stress-timed than in syllable-timed languages. Roach concluded that measurements of time intervals in speech could not provide evidence for rhythm classes. Roach's view has been supported by Dauer's (1983) study. Dauer compared interstress intervals in English, Thai, Spanish, Italian and Greek.
She found that interstress intervals were no more regular in English, a stress-timed language, than in Spanish, a syllable-timed language. Dauer concluded that the search for acoustic-phonetic correlates of stress- and syllable-timing was futile. Isochrony in mora-timing was investigated by Han (1962), Port, Al-Ani and Maeda (1980), and Port, Dalby and O'Dell (1987). Port et al. (1987) argue that these studies provide some preliminary support for the mora as a constant time unit. But other researchers have questioned the acoustic basis for mora-timing (Oyakawa, 1971; Beckman, 1982; Hoequist, 1983a, b). Beckman's (1982) data, for instance, did not show that segments vary in length in Japanese in order to compensate for the intrinsic durations of adjacent segments so that morae are equal in length. In short, although popular among linguists, the rhythm class hypothesis has been contradicted by numerous empirical studies. Abercrombie's view of speech rhythm as a combination of chest- and stress-pulses has long been disproven (e.g. Ladefoged, 1967), destroying the physiologic...
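The Pairwise Variability Indices mentioned above can be sketched in a few lines. This is a minimal illustration, not the authors' code; the interval durations are made up, and the normalized variant divides each pairwise difference by the pair's mean duration to factor out speech rate:

```python
import statistics

def raw_pvi(durations):
    """Raw PVI: mean absolute difference between successive interval
    durations (e.g. vocalic intervals, in ms)."""
    return statistics.mean(abs(a - b) for a, b in zip(durations, durations[1:]))

def normalized_pvi(durations):
    """Normalized PVI: each pairwise difference is divided by the pair's
    mean duration (rate normalization), then scaled by 100."""
    return 100 * statistics.mean(
        abs(a - b) / ((a + b) / 2) for a, b in zip(durations, durations[1:])
    )

# Hypothetical durations (ms): alternating long/short intervals, as in a
# stress-timed pattern, score higher than near-equal intervals.
stress_like   = [180, 60, 170, 55, 190, 65]
syllable_like = [100, 95, 105, 98, 102, 97]
print(normalized_pvi(stress_like) > normalized_pvi(syllable_like))  # True
```

On these toy inputs the alternating sequence yields a normalized PVI near 100, the near-equal sequence one near 6, matching the paper's claim that durational variability separates the traditional rhythm classes.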
We explored a database covering seven dialects of British and Irish English and three different styles of speech to find acoustic correlates of prominence. We built classifiers, trained them on human prominence/non-prominence judgments, and then evaluated how well they performed. The classifiers operate on 452 ms windows centered on syllables, using different acoustic measures. By comparing the performance of classifiers based on different measures, we can learn how prominence is expressed in speech. Contrary to textbook accounts and common assumption, fundamental frequency (f0) played a minor role in distinguishing prominent syllables from the rest of the utterance. Instead, speakers primarily marked prominence with patterns of loudness and duration. Two other acoustic measures that we examined also played a minor role, comparable to f0. All dialects and speaking styles studied here share a common definition of prominence. The result is robust to differences in labeling practice and the dialect of the labeler.
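The feature-comparison methodology can be illustrated with a toy sketch. Everything below is assumed for illustration: synthetic syllable data and a single-threshold classifier stand in for the study's corpus and classifiers, and the generated effect sizes merely mimic the reported pattern (duration a strong cue, f0 a weak one):

```python
import random
random.seed(0)

def make_syllables(n=400):
    """Generate synthetic per-syllable measurements with binary
    prominence labels. The class separations are assumptions chosen
    to mirror the pattern described in the text."""
    data = []
    for _ in range(n):
        prominent = random.random() < 0.5
        duration = random.gauss(250 if prominent else 150, 30)  # ms: strong cue
        f0 = random.gauss(205 if prominent else 195, 40)        # Hz: weak cue
        data.append({"duration": duration, "f0": f0, "label": prominent})
    return data

def threshold_accuracy(train, test, feature):
    """Fit a midpoint threshold on one feature, report held-out accuracy."""
    pos = [d[feature] for d in train if d["label"]]
    neg = [d[feature] for d in train if not d["label"]]
    thr = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    hits = sum((d[feature] > thr) == d["label"] for d in test)
    return hits / len(test)

train, test = make_syllables(), make_syllables()
acc = {f: threshold_accuracy(train, test, f) for f in ("duration", "f0")}
# On this synthetic data, the duration-based classifier clearly wins.
print(acc["duration"] > acc["f0"])
```

Comparing per-feature classifier accuracy in this way is how one can infer which acoustic measures actually carry the prominence distinction.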
The mathematical models of intonation used in speech technology are often inaccessible to linguists. By the same token, phonological descriptions of intonation are rarely used by speech technologists, as they cannot be implemented directly in applications. Consequently, these research communities do not benefit much from each other's insights. In this paper, we explore the interface between the disciplines, in search of bridges between intonational phonology and speech technology. In a corpus of speech data from seven dialects of English, we hand-labeled over 700 sentences and identified seven nuclear accent types. Then we fitted a third-order polynomial to the fundamental frequency (F0) contour in the region around the accent mark. The polynomial captures the local shape (time-dependence) of F0 in a few numbers, in our case, four coefficients. The coefficients were subjected to statistical analysis. Nineteen of the 21 pairs of accent types differed significantly in one or more coefficients. Our approach bridges the gap between intonational phonology and speech technology. It provides quantitative, empirically testable models of intonation labels that can be implemented in applications.
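The curve-fitting step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the F0 contour is synthetic rather than corpus data, and the window placement around the accent mark is hypothetical:

```python
import numpy as np

# Time axis (s) for a window around a hypothetical accent mark at t = 0.
t = np.linspace(-0.2, 0.2, 41)

# Synthetic rise-fall F0 contour (Hz), standing in for measured data.
f0 = 180 + 120 * t - 900 * t**2 + 2000 * t**3

# Fit a third-order polynomial: four coefficients (highest order first)
# summarize the local shape of F0 in the window.
coeffs = np.polyfit(t, f0, deg=3)
f0_hat = np.polyval(coeffs, t)

print(np.allclose(f0_hat, f0))  # True: a cubic fits a cubic exactly
```

In the approach described above, the four coefficients per accent token are then compared across hand-labeled accent types with standard statistical tests.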