Breathy phonation is known as the primary cue of the “voiced” stops in Wu dialects, and is associated with the lower tonal register. This study discusses the phonetic realization of the tonal register of Wu dialects by measuring the relative prominence of the first harmonic to some higher-frequency components in the spectrum, F0, and periodicity (cepstral peak prominence, CPP) of Jiashan Wu monosyllabic words. We find that in Jiashan Wu, the phonetic targets for tonal register contrasts are a steeper spectral slope and a lower F0, which are consistent across all consonant manners, while the articulatory realization varies among different types of consonants.
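The spectral measure mentioned above can be illustrated with a minimal sketch. The abstract does not say which higher-frequency components were used, so this example assumes the simplest case, H1-H2 (first harmonic amplitude minus second harmonic amplitude, in dB), and assumes F0 is already known. The function names and the synthetic two-harmonic signals are hypothetical and are not the study's procedure.

```python
import math


def harmonic_amplitude(signal, sample_rate, freq):
    """Amplitude of the sinusoidal component at `freq`, from a single-bin DFT."""
    n = len(signal)
    re = sum(s * math.cos(2 * math.pi * freq * i / sample_rate)
             for i, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq * i / sample_rate)
             for i, s in enumerate(signal))
    return 2 * math.hypot(re, im) / n


def h1_minus_h2(signal, sample_rate, f0):
    """H1-H2 in dB: larger values mean a steeper drop from H1 to H2."""
    h1 = harmonic_amplitude(signal, sample_rate, f0)
    h2 = harmonic_amplitude(signal, sample_rate, 2 * f0)
    return 20 * math.log10(h1 / h2)


def synth(f0, amps, sample_rate=8000, n_samples=800):
    """Toy signal: a sum of harmonics of f0 with the given amplitudes."""
    return [
        sum(a * math.sin(2 * math.pi * f0 * (k + 1) * i / sample_rate)
            for k, a in enumerate(amps))
        for i in range(n_samples)
    ]


# Illustrative amplitudes only: a weak second harmonic stands in for a
# breathier voice, a strong second harmonic for a more modal voice.
breathy_like = synth(100, [1.0, 0.25])
modal_like = synth(100, [1.0, 0.8])

print(round(h1_minus_h2(breathy_like, 8000, 100), 2))
print(round(h1_minus_h2(modal_like, 8000, 100), 2))
```

Analyses of real speech usually also estimate F0 per frame and correct harmonic amplitudes for formant influence; neither step is shown here.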
Voice quality varies at different levels of communication functions. In order to better understand the range of voice quality variation in normal speech, it is important to examine the interaction between global functions and local functions. This study investigates the effect of vocal effort on the contrastive voice quality in Shaoxing Wu. Results show that register contrasts are maintained in all vocal effort conditions, suggesting that the controls for global vs local functions are rather independent. However, the contrastivity of the registers is modulated by the vocal effort conditions, and the register contrasts are less well-defined in the loud and soft conditions.
This study investigates how multiple cues contribute to multi-dimensional phonological contrasts at both the group level and the individual level, and how dialectal experience shapes listeners' perceptual strategies. We examine a tonal register contrast in two Chinese Wu dialects signaled by three cues: pitch height, voice quality, and pitch contour. We found that 1) at the group level, cue weights are context-specific, i.e., they vary by tone, and some contrasts rely more heavily on multiple cues than others; 2) dialectal experience affects listeners' perceptual strategy: Shanghai listeners, whose own dialect has a smaller voice quality distinction, do not rely more on the cue than Jiashan listeners do, even when listening to stimuli with a clear breathy-modal distinction; 3) individuals' cue weights are positively correlated, meaning that some listeners show overall larger cue weights than others; larger variability is found when the contrast has more than one salient cue, in which case individuals can choose one cue over another as the primary cue, and this can work against the positive correlation.
Phonological contrasts are usually signaled by multiple cues, and tonal languages typically involve multiple dimensions to distinguish between tones (e.g., duration, pitch contour, and voice quality). While the topic has been extensively studied, research has mostly used small datasets. This study employs a deep neural network (DNN) based speech recognizer trained on the AISHELL-1 (Bu et al., 2017) speech corpus (178 hours of read speech) to explore the tone space in Mandarin Chinese. A recent study shows that DNN models learn linguistically interpretable information to distinguish between vowels (Weber et al., 2016). Specifically, from a low-dimensional Bottleneck layer, the model learns features comparable to F1 and F2. In the current study, we propose a more complicated Long Short-Term Memory (LSTM) model, with a Bottleneck layer implemented in the hidden layers, to account for variable duration, an important cue for tone discrimination. By interpreting the features learned in the Bottleneck layer, we explore what acoustic dimensions are involved in distinguishing tones. The large amount of data from the speech corpus also renders the results more convincing and provides additional insights not possible from studies with more limited datasets.
Chinese Wu dialects are known to have two tonal registers, where the lower register is realized with lower pitch and breathy phonation and the upper register is realized with higher pitch and modal phonation. In Jiashan Wu, the falling tone is realized differently in the two registers: the pitch contour of the upper register is slightly steeper than that of the lower register. This study investigates how speakers of Jiashan Wu weight the three cues (i.e., breathiness, pitch height, pitch contour) in the register contrast. We recorded two /ka/ words, one from the upper and one from the lower register, created stimuli varying in both dimensions (5 steps of pitch height × 5 steps of breathiness = 25 stimuli), and imposed each of the two contours on all stimuli. 28 native listeners performed a forced-choice categorization task on 5 repetitions of each stimulus in random order. A mixed-effects logistic model shows that all three factors affect categorization, and that pitch contour is the most important cue and breathiness the least. Moreover, the effect of breathiness was smaller with higher pitches and a steep contour, and the effect of pitch height was smaller with a steep contour. Data are being collected comparing Jiashan and Shanghai dialects.
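The stimulus design described above can be written out as a small sketch to make the trial counts explicit. The step labels, contour names, and the choice to shuffle all trials in a single block are assumptions for illustration. The abstract only states the 5 × 5 grid, the two contours, the 5 repetitions, and the random order.

```python
import itertools
import random

# Hypothetical labels; the abstract gives only the number of steps.
PITCH_STEPS = range(1, 6)          # 5 steps of pitch height
BREATHINESS_STEPS = range(1, 6)    # 5 steps of breathiness
CONTOURS = ("upper_contour", "lower_contour")
REPETITIONS = 5


def build_trials(seed=0):
    """Return the unique stimuli and one listener's randomized trial list."""
    # 5 x 5 continuum crossed with the two contours.
    stimuli = list(itertools.product(PITCH_STEPS, BREATHINESS_STEPS, CONTOURS))
    trials = stimuli * REPETITIONS
    # Assumption: one shuffle over all trials rather than blocked randomization.
    random.Random(seed).shuffle(trials)
    return stimuli, trials


stimuli, trials = build_trials()
print(len(stimuli), "unique stimuli;", len(trials), "trials per listener")
```

Under these assumptions each listener hears 50 unique stimuli across 250 trials, so 28 listeners would yield 7,000 categorization responses for the logistic model.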