The Mother-Infant Phonetic Interaction model (MIPhI) predicts that, compared with adult-directed speech (ADS), vowels in infant-directed speech (IDS) will be overspecified and consonants underspecified during the infants' first 6 months. In a longitudinal naturalistic study, six mothers' ADS and IDS were recorded on 10 occasions during the first 6 months after their infants were born. Acoustic-phonetic measures, including the first two formant frequencies and duration for vowels and the duration of the fricative /s/, were used to test the MIPhI model by comparing IDS and ADS across the infants' first 6 months. Repeated-measures analyses showed that the duration of the fricative /s/ was stably longer in IDS, corresponding to an overspecification throughout the 6 months. The unexpectedly smaller vowel space for IDS than for ADS was stably maintained over the 6 months, suggesting an underspecification of vowels. Vowel duration, which was generally longer in IDS than in ADS, changed over time, however, with the difference between IDS and ADS decreasing during months 3 and 4. The results invite adjustments to the MIPhI model, in particular with respect to infants' need for perceptual enhancement of speech segments and to the course of infant neurological and communicative development throughout the first 6 months.
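A common way to quantify the vowel-space result above is the area of the F1/F2 triangle spanned by the corner vowels. The following minimal Python sketch computes that area with the shoelace formula; the formant values are invented placeholders, not data from this study.

```python
# Minimal sketch: vowel space area from corner-vowel formants (shoelace formula).
# All F1/F2 values below are illustrative placeholders, not the study's data.

def vowel_space_area(corners):
    """Area (Hz^2) of the polygon spanned by (F1, F2) corner vowels."""
    n = len(corners)
    area = 0.0
    for i in range(n):
        x1, y1 = corners[i]
        x2, y2 = corners[(i + 1) % n]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0

# Hypothetical mean (F1, F2) in Hz for /i/, /a/, /u/
ads = [(300, 2300), (750, 1300), (320, 800)]
ids = [(320, 2200), (700, 1350), (350, 900)]

print(vowel_space_area(ads))  # larger triangle for ADS
print(vowel_space_area(ids))  # a smaller IDS triangle would indicate vowel underspecification
```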
An interactive face-to-face setting was used to study natural infant-directed speech (IDS) compared with adult-directed speech (ADS). Norwegian, which has distinctive vowel quantity and vowel quality, was used in a natural quasi-experimental design. Six Norwegian mothers were recorded over a period of 6 months, both alone with their infants and in conversation with an adult. Vowel duration and spectral attributes of the vowels /a:/, /i:/, and /u:/ and their short counterparts /a/, /i/, and /u/ were analysed. Repeated-measures analyses showed that the effects of vowel quantity did not differ between ADS and IDS, and that for back vowel qualities the vowel space was shifted upwards in IDS compared with ADS, suggesting that fronted articulations in natural IDS may visually enhance speech to infants.
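As an illustration of the kind of acoustic measurements involved, the sketch below extracts a vowel's duration and midpoint F1/F2 using the parselmouth Python interface to Praat. The file name and segment boundaries are hypothetical, a real analysis would take them from annotations, and the original study's exact tooling is not specified.

```python
# Sketch of one way to measure vowel duration and midpoint formants with
# Praat via parselmouth. File name and vowel boundaries are hypothetical.
import parselmouth

snd = parselmouth.Sound("mother_ids.wav")   # hypothetical recording
formants = snd.to_formant_burg()

vowel_start, vowel_end = 1.20, 1.38         # assumed /a:/ boundaries in seconds
midpoint = (vowel_start + vowel_end) / 2

duration_ms = (vowel_end - vowel_start) * 1000
f1 = formants.get_value_at_time(1, midpoint)  # first formant (Hz)
f2 = formants.get_value_at_time(2, midpoint)  # second formant (Hz)

print(f"duration = {duration_ms:.0f} ms, F1 = {f1:.0f} Hz, F2 = {f2:.0f} Hz")
```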
This study examined the effects of linguistic experience on audio-visual (AV) perception of non-native (L2) speech. Native speakers of Canadian English and native speakers of Mandarin Chinese differing in degree of English exposure [long versus short length of residence (LOR) in Canada] were presented with English fricatives at three visually distinct places of articulation: interdentals, which do not exist in Mandarin, and labiodentals and alveolars, which are common to both languages. Stimuli were presented in quiet and in a café-noise background in four ways: audio only (A), visual only (V), congruent AV (AVc), and incongruent AV (AVi). Identification results showed that overall performance was better in the AVc condition than in the A or V condition, and better in quiet than in café noise. While the Mandarin long-LOR group approximated the native English patterns, the short-LOR group showed poorer interdental identification, greater reliance on visual information, and greater AV fusion with the AVi materials, indicating a failure of L2 visual speech category formation in the short-LOR non-natives and positive effects of linguistic experience in the long-LOR non-natives. These results point to an integrated network in AV speech processing as a function of linguistic background and provide evidence for extending auditory-based L2 speech learning theories to the visual domain.
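One plausible way to quantify the AV-fusion measure mentioned above is to count incongruent-AV responses that match neither the auditory nor the visual token. The sketch below is a hypothetical scoring scheme with invented syllable labels, not the study's actual procedure.

```python
# Hypothetical scoring of incongruent AV (AVi) trials: a response matching
# neither the audio nor the video token is counted as a "fused" percept.
trials = [
    {"audio": "sa", "video": "tha", "response": "tha"},  # visual capture
    {"audio": "sa", "video": "tha", "response": "sa"},   # audio-driven
    {"audio": "fa", "video": "sa", "response": "tha"},   # fused percept
]

fused = sum(t["response"] not in (t["audio"], t["video"]) for t in trials)
print(f"AV-fusion rate = {fused / len(trials):.2f}")
```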
This study investigated hemispheric lateralization of Mandarin tone. Four groups of listeners were examined: native Mandarin listeners, English–Mandarin bilinguals, Norwegian listeners with experience of Norwegian tone, and American listeners with no tone experience. Tone pairs were presented dichotically, and listeners identified which tone they heard in each ear. For the Mandarin listeners, 57% of the total errors occurred in the left ear, indicating a right-ear (left-hemisphere) advantage. The English–Mandarin bilinguals exhibited native-like patterns, with 56% left-ear errors. However, no ear advantage was found for the Norwegian or American listeners (48% and 47% left-ear errors, respectively). The results indicate left-hemisphere dominance for Mandarin tone in native and proficient bilingual listeners, whereas non-native listeners show no evidence of lateralization, regardless of their familiarity with lexical tone.
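The ear-advantage measure reported above reduces to simple arithmetic: the share of total dichotic errors falling in the left ear, where shares above 50% indicate a right-ear (left-hemisphere) advantage. A small sketch with illustrative counts:

```python
# Arithmetic sketch of the ear-advantage measure: the percentage of total
# dichotic errors occurring in the left ear. Error counts are illustrative.

def left_ear_error_share(left_errors, right_errors):
    return 100 * left_errors / (left_errors + right_errors)

print(left_ear_error_share(57, 43))  # ~57%: right-ear advantage (Mandarin natives)
print(left_ear_error_share(48, 52))  # ~48%: no clear advantage (Norwegian listeners)
```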
Previous research indicates that perception of audio-visual (AV) synchrony changes in adulthood. Possible explanations for these age differences include a decline in hearing acuity, a decline in cognitive processing speed, and increased experience with AV binding. The current study aims to isolate the effect of AV experience by comparing synchrony judgments from 20 young adults (20 to 30 years) and 20 normal-hearing middle-aged adults (50 to 60 years), an age range for which the decline in cognitive processing speed is expected to be minimal. When presented with AV stop-consonant syllables with asynchronies ranging from 440 ms audio lead to 440 ms visual lead, middle-aged adults showed significantly less tolerance for audio lead than young adults. Middle-aged adults also showed a greater shift in their point of subjective simultaneity than young adults. Natural audio-lead asynchronies are arguably more predictable than natural visual-lead asynchronies, and this predictability may render audio-lead thresholds more prone to experience-related fine-tuning.
In well-controlled laboratory experiments, researchers have found that humans can perceive delays between auditory and visual signals as short as 20 ms. In contrast, other experiments have shown that humans can tolerate audiovisual asynchrony exceeding 200 ms. This seeming contradiction in human temporal sensitivity can be attributed to a number of factors, such as experimental approaches and precedence of the asynchronous signals, along with the nature, duration, location, complexity, and repetitiveness of the audiovisual stimuli, and even individual differences. In order to better understand how temporal integration of audiovisual events occurs in the real world, we need to close the gap between the experimental setting and the complex setting of everyday life. With this work, we aimed to contribute one brick to the bridge that will close this gap. We compared perceived synchrony for long-running, eventful audiovisual sequences with shorter sequences containing a single audiovisual event, for three types of content: action, music, and speech. The resulting windows of temporal integration showed that participants were better at detecting asynchrony for the longer stimuli, possibly because the long-running sequences contain multiple corresponding events that offer audiovisual timing cues. Moreover, the points of subjective simultaneity differed between content types, suggesting that the nature of a visual scene can influence the temporal perception of events. An expected outcome from this type of experiment was the rich variation among participants' distributions and the derived points of subjective simultaneity. Hence, the design of similar experiments calls for more participants than traditional psychophysical studies. Heeding this caution, we conclude that existing theories on multisensory perception are ready to be tested on more natural and representative stimuli.
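Both of the preceding studies rest on the same analysis idea: estimate a point of subjective simultaneity (PSS) and a window of temporal integration from simultaneity judgments across audiovisual offsets. A common approach, sketched below with invented response data, fits a Gaussian to the proportion of "synchronous" responses; the fitted mean gives the PSS and the fitted width characterizes the window.

```python
# Sketch: estimate PSS and integration-window width by fitting a Gaussian
# to simultaneity judgments. The response proportions below are invented.
import numpy as np
from scipy.optimize import curve_fit

def gaussian(soa, peak, pss, width):
    return peak * np.exp(-((soa - pss) ** 2) / (2 * width ** 2))

# Stimulus-onset asynchronies in ms (negative = audio lead)
soa = np.array([-440, -330, -220, -110, 0, 110, 220, 330, 440])
p_sync = np.array([0.05, 0.15, 0.45, 0.80, 0.95, 0.90, 0.70, 0.35, 0.10])

(peak, pss, width), _ = curve_fit(gaussian, soa, p_sync, p0=[1.0, 0.0, 150.0])
print(f"PSS = {pss:.0f} ms, window SD = {width:.0f} ms")
# A positive PSS means visual-lead offsets are judged most synchronous.
```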
Although the effects of alcohol on speech production have not been widely investigated, previous research has suggested that utterances produced while a talker is intoxicated may be longer than those produced while the talker is sober [e.g., Sobell et al., Folia Phoniatrica 34, 316–323 (1982); D. B. Pisoni and C. S. Martin, Alcoholism: Clinical Exp. Res. 13, 577–587 (1989)]. As part of a larger investigation of the effects of alcohol on speech, nine talkers were recorded while sober and while intoxicated. Talkers produced isolated monosyllabic words, isolated spondees, isolated sentences, and passages of fluent speech. Two questions about utterance duration were addressed: (1) Does alcohol affect the duration of utterances? (2) Does alcohol affect the duration of different utterance types in the same way? The results revealed that isolated sentences and sentences from within passages produced in the intoxicated condition were reliably longer than those produced in the sober condition. However, for isolated monosyllabic words and spondees, utterance durations were not reliably different between the sober and intoxicated conditions. The results are discussed in terms of the effects of alcohol on speech motor control.
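The within-talker duration comparison described above can be illustrated with a paired test across the nine talkers. The sketch below uses invented sentence durations; the original analysis may have used a different statistical procedure.

```python
# Sketch of a paired within-talker comparison: sentence durations (seconds)
# for the same nine talkers, sober versus intoxicated. Values are invented.
import numpy as np
from scipy.stats import ttest_rel

sober = np.array([2.1, 2.4, 1.9, 2.6, 2.2, 2.0, 2.5, 2.3, 2.1])
intoxicated = np.array([2.3, 2.6, 2.0, 2.9, 2.4, 2.2, 2.7, 2.4, 2.3])

t, p = ttest_rel(intoxicated, sober)
print(f"t = {t:.2f}, p = {p:.4f}")  # small p would indicate reliably longer durations
```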