While the tones of Mandarin are conveyed mainly by the F₀ contour, they also differ consistently in duration and in amplitude contour. The contribution of these factors was examined by using signal-correlated noise stimuli, in which natural speech is manipulated so that it has no F₀ or formant structure but retains its original amplitude contour and duration. Tones 2, 3 and 4 were perceptible from just the amplitude contour, even when duration was not also a cue. In two further experiments, the location of the critical information for the tones during the course of the syllable was examined by extracting small segments from each part of the original syllable. Tones 2 and 3 were often confused with each other, and segments which did not have much F₀ change were most often heard as Tone 1. There were, though, also cases in which a low, unchanging pitch was heard as Tone 3, indicating a partial effect of register even in Mandarin. F₀ was positively correlated with amplitude, even when both were computed on a pitch-period basis. Taken together, the results show that Mandarin tones are realized in more than just the F₀ pattern, that amplitude contours can be used by listeners as cues for tone identification, and that not every portion of the F₀ pattern unambiguously indicates the original tone.
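The signal-correlated noise manipulation described above is commonly produced by randomly inverting the polarity of each sample: the spectrum becomes noise-like (no F₀ or formant structure), while the amplitude contour and duration are preserved exactly. The sketch below illustrates that general technique; the function name and interface are illustrative and not taken from the study.

```python
import numpy as np

def signal_correlated_noise(x, rng=None):
    # Flip the sign of each sample at random. The waveform's fine
    # structure (and hence F0 and formants) is destroyed, but |x|
    # (the amplitude contour) and len(x) (the duration) are unchanged.
    rng = np.random.default_rng() if rng is None else rng
    signs = rng.choice(np.array([-1.0, 1.0]), size=len(x))
    return x * signs
```

Because only sample polarity changes, the envelope cues the listeners relied on survive the manipulation untouched.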
Some components of a speech signal, when made more intense, are heard simultaneously as speech and nonspeech--a form of duplex perception. At lower intensities, the speech alone is heard. Such intensity-dependent duplexity implies the existence of a phonetic mode of perception that takes precedence over auditory modes.
Fifteen children with autism spectrum disorders (ASD) and twenty-one children without ASD completed six perceptual tasks designed to characterize the nature of the audiovisual processing difficulties experienced by children with ASD. Children with ASD scored significantly lower than children without ASD on audiovisual tasks involving human faces and voices, but scored similarly to children without ASD on audiovisual tasks involving nonhuman stimuli (bouncing balls). Results suggest that children with ASD may use visual information for speech differently from children without ASD. Exploratory results support an inverse association between audiovisual speech processing capacities and social impairment in children with ASD.
This study used eye-tracking methodology to assess audiovisual (AV) speech perception in 26 children ranging in age from 5-15 years, half with autism spectrum disorders (ASD) and half with typical development (TD). Given the characteristic reduction in gaze to the faces of others in children with ASD, it was hypothesized that they would show reduced influence of visual information on heard speech. Responses were compared on a set of auditory, visual and audiovisual speech perception tasks. Even when fixated on the face of the speaker, children with ASD were less visually influenced than TD controls. This indicates fundamental differences in the processing of AV speech in children with ASD, which may contribute to their language and communication impairments.
When an [s] or an [ʃ] fricative noise is combined with vocalic formant transitions appropriate to a different fricative, the resulting consonantal percept is usually that of the noise. To see if the mismatch affects processing time, five experiments were run. Three experiments examined reaction time for identification of [s] and [ʃ], as well as the whole syllable (in one experiment) or only the vowel (in the others). The stimuli contained either appropriate or inappropriate formant transitions, and the vowel information in the noise was either appropriate or not. Subjects were significantly slower in all tasks in identifying stimuli with inappropriate transitions or inappropriate vowel information. Similar results were obtained with stop-vowel syllables in which the release bursts of syllable-initial [p] and [k] were transposed in syllables containing the vowels [a] and [u]. In the fifth experiment, enough silence was introduced between the initial fricatives and vocalic segment for the vocalic formant transitions to be perceived as a stop (e.g., [stu] from [su]). Mismatched transitions still had an effect on reaction time, as did mismatches of vowel quality. The results indicate that listeners take into account all available cues, even when the phonetic judgment seems to be based on only some of the cues. It is well known that information about a phone is temporally spread in the speech signal. It is usually impossible to isolate one piece of the signal and identify it as one single phone. Even when such a segmentation results in a stretch of sound that is identifiable as a single phone, information about neighboring phones usually remains. The vowels of consonant-vowel syllables, for example, can be identified at better than chance levels from excised stop-consonant release bursts (Blumstein
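The burst-transposition stimuli can be illustrated with a simple cross-splicing sketch: the release burst from one syllable is attached to the vocalic remainder of the other. The 10 ms burst length and the function name here are assumptions for illustration, not the actual durations or procedure used in the experiments.

```python
import numpy as np

def swap_bursts(syl_a, syl_b, sr, burst_ms=10.0):
    # Cross-splice two syllables: the initial burst_ms of each one
    # (taken as the release burst) is attached to the remainder of
    # the other, yielding stimuli whose burst and formant-transition
    # cues are mismatched.
    n = int(round(sr * burst_ms / 1000.0))
    a_with_b_burst = np.concatenate([syl_b[:n], syl_a[n:]])
    b_with_a_burst = np.concatenate([syl_a[:n], syl_b[n:]])
    return a_with_b_burst, b_with_a_burst
```

Each spliced stimulus keeps the original syllable duration, so only the burst cue conflicts with the following transitions.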