Perceptual aftereffects following adaptation to simple stimulus attributes (e.g., motion, color) have been studied for hundreds of years. A striking recent discovery was that adaptation also elicits contrastive aftereffects in visual perception of complex stimuli and faces [1-6]. Here, we show for the first time that adaptation to nonlinguistic information in voices elicits systematic auditory aftereffects. Prior adaptation to male voices caused a voice to be perceived as more female (and vice versa), and these auditory aftereffects were measurable even minutes after adaptation. By contrast, crossmodal adaptation effects were absent, both when male or female first names and when silently articulating male or female faces were used as adaptors. When sinusoidal tones (with frequencies matched to male and female voice fundamental frequencies) were used as adaptors, no aftereffects on voice perception were observed. This excludes explanations for the voice aftereffect in terms of both pitch adaptation and postperceptual adaptation to gender concepts, and suggests that contrastive voice-coding mechanisms may routinely influence voice perception. The role of adaptation in calibrating properties of high-level voice representations indicates that adaptation is not confined to vision but is a ubiquitous mechanism in the perception of nonlinguistic social information from both faces and voices.
While audiovisual integration is well known in speech perception, faces and voices are also informative with respect to speaker recognition. To date, audiovisual integration in the recognition of familiar people has never been demonstrated. Here we show systematic benefits and costs for the recognition of familiar voices when these are combined with time-synchronized articulating faces of corresponding or noncorresponding speaker identity, respectively. While these effects were strong for familiar voices, they were smaller or nonsignificant for unfamiliar voices, suggesting that the effects depend on the previous creation of a multimodal representation of a person's identity. Moreover, the effects were reduced or eliminated when voices were combined with the same faces presented as static pictures, demonstrating that the effects do not simply reflect the use of facial identity as a "cue" for voice recognition. This is the first direct evidence for audiovisual integration in person recognition.
Audiovisual integration (AVI) has been demonstrated to play a major role in speech comprehension. Previous research suggests that AVI in speech comprehension tolerates a temporal window of audiovisual asynchrony. However, few studies have employed audiovisual presentation to investigate AVI in person recognition. Here, participants completed an audiovisual voice familiarity task in which the synchrony of the auditory and visual stimuli was manipulated, and in which the identity of the visual speaker could be corresponding or noncorresponding to the voice. Recognition of personally familiar voices systematically improved when corresponding visual speakers were presented near synchrony or with slight auditory lag. Moreover, when noncorresponding faces were presented with a voice, recognition accuracy suffered only at near synchrony to slight auditory lag. These results provide the first evidence for a temporal window for AVI in person recognition, extending from approximately 100 ms auditory lead to 300 ms auditory lag.