The data provide fMRI evidence of crossmodal binding by convergence in the human heteromodal cortex. They further suggest that response enhancement and depression may be a general property of multisensory integration, operating at different levels of the neuraxis and irrespective of the purpose for which sensory inputs are combined.
Modern brain imaging techniques have now made it possible to study the neural sites and mechanisms underlying crossmodal processing in the human brain. This paper reviews positron emission tomography, functional magnetic resonance imaging (fMRI), event-related potential and magnetoencephalographic studies of crossmodal matching, the crossmodal integration of content and spatial information, and crossmodal learning. These investigations are beginning to produce some consistent findings regarding the neuronal networks involved in these distinct crossmodal operations. Increasingly, specific roles are being defined for the superior temporal sulcus, the inferior parietal sulcus, regions of frontal cortex, the insular cortex and the claustrum. The precise network of brain areas implicated in any one study, however, appears to depend heavily on the experimental paradigm used, the nature of the information being combined and the particular combination of modalities under investigation. Differences in the analytic strategies adopted by different groups may also contribute substantially to the variability in findings. In this paper, we demonstrate the impact of computing intersections, conjunctions and interaction effects on the identification of audiovisual integration sites, using existing fMRI data from our own laboratory. This exercise highlights the potential value of statistical interaction effects, modelled on the electrophysiological responses of multisensory neurons to crossmodal stimuli, for identifying possible sites of multisensory integration in the human brain.
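As a loose illustration of what such an interaction-effect analysis might look like in practice, the sketch below tests, voxel by voxel, whether the audiovisual response exceeds the sum of the unimodal auditory and visual responses (a superadditivity criterion). This is not taken from the paper: the data are simulated, and the variable names, threshold and test choice are illustrative assumptions only.

```python
# Minimal, hypothetical sketch of a superadditive audiovisual interaction test.
# All data and parameters are invented for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_trials, n_voxels = 40, 1000

# Simulated per-trial response amplitudes (e.g., beta estimates) per condition.
beta_a = rng.normal(1.0, 0.5, (n_trials, n_voxels))   # auditory-only
beta_v = rng.normal(0.8, 0.5, (n_trials, n_voxels))   # visual-only
beta_av = rng.normal(2.1, 0.5, (n_trials, n_voxels))  # audiovisual

# Interaction contrast per trial and voxel: AV - (A + V).
interaction = beta_av - (beta_a + beta_v)

# One-sample t-test across trials; voxels with significantly positive values
# would be candidate superadditive (multisensory integration) sites.
t_vals, p_vals = stats.ttest_1samp(interaction, popmean=0.0, axis=0)
candidates = np.where((t_vals > 0) & (p_vals < 0.001))[0]
print(f"{candidates.size} voxels show a superadditive AV interaction")
```

By contrast, a simple intersection or conjunction analysis would flag any voxel responding to both unimodal conditions, which is why the two approaches can identify different candidate integration sites.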
Watching a speaker's lips during face-to-face conversation (lipreading) markedly improves speech perception, particularly in noisy conditions. With functional magnetic resonance imaging it was found that these linguistic visual cues are sufficient to activate auditory cortex in normal-hearing individuals in the absence of auditory speech sounds. Two further experiments suggest that these auditory cortical areas are not engaged when an individual is viewing nonlinguistic facial movements but appear to be activated by silent meaningless speechlike movements (pseudospeech). This supports psycholinguistic evidence that seen speech influences the perception of heard speech at a prelexical stage.
Speech is perceived both by ear and by eye. Unlike heard speech, some seen speech gestures can be captured in stilled image sequences. Previous studies have shown that in hearing people, natural time-varying silent seen speech can access the auditory cortex (left superior temporal regions). Using functional magnetic resonance imaging (fMRI), the present study explored the extent to which this circuitry was activated when seen speech was deprived of its time-varying characteristics. In the scanner, hearing participants were instructed to look for a prespecified visible speech target sequence ("voo" or "ahv") among other monosyllables. In one condition, the image sequence comprised a series of stilled key frames showing apical gestures (e.g., separate frames for "v" and "oo" [from the target] or "ee" and "m" [i.e., from nontarget syllables]). In the other condition, natural speech movement of the same overall segment duration was seen. In contrast to a baseline condition in which the letter "V" was superimposed on a resting face, stilled speech face images generated activation in posterior cortical regions associated with the perception of biological movement, despite the lack of apparent movement in the speech image sequence. Activation was also detected in traditional speech-processing regions including the left inferior frontal (Broca's) area, left superior temporal sulcus (STS), and left supramarginal gyrus (the dorsal aspect of Wernicke's area). Stilled speech sequences also generated activation in the ventral premotor cortex and anterior inferior parietal sulcus bilaterally. Moving faces generated significantly greater cortical activation than stilled face sequences, and in similar regions. However, a number of differences between stilled and moving speech were also observed. In the visual cortex, stilled faces generated relatively more activation in primary visual regions (V1/V2), while visual movement areas (V5/MT+) were activated to a greater extent by moving faces. Cortical regions activated more by naturally moving speaking faces included the auditory cortex (Brodmann's Areas 41/42; lateral parts of Heschl's gyrus) and the left STS and inferior frontal gyrus. Seen speech with normal time-varying characteristics appears to have preferential access to "purely" auditory processing regions specialized for language, possibly via acquired dynamic audiovisual integration mechanisms in STS. When seen speech lacks natural time-varying characteristics, access to speech-processing systems in the left temporal lobe may be achieved predominantly via action-based speech representations, realized in the ventral premotor cortex.
Integrating information across the senses can enhance our ability to detect and classify stimuli in the environment. For example, auditory speech perception is substantially improved when the speaker's face is visible. In an fMRI study designed to investigate the neural mechanisms underlying these crossmodal behavioural gains, bimodal (audio-visual) speech was contrasted against both unimodal (auditory and visual) components. Significant response enhancements in auditory (BA 41/42) and visual (V5) cortices were detected during bimodal stimulation. This effect was found to be specific to semantically congruent crossmodal inputs. These data suggest that the perceptual improvements effected by synthesising matched multisensory inputs are realised by reciprocal amplification of the signal intensity in participating unimodal cortices.