This fMRI study explores brain regions involved with perceptual enhancement afforded by observation of visual speech gesture information. Subjects passively identified words presented in the following conditions: audio-only, audiovisual, audio-only with noise, audiovisual with noise, and visual only. The brain may use concordant audio and visual information to enhance perception by integrating the information in a converging multisensory site. Consistent with response properties of multisensory integration sites, enhanced activity in middle and superior temporal gyrus/sulcus was greatest when concordant audiovisual stimuli were presented with acoustic noise. Activity found in brain regions involved with planning and execution of speech production in response to visual speech presented with degraded or absent auditory stimulation, is consistent with the use of an additional pathway through which speech perception is facilitated by a process of internally simulating the intended speech act of the observed speaker.
Speech production research has demonstrated that the first language (L1) often interferes with production in bilinguals’ second language (L2), but it has been suggested that bilinguals who are L2-dominant are the most likely to suppress this L1-interference. While prolonged contextual changes in bilinguals’ language use (e.g., stays overseas) are known to result in L1 and L2 phonetic shifts, code-switching provides the unique opportunity of observing the immediate phonetic effects of L1-L2 interaction. We measured the voice onset times (VOTs) of Greek–English bilinguals’ productions of /b, d, p, t/ in initial and medial contexts, first in either a Greek or English unilingual mode, and in a later session when they produced the same target pseudowords as a code-switch from the opposing language. Compared to a unilingual mode, all English stops produced as code-switches from Greek, regardless of context, had more Greek-like VOTs. In contrast, Greek stops showed no shift toward English VOTs, with the exception of medial voiced stops. Under the specifically interlanguage condition of code-switching we have demonstrated a pervasive influence of the L1 even in L2-dominant individuals.
The way that bilinguals produce phones in each of their languages provides a window into the nature of the bilingual phonological space. For stop consonants, if early sequential bilinguals, whose languages differ in voice onset time (VOT) distinctions, produce native-like VOTs in each of their languages, it would imply that they have developed separate first and second language phones, that is, language-specific phonetic realisations for stop-voicing distinctions. Given the ambiguous phonological status of Greek voiced stops, which has been debated but not investigated experimentally, Greek-English bilinguals can offer a unique perspective on this issue. We first recorded the speech of Greek and Australian-English monolinguals to observe native VOTs in each language for /p, t, b, d/ in word-initial and word-medial (post-vocalic and post-nasal) positions. We then recorded fluent, early Greek–Australian-English bilinguals in either a Greek or English language context; all communication occurred in only one language. The bilinguals in the Greek context were indistinguishable from the Greek monolinguals, whereas the bilinguals in the English context matched the VOTs of the Australian-English monolinguals in initial position, but showed some modest differences from them in the phonetically more complex medial positions. We interpret these results as evidence that bilingual speakers possess phonetic categories for voiced versus voiceless stops that are specific to each language, but are influenced by positional context differently in their second than in their first language.
Perception of speech is improved when presentation of the audio signal is accompanied by concordant visual speech gesture information. This enhancement is most prevalent when the audio signal is degraded. One potential means by which the brain affords perceptual enhancement is thought to be through the integration of concordant information from multiple sensory channels in a common site of convergence, multisensory integration (MSI) sites. Some studies have identified potential sites in the superior temporal gyrus/sulcus (STG/S) that are responsive to multisensory information from the auditory speech signal and visual speech movement. One limitation of these studies is that they do not control for activity resulting from attentional modulation cued by such things as visual information signaling the onsets and offsets of the acoustic speech signal, as well as activity resulting from MSI of properties of the auditory speech signal with aspects of gross visual motion that are not specific to place of articulation information. This fMRI experiment uses spatial wavelet bandpass filtered Japanese sentences presented with background multispeaker audio noise to discern brain activity reflecting MSI induced by auditory and visual correspondence of place of articulation information that controls for activity resulting from the above-mentioned factors. The experiment consists of a low-frequency (LF) filtered condition containing gross visual motion of the lips, jaw, and head without specific place of articulation information, a midfrequency (MF) filtered condition containing place of articulation information, and an unfiltered (UF) condition. Sites of MSI selectively induced by auditory and visual correspondence of place of articulation information were determined by the presence of activity for both the MF and UF conditions relative to the LF condition. Based on these criteria, sites of MSI were found predominantly in the left middle temporal gyrus (MTG), and the left STG/S (including the auditory cortex). By controlling for additional factors that could also induce greater activity resulting from visual motion information, this study identifies potential MSI sites that we believe are involved with improved speech perception intelligibility.
Spatial frequency band-pass and low-pass filtered images of a talker were used in an audiovisual speech-in-noise task. Three experiments tested subjects' use of information contained in the different filter bands with center frequencies ranging from 2.7 to 44.1 cycles/face (c/face). Experiment 1 demonstrated that information from a broad range of spatial frequencies enhanced auditory intelligibility. The frequency bands differed in the degree of enhancement, with a peak being observed in a mid-range band (11-c/face center frequency). Experiment 2 showed that this pattern was not influenced by viewing distance and, thus, that the results are best interpreted in object spatial frequency, rather than in retinal coordinates. Experiment 3 showed that low-pass filtered images could produce a performance equivalent to that produced by unfiltered images. These experiments are consistent with the hypothesis that high spatial resolution information is not necessary for audiovisual speech perception and that a limited range of spatial frequency spectrum is sufficient.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.