Synchronous presentation of stimuli to the auditory and visual systems can modify the formation of a percept in either modality. For example, perception of auditory speech is improved when the speaker's facial articulatory movements are visible. Neural convergence onto multisensory sites exhibiting supra-additivity has been proposed as the principal mechanism for integration. Recent findings, however, have suggested that putative sensory-specific cortices are responsive to inputs presented through a different modality. Consequently, when and where audiovisual representations emerge remain unsettled. In combined psychophysical and electroencephalography experiments we show that visual speech speeds up the cortical processing of auditory signals early (within 100 ms of signal onset). The auditory-visual interaction is reflected as an articulator-specific temporal facilitation (as well as a nonspecific amplitude reduction). The latency facilitation systematically depends on the degree to which the visual signal predicts possible auditory targets. The observed auditory-visual data support the view that there exist abstract internal representations that constrain the analysis of subsequent speech inputs. This is evidence for the existence of an "analysis-by-synthesis" mechanism in auditory-visual speech perception.

… "combinations" such as "pk" or "kp," but never a fused percept. These results illustrate the effect of input modality on the perceptual AV speech outcome and suggest that multisensory percept formation is systematically based on the informational content of the inputs. In classic speech theories, however, visual speech has seldom been accounted for as a natural source of speech input.
Ultimately, when in the processing stream (i.e., at which representational stage) sensory-specific information fuses to yield unified percepts is fundamental for any theoretical, computational, and neuroscientific account of speech perception. Recent investigations of AV speech are based on hemodynamic studies that cannot speak directly to timing issues (2, 3). Electroencephalographic (EEG) and magnetoencephalographic studies (4-7) testing AV speech integration have typically used oddball or mismatch negativity paradigms; thus, the earliest AV speech interactions have been reported for the 150- to 250-ms mismatch response. Whether systematic AV speech interactions can be documented earlier is controversial, although nonspeech effects can be observed early (8).

AV Speech as a Multisensory Problem

Several properties of speech are relevant to the present study. (i) Because AV speech is ecologically valid for humans (9, 10), one might predict an involvement of specialized neural computations capable of handling the spectrotemporal complexity of AV speech (compared with, say, arbitrary tone-flash pairings, for which no natural functional relevance can be assumed). (ii) Natural AV speech is characterized by particular dynamics, such as (a) the temporal precedence of visual speech (the movement of the facial articulators typically …
According to hierarchical predictive coding models, the cortex constantly generates predictions of incoming stimuli at multiple levels of processing. Responses to auditory mismatches and omissions are interpreted as reflecting the prediction error when these predictions are violated. An alternative interpretation, however, is that neurons passively adapt to repeated stimuli. We separated these alternative interpretations by designing a hierarchical auditory novelty paradigm and recording human EEG and magnetoencephalographic (MEG) responses to mismatching or omitted stimuli. In the crucial condition, participants listened to frequent series of four identical tones followed by a fifth different tone, which generates a mismatch response. Because this response itself is frequent and expected, the hierarchical predictive coding hypothesis suggests that it should be cancelled out by a higher-order prediction. Three consequences ensue. First, the mismatch response should be larger when it is unexpected than when it is expected. Second, a perfectly monotonic sequence of five identical tones should now elicit a higher-order novelty response. Third, omitting the fifth tone should reveal the brain's hierarchical predictions. The rationale here is that, when a deviant tone is expected, its omission represents a violation of two expectations: a local prediction of a tone plus a hierarchically higher expectation of its deviancy. Thus, such an omission should induce a greater prediction error than when a standard tone is expected. Simultaneous EEG and magnetoencephalographic recordings verify those predictions and thus strongly support the predictive coding hypothesis. Higher-order predictions appear to be generated in multiple areas of frontal and associative cortices.

mismatch negativity | P300 component
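The two-level logic of the paradigm above can be made concrete with a short sketch. This is a minimal illustration, not the study's actual stimulus code: the 80/20 block proportions and the five-tone string encoding (`x` = standard tone, `Y` = deviant tone) are assumptions for the example.

```python
# Hypothetical sketch of a "local-global" block in which the locally deviant
# pattern xxxxY is the frequent (globally expected) trial, so the monotonic
# pattern xxxxx becomes the higher-order (global) deviant.
import random

def make_trial(frequent="xxxxY", rare="xxxxx", p_frequent=0.8):
    """Return one five-tone trial; the locally deviant pattern is frequent."""
    return frequent if random.random() < p_frequent else rare

def label(trial):
    """Classify a trial at the two hierarchical levels of the paradigm."""
    local_deviant = trial[-1] != trial[0]   # fifth tone differs -> local mismatch
    global_deviant = trial == "xxxxx"       # monotonic run violates the block's rule
    return local_deviant, global_deviant

random.seed(0)
for trial in (make_trial() for _ in range(8)):
    loc, glo = label(trial)
    print(trial, "local deviant:", loc, "global deviant:", glo)
```

The point the sketch makes explicit is that the two labels dissociate: `xxxxY` is a local deviant but globally expected, whereas `xxxxx` is locally uniform yet is the trial that should elicit the higher-order novelty response.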
Magnetoencephalographic (MEG) recordings are a rich source of information about the neural dynamics underlying cognitive processes in the brain, with excellent temporal and good spatial resolution. In recent years there have been considerable advances in MEG hardware developments and methods. Sophisticated analysis techniques are now routinely applied and continuously improved, leading to fascinating insights into the intricate dynamics of neural processes. However, the rapidly increasing complexity of the different steps in an MEG study makes it difficult for novices, and sometimes even for experts, to stay aware of possible limitations and caveats. Furthermore, the complexity of MEG data acquisition and data analysis requires special attention when describing MEG studies in publications, in order to facilitate interpretation and reproduction of the results. This manuscript aims to make recommendations for a number of important data acquisition and data analysis steps and suggests details that should be specified in manuscripts reporting MEG studies. These recommendations will hopefully serve as guidelines that help to strengthen the position of the MEG research community within the field of neuroscience, and may foster discussion in order to further enhance the quality and impact of MEG research.
Observing a speaker's mouth profoundly influences speech perception. For example, listeners perceive an "illusory" "ta" when the video of a face producing /ka/ is dubbed onto an audio /pa/. Here, we show how cortical areas supporting speech production mediate this illusory percept and audiovisual (AV) speech perception more generally. Specifically, cortical activity during AV speech perception occurs in many of the same areas that are active during speech production. We find that different perceptions of the same syllable and the perception of different syllables are associated with different distributions of activity in frontal motor areas involved in speech production. Activity patterns in these frontal motor areas resulting from the illusory "ta" percept are more similar to the activity patterns evoked by AV(/ta/) than they are to patterns evoked by AV(/pa/) or AV(/ka/). In contrast to the activity in frontal motor areas, stimulus-evoked activity for the illusory "ta" in auditory and somatosensory areas and visual areas initially resembles activity evoked by AV(/pa/) and AV(/ka/), respectively. Ultimately, though, activity in these regions comes to resemble activity evoked by AV(/ta/). Together, these results suggest that AV speech elicits in the listener a motor plan for the production of the phoneme that the speaker might have been attempting to produce, and that feedback in the form of efference copy from the motor system ultimately influences the phonetic interpretation.
Speech perception consists of a set of computations that take continuously varying acoustic waveforms as input and generate discrete representations that make contact with the lexical representations stored in long-term memory as output. Because the perceptual objects that are recognized by speech perception enter into subsequent linguistic computation, the format that is used for lexical representation and processing fundamentally constrains the speech perceptual processes. Consequently, theories of speech perception must, at some level, be tightly linked to theories of lexical representation. Minimally, speech perception must yield representations that smoothly and rapidly interface with stored lexical items. Adopting the perspective of Marr, we argue and provide neurobiological and psychophysical evidence for the following research programme. First, at the implementational level, speech perception is a multi-time resolution process, with perceptual analyses occurring concurrently on at least two time scales (approx. 20-80 ms, approx. 150-300 ms), commensurate with (sub)segmental and syllabic analyses, respectively. Second, at the algorithmic level, we suggest that perception proceeds on the basis of internal forward models, or uses an 'analysis-by-synthesis' approach. Third, at the computational level (in the sense of Marr), the theory of lexical representation that we adopt is principally informed by phonological research and assumes that words are represented in the mental lexicon in terms of sequences of discrete segments composed of distinctive features. One important goal of the research programme is to develop linking hypotheses between putative neurobiological primitives (e.g. temporal primitives) and those primitives derived from linguistic inquiry, to arrive ultimately at a biologically sensible and theoretically satisfying model of representation and computation in speech.
Background: The ability to estimate the passage of time is of fundamental importance for perceptual and cognitive processes. One experience of time is the perception of duration, which is not isomorphic to physical duration and can be distorted by a number of factors. Yet, the critical features generating these perceptual shifts in subjective duration are not understood.

Methodology/Findings: We used prospective duration judgments within and across sensory modalities to examine the effect of stimulus predictability and feature change on the perception of duration. First, we found robust distortions of perceived duration in auditory, visual and auditory-visual presentations despite the predictability of the feature changes in the stimuli. For example, a looming disc embedded in a series of steady discs led to time dilation, whereas a steady disc embedded in a series of looming discs led to time compression. Second, we addressed whether visual (auditory) inputs could alter the perception of duration of auditory (visual) inputs. When participants were presented with incongruent audio-visual stimuli, the perceived duration of auditory events could be shortened or lengthened by the presence of conflicting visual information; however, the perceived duration of visual events was seldom distorted by the presence of auditory information, and visual events were never perceived as shorter than their actual durations.

Conclusions/Significance: These results support the existence of multisensory interactions in the perception of duration and, importantly, suggest that vision can modify auditory temporal perception in a pure timing task. Insofar as distortions in subjective duration cannot be accounted for by the unpredictability of an auditory, visual or auditory-visual event, we propose that it is the intrinsic features of the stimulus that critically affect subjective time distortions.
When presented with an auditory sequence, the brain acts as a predictive-coding device that extracts regularities in the transition probabilities between sounds and detects unexpected deviations from these regularities. Does such prediction require conscious vigilance, or does it continue to unfold automatically in the sleeping brain? The mismatch negativity and P300 components of the auditory event-related potential, reflecting two steps of auditory novelty detection, have been inconsistently observed in the various sleep stages. To clarify whether these steps remain during sleep, we recorded simultaneous electroencephalographic and magnetoencephalographic signals during wakefulness and during sleep in normal subjects listening to a hierarchical auditory paradigm including short-term (local) and long-term (global) regularities. The global response, reflected in the P300, vanished during sleep, in line with the hypothesis that it is a correlate of high-level conscious error detection. The local mismatch response remained across all sleep stages (N1, N2, and REM sleep), but with an incomplete structure; compared with wakefulness, a specific peak reflecting prediction error vanished during sleep. These results indicate that sleep leaves initial auditory processing and passive sensory response adaptation intact, but specifically disrupts both short-term and long-term auditory predictive coding.

mismatch response | prediction | magnetoencephalography | MMN | P300