Predictive coding: the idea that the brain generates hypotheses about the possible causes of forthcoming sensory events and that these hypotheses are compared with incoming sensory information. The difference between top-down expectations and incoming sensory inputs, that is, the prediction error, is propagated forward throughout the cortical hierarchy.

Predictive timing (temporal expectations): an extension of the notion of predictive coding to the exploitation of temporal regularities (such as a beat) or associative contingencies (for instance, the temporal relation between two inputs) to infer the occurrence of future sensory events.

Top-down processing: efferent neural operations that convey the internal goals or states of the observer. This notion generally includes different cognitive processes, such as attention and expectations (Box 1).

Neural oscillations: neurophysiological electromagnetic signals [from local field potential (LFP), electroencephalographic (EEG), and magnetoencephalographic (MEG) recordings] that reflect coherent neuronal population behavior at different spatial scales. These signals are labeled as a function of their frequency in human surface EEG: delta (2-4 Hz), theta (4-8 Hz), alpha (8-12 Hz), and gamma (30-100 Hz) bands. The mechanistic properties of oscillations are computationally interesting as a means of explaining various aspects of perception and cognition, for example, long-distance communication across brain regions, unification of various attributes of the same object, segmentation of the sensory input, and memory.
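To make the band definitions above concrete, here is a minimal Python sketch that splits a neural time series into the delta, theta, alpha, and gamma bands as defined in the glossary. The 1000 Hz sampling rate and the white-noise input are illustrative assumptions, not taken from the text.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 1000.0  # sampling rate (Hz), assumed for illustration

# Band edges (Hz) as defined in the glossary above.
BANDS = {
    "delta": (2.0, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 12.0),
    "gamma": (30.0, 100.0),
}

def bandpass(signal, low, high, fs=FS, order=4):
    """Zero-phase Butterworth band-pass filter (second-order sections)."""
    sos = butter(order, [low, high], btype="band", fs=fs, output="sos")
    return sosfiltfilt(sos, signal)

# Example: 10 s of synthetic "EEG" (white noise stands in for real data).
eeg = np.random.randn(int(10 * FS))
for name, (lo, hi) in BANDS.items():
    print(f"{name}: rms amplitude = {bandpass(eeg, lo, hi).std():.3f}")
```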
A growing body of research suggests that intrinsic slow (<10 Hz) neuronal oscillations in auditory cortex track incoming speech and other spectro-temporally complex auditory signals. Within this framework, several recent studies have identified critical-band temporal envelopes as the specific acoustic feature reflected by the phase of these oscillations. However, how this alignment between speech acoustics and neural oscillations might underpin intelligibility is unclear. Here we test the hypothesis that the ‘sharpness’ of temporal fluctuations in the critical-band envelope acts as a temporal cue to the syllabic rate of speech, driving delta-theta rhythms to track the stimulus and facilitate intelligibility. Using magnetoencephalographic recordings, we show that removing temporal fluctuations that occur at the syllabic rate reduces envelope-tracking activity, and that artificially reinstating these fluctuations restores it. These changes in tracking correlate with the intelligibility of the stimulus. We interpret these findings as evidence that sharp events in the stimulus cause cortical rhythms to re-align and parse the stimulus into syllable-sized chunks for further decoding. Together, the results suggest that the sharpness of fluctuations in the stimulus, as reflected in the cochlear output, drives oscillatory activity to track and entrain to the stimulus at its syllabic rate. This process likely facilitates parsing of the stimulus into meaningful chunks appropriate for subsequent decoding, enhancing perception and intelligibility.
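As a rough illustration of the acoustic analysis this hypothesis implies, the following sketch extracts an amplitude envelope and a "sharpness" proxy (its half-wave-rectified derivative). This is a simplification under stated assumptions: a single broadband Hilbert envelope stands in for a full critical-band (cochlear) filterbank, and the 10 Hz low-pass cutoff loosely delimits the syllabic range.

```python
import numpy as np
from scipy.signal import hilbert, butter, sosfiltfilt

def envelope_and_sharpness(audio, fs, lp_cutoff=10.0):
    """Return the low-pass amplitude envelope (< lp_cutoff Hz, roughly the
    syllabic range) and its half-wave-rectified derivative ("sharpness")."""
    env = np.abs(hilbert(audio))                      # instantaneous amplitude
    sos = butter(4, lp_cutoff, btype="low", fs=fs, output="sos")
    env_slow = sosfiltfilt(sos, env)                  # keep syllabic-rate fluctuations
    sharpness = np.clip(np.gradient(env_slow, 1.0 / fs), 0.0, None)
    return env_slow, sharpness

# Toy usage: 1 s of noise amplitude-modulated at a 4 Hz "syllable" rate.
fs = 16000
t = np.arange(fs) / fs
audio = np.random.randn(fs) * (0.5 + 0.5 * np.sin(2 * np.pi * 4 * t))
env, sharp = envelope_and_sharpness(audio, fs)
print(f"sharpest onset at {sharp.argmax() / fs:.2f} s")
```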
According to predictive coding theory, top-down predictions are conveyed by backward connections and prediction errors are propagated forward across the cortical hierarchy. Using MEG in humans, we show that violating multisensory predictions causes a fundamental and qualitative change in both the frequency and spatial distribution of cortical activity. When visual speech input correctly predicted auditory speech signals, a slow delta regime (3-4 Hz) developed in higher-order speech areas. In contrast, when auditory signals invalidated predictions inferred from vision, a low-beta (14-15 Hz) / high-gamma (60-80 Hz) coupling regime appeared locally in a multisensory area (area STS). This frequency shift in oscillatory responses scaled with the degree of audio-visual congruence and was accompanied by increased gamma activity in lower sensory regions. These findings are consistent with the notion that bottom-up prediction errors are communicated in predominantly high (gamma) frequency ranges, whereas top-down predictions are mediated by slower (beta) frequencies. Delta activity emerged in higher-order areas where auditory and visual information converge, that is, where multisensory predictions are generated, whereas gamma activity was seen in lower sensory cortices, where prediction errors emerge and are propagated forward.

RESULTS
We presented 15 subjects with stimuli in one of three conditions: videos (audio-visual: AV condition) of a speaker pronouncing the syllables /pa/, / a/, /la/, /ta/, /ga/ and /fa/ (International Phonetic Alphabet notation); an auditory track of these videos combined with a still face (auditory: A condition); or a mute visual track (visual: V condition). The videos could be either natural or a random combination of auditory and visual tracks, creating conditions in which auditory and visual tracks were congruent (AVc condition) and ones in which they were incongruent (AVi condition; see Online Methods and Supplementary Fig. 1). Incongruent combinations yielding illusory fusion percepts, that is, McGurk stimuli, were excluded (refs. 8, 11). Subjects performed an unrelated target detection task on the syllable /fa/, which was presented in A, V or AVc form in 13% of the trials (97% correct detection); these trials were subsequently excluded from the analyses. The five other syllables were chosen because they yielded graded recognition accuracy when presented visually (Fig. 1a), resulting from increasing predictiveness (ref. 10). The phonological prediction conveyed by mouth movements (visemes) varies in specificity depending on the pronounced syllable. Typically, syllables beginning with a consonant that is formed at the front of the mouth (/p/, /m/) convey a more specific prediction than those formed at the back (/g/, /k/, /r/, /l/) (ref. 8). Our second experimental factor pertained to the validity of the visual prediction with respect to the auditory input. Physically, the audio-visual stimuli could be either congruent (valid prediction) or incongruent (invalid prediction), w...
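A minimal sketch of the kind of band-power contrast reported here (delta versus low-beta/high-gamma regimes across congruent and incongruent trials). This is not the authors' pipeline: the Welch estimator, the 600 Hz sampling rate, and the synthetic trial data are assumptions for illustration only.

```python
import numpy as np
from scipy.signal import welch

FS = 600.0  # MEG sampling rate (Hz), assumed

def band_power(trials, fs, band):
    """Mean Welch power in `band` (Hz), averaged over trials (rows)."""
    freqs, psd = welch(trials, fs=fs, nperseg=int(fs))
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return psd[:, mask].mean()

rng = np.random.default_rng(0)
congruent = rng.standard_normal((100, int(2 * FS)))    # 100 trials x 2 s, placeholder
incongruent = rng.standard_normal((100, int(2 * FS)))  # placeholder

# Band edges follow the abstract above.
for name, band in {"delta": (3, 4), "low-beta": (14, 15), "high-gamma": (60, 80)}.items():
    print(f"{name}: congruent = {band_power(congruent, FS, band):.4f}, "
          f"incongruent = {band_power(incongruent, FS, band):.4f}")
```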
Viewing our interlocutor facilitates speech perception, unlike, for instance, when we speak on the telephone. Several neural routes and mechanisms could account for this phenomenon. Using magnetoencephalography, we show that when the interlocutor is visible, the latencies of auditory responses (M100) shorten in proportion to how predictable speech is from the visual input, whether or not the auditory signal is congruent with it. Incongruence of auditory and visual input affected auditory responses approximately 20 ms after this latency shortening was detected, indicating that initial content-dependent auditory facilitation by vision is followed by a feedback signal that reflects the error between expected and received auditory input (prediction error). We then used functional magnetic resonance imaging and confirmed that distinct routes of visual information to auditory processing underlie these two functional mechanisms. Functional connectivity between visual motion and auditory areas depended on the degree of visual predictability, whereas connectivity between the superior temporal sulcus and both auditory and visual motion areas was driven by audiovisual (AV) incongruence. These results establish two distinct mechanisms by which the brain uses potentially predictive visual information to improve auditory perception: a fast, direct corticocortical pathway conveys visual motion parameters to auditory cortex, and a slower, indirect feedback pathway signals the error between the visual prediction and the auditory input.
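To illustrate the measurement at the core of this result, here is a sketch of estimating M100 peak latency from a trial-averaged evoked response, so that latencies can be compared across visual-predictability conditions. The 70-150 ms search window and the Gaussian toy response are assumptions, not the authors' settings.

```python
import numpy as np

FS = 1000.0  # sampling rate (Hz), assumed

def m100_latency(evoked, fs=FS, window=(0.070, 0.150)):
    """Latency (s) of the absolute peak within the M100 search window."""
    i0, i1 = int(window[0] * fs), int(window[1] * fs)
    return (i0 + np.argmax(np.abs(evoked[i0:i1]))) / fs

# Toy evoked response: a Gaussian deflection peaking near 100 ms post-stimulus.
t = np.arange(0, 0.3, 1 / FS)
evoked = np.exp(-((t - 0.100) ** 2) / (2 * 0.010 ** 2))
print(f"M100 latency: {m100_latency(evoked) * 1000:.0f} ms")
```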
Screaming is arguably one of the most relevant communication signals for survival in humans. Despite their practical relevance and their theoretical significance as innate [1] and virtually universal [2, 3] vocalizations, what makes screams a unique signal and how they are processed is not known. Here, we use acoustic analyses, psychophysical experiments, and neuroimaging to isolate the features that confer on screams their alarming nature, and we track their processing in the human brain. Using the modulation power spectrum (MPS [4, 5]), a recently developed, neurally informed characterization of sounds, we demonstrate that human screams cluster within a restricted portion of the acoustic space (modulation rates of roughly 30-150 Hz) that corresponds to a well-known perceptual attribute, roughness. In contrast to the received view that roughness is irrelevant for communication [6], our data reveal that the acoustic space occupied by the rough vocal regime is segregated from other signals, including speech, a prerequisite for avoiding false alarms in normal vocal communication. We show that roughness is present in natural alarm signals as well as in artificial alarms, and that the presence of roughness in sounds boosts their detection in various tasks. Using fMRI, we show that acoustic roughness engages subcortical structures critical for the rapid appraisal of danger. Altogether, these data demonstrate that screams occupy a privileged acoustic niche that, being separated from other communication signals, ensures their biological and ultimately social efficiency.
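A simplified sketch of an MPS-style analysis: the 2-D Fourier transform of a log spectrogram, summarized as the fraction of modulation energy falling in the roughness range. The 30-150 Hz band follows the abstract; all other parameters (spectrogram window, toy stimuli) are illustrative assumptions rather than the authors' settings.

```python
import numpy as np
from scipy.signal import spectrogram

def roughness_fraction(audio, fs, band=(30.0, 150.0)):
    """Fraction of modulation energy at temporal rates inside `band` (Hz)."""
    f, t, sxx = spectrogram(audio, fs=fs, nperseg=128, noverlap=96)
    # Modulation power spectrum: squared 2-D FFT of the log spectrogram.
    mps = np.abs(np.fft.fftshift(np.fft.fft2(np.log(sxx + 1e-12)))) ** 2
    # Temporal modulation rates correspond to the spectrogram's time axis.
    rates = np.fft.fftshift(np.fft.fftfreq(len(t), d=t[1] - t[0]))
    in_band = (np.abs(rates) >= band[0]) & (np.abs(rates) <= band[1])
    return mps[:, in_band].sum() / mps.sum()

# Toy usage: a 70 Hz amplitude-modulated tone ("rough") vs a steady tone.
fs = 16000
t = np.arange(fs) / fs
rough = np.sin(2 * np.pi * 440 * t) * (0.5 + 0.5 * np.sin(2 * np.pi * 70 * t))
steady = np.sin(2 * np.pi * 440 * t)
print(roughness_fraction(rough, fs), roughness_fraction(steady, fs))
```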
The ability to generate temporal predictions is fundamental for adaptive behavior. Precise timing on the scale of seconds is critical, for instance, to predict trajectories or to select relevant information. What mechanisms form the basis for such accurate timing? Recent evidence suggests that (1) temporal predictions adjust sensory selection by controlling neural oscillations in time and (2) the motor system plays an active role in inferring "when" events will happen. We hypothesized that oscillations in the delta and beta bands are instrumental in predicting the occurrence of auditory targets. Participants listened to brief rhythmic tone sequences and detected target delays while undergoing magnetoencephalographic recording. Prior to target occurrence, we found that coupled delta (1-3 Hz) and beta (18-22 Hz) oscillations temporally align with upcoming targets and bias decisions towards correct responses, suggesting that delta-beta coupled oscillations underpin prediction accuracy. Subsequent to target occurrence, subjects update their decisions using the magnitude of the alpha-band (10-14 Hz) response as internal evidence of target timing. These data support a model in which the orchestration of oscillatory dynamics between sensory and motor systems is exploited to accurately select sensory information in time.
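One standard way to quantify the delta-beta coupling described here is a phase-amplitude coupling index such as the mean vector length (Canolty et al., 2006). The sketch below assumes that metric; the band edges follow the abstract, while the sampling rate and synthetic data are illustrative, and this is not the authors' exact analysis.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

FS = 1000.0  # sampling rate (Hz), assumed

def bandpass(x, low, high, fs=FS, order=3):
    sos = butter(order, [low, high], btype="band", fs=fs, output="sos")
    return sosfiltfilt(sos, x)

def delta_beta_mvl(x, fs=FS):
    """Mean vector length of beta amplitude relative to delta phase (0..1)."""
    delta_phase = np.angle(hilbert(bandpass(x, 1.0, 3.0, fs)))   # delta (1-3 Hz)
    beta_amp = np.abs(hilbert(bandpass(x, 18.0, 22.0, fs)))      # beta (18-22 Hz)
    return np.abs(np.mean(beta_amp * np.exp(1j * delta_phase))) / beta_amp.mean()

# Toy usage: beta bursts locked to the delta peak yield high coupling.
t = np.arange(0, 10, 1 / FS)
delta = np.sin(2 * np.pi * 2 * t)
x = delta + (1 + delta) * 0.3 * np.sin(2 * np.pi * 20 * t) + 0.1 * np.random.randn(len(t))
print(f"delta-beta MVL: {delta_beta_mvl(x):.3f}")
```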
Predicting not only what will happen, but also when it will happen is extremely helpful for optimizing perception and action. Temporal predictions driven by periodic stimulation increase perceptual sensitivity and reduce response latencies. At the neurophysiological level, a single mechanism has been proposed to mediate this twofold behavioral improvement: the rhythmic entrainment of slow cortical oscillations to the stimulation rate. However, temporal regularities can occur in aperiodic contexts, suggesting that temporal predictions per se may be dissociable from entrainment to periodic sensory streams. We investigated this possibility in two behavioral experiments, asking human participants to detect near-threshold auditory tones embedded in streams whose temporal and spectral properties were manipulated. While our findings confirm that periodic stimulation reduces response latencies, in agreement with the hypothesis of a stimulus-driven entrainment of neural excitability, they further reveal that this motor facilitation can be dissociated from the enhancement of auditory sensitivity. Perceptual sensitivity improvement is unaffected by the nature of temporal regularities (periodic vs aperiodic), but contingent on the co-occurrence of a fulfilled spectral prediction. Altogether, the dissociation between predictability and periodicity demonstrates that distinct mechanisms flexibly and synergistically operate to facilitate perception and action.
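To make the behavioral dissociation concrete, here is a sketch separating perceptual sensitivity (d', from signal detection theory) from response speed. The condition labels mirror the periodic/aperiodic manipulation, but all counts and reaction times are made-up placeholders, not data from the study.

```python
from scipy.stats import norm

def d_prime(hits, misses, fas, crs):
    """Signal-detection sensitivity with a log-linear correction."""
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (fas + 0.5) / (fas + crs + 1.0)
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)

conditions = {
    # (hits, misses, false alarms, correct rejections, median RT in s) - placeholders
    "periodic":  (78, 22, 10, 90, 0.41),
    "aperiodic": (76, 24, 11, 89, 0.48),
}
for name, (h, m, fa, cr, rt) in conditions.items():
    print(f"{name}: d' = {d_prime(h, m, fa, cr):.2f}, median RT = {rt:.2f} s")
```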