Linking Speech Perception and Neurophysiology: Speech Decoding Guided by Cascaded Oscillators Locked to the Input Rhythm

Ghitza, Oded

doi:10.3389/fpsyg.2011.00130

Cited by 325 publications

(428 citation statements)

References 52 publications

(97 reference statements)

Supporting

Mentioning

392

Contrasting

Unclassified

“…This reflects the hierarchy of time scales present in the acoustic speech signal; which contains both fast events 20 msec such as the onset and offset of vocalic voicing and the broadband burst after the release of the oral cavity occlusion, and slower ~100 msec modulations in the envelope of the speech sound and smooth formant transitions. The online continuous processing presented here opens up the possibility for exploring different temporal scales either nested (Ghitza, 2011) or in parallel.…”

Section: Discussion (mentioning)

confidence: 99%

Prediction across sensory modalities: A neurocomputational model of the McGurk effect

Olasagasti

Bouton

Giraud

2015

Cortex

Keywords: Audiovisual integration, Computational modeling, McGurk effect

Abstract: The McGurk effect is a textbook illustration of the automaticity with which the human brain integrates audio-visual speech. It shows that even incongruent audiovisual (AV) speech stimuli can be combined into percepts that correspond neither to the auditory nor to the visual input, but to a mix of both. Typically, when presented with, e.g., visual /aga/ and acoustic /aba/ we perceive an illusory /ada/. In the inverse situation, however, when acoustic /aga/ is paired with visual /aba/, we perceive a combination of both stimuli, i.e., /abga/ or /agba/. Here we assessed the role of dynamic cross-modal predictions in the outcome of AV speech integration using a computational model that processes continuous audiovisual speech sensory inputs in a predictive coding framework. The model involves three processing levels: sensory units, units that encode the dynamics of stimuli, and multimodal recognition/identity units. The model exhibits a dynamic prediction behavior because evidence about speech tokens can be asynchronous across sensory modality, allowing for updating the activity of the recognition units from one modality while sending top-down predictions to the other modality. We explored the model's response to congruent and incongruent AV stimuli and found that, in the two-dimensional feature space spanned by the speech second formant and lip aperture, fusion stimuli are located in the neighborhood of congruent /ada/, which therefore provides a valid match. Conversely, stimuli that lead to combination percepts do not have a unique valid neighbor. In that case, acoustic and visual cues are both highly salient and generate conflicting predictions in the other modality that cannot be fused, forcing the elaboration of a combinatorial solution. We propose that dynamic predictive mechanisms play a decisive role in the dichotomous perception of incongruent audiovisual inputs.

“…One possibility by which entrainment can arise is that slow envelope modulations prominent in many natural sounds directly imprint on periodic excitability changes in cortical networks and effectively provide an intrinsic copy of the slow stimulus dynamics (Howard and Poeppel, 2010; Ding and Simon, 2012; Zion Golumbic et al., 2012). However, it could also well be that entrainment is induced by finer-grained (e.g., spectral) features of acoustic stimuli or higher-order properties of the temporal modulation spectrum, even in the absence of clearly visible envelope modulations (Ghitza, 2011). The causal mechanisms behind the entrainment of cortical oscillations clearly require additional investigation in future studies.…”

Section: Entrainment Of Oscillations To Dynamic Environments (mentioning)

confidence: 96%

“…studies found reduced auditory entrainment in alpha compared with theta oscillations (Luo and Poeppel, 2007; Schroeder et al., 2008; Ding and Simon, 2012; Ng et al., 2012). Furthermore, although alpha signals may shape external attentional control on auditory cortex (Kerlin et al., 2010), the alpha rhythm of the auditory cortex itself does not seem crucial for the temporal hierarchy of oscillations implied in auditory scene analysis, which likely reflects the prominent timescales of natural sounds and speech (Ghitza, 2011). Future work is required to fully elucidate whether the differential importance of theta and alpha signals reflects intrinsic properties of either sensory systems or whether additional attributes of the oscillatory state (e.g., entrained vs spontaneous) shape the impact of theta and alpha phase for stimulus detection.…”

Section: Role Of Oscillatory State For Perception (mentioning)

confidence: 99%

A Precluding But Not Ensuring Role of Entrained Low-Frequency Oscillations for Auditory Perception

2012

Oscillatory activity in sensory cortices reflects changes in local excitation-inhibition balance, and recent work suggests that phase signatures of ongoing oscillations predict the perceptual detection of subsequent stimuli. Low-frequency oscillations are also entrained by dynamic natural scenes, suggesting that the chance of detecting a brief target depends on the relative timing of this to the entrained rhythm. We tested this hypothesis in humans by implementing a cocktail-party-like scenario requiring subjects to detect a target embedded in a cacophony of background sounds. Using EEG to measure auditory cortical oscillations, we find that the chance of target detection systematically depends on both power and phase of theta-band (2-6 Hz) but not alpha-band (8-12 Hz) oscillations before target. Detection rates were higher and responses faster when oscillatory power was low and both detection rate and response speed were modulated by phase. Intriguingly, the phase dependency was stronger for miss than for hit trials, suggesting that phase has an inhibiting but not ensuring role for detection. Entrainment of theta range oscillations prominently occurs during the processing of attended complex stimuli, such as vocalizations and speech. Our results demonstrate that this entrainment to attended sensory environments may have negative effects on the detection of individual tokens within the environment, and they support the notion that specific phase ranges of cortical oscillations act as gatekeepers for perception.

“…These observations have been related to the neural encoding of speech by a family of "multi-time resolution models" of speech processing developed in the field of auditory neuroscience (e.g., Poeppel 2003; Hickok and Poeppel 2007; Ghitza and Greenberg 2009; Ghitza 2011). Multi-time resolution models of speech processing suggest that different rates of amplitude modulation in the envelope are encoded by neuronal oscillations at corresponding temporal rates.…”

Section: The Brain: Oscillatory Neuronal Entrainment and Speech Encoding (mentioning)

confidence: 99%

Speech rhythm and temporal structure: Converging perspectives?

Goswami

Leong

2013

Laboratory Phonology

110
