When we see our interlocutor, our brain seamlessly extracts visual cues from their face and processes them together with the sound of their voice, making speech an intrinsically multimodal signal. Visual cues are especially important in noisy environments, where the auditory signal is less reliable. Neuronal oscillations might be involved in the cortical processing of audiovisual speech by selecting which sensory channel contributes more to perception. To test this, we designed computer-generated naturalistic audiovisual speech stimuli in which a single mismatched phoneme-viseme pair in a key word of each sentence created bistable perception. Neurophysiological recordings (high-density scalp and intracranial electroencephalography) revealed that the precise phase angle of theta-band oscillations in posterior temporal and occipital cortex of the right hemisphere was critical in determining whether the auditory or the visual speech cue drove perception. We demonstrate that the phase of cortical oscillations acts as an instrument for sensory selection in audiovisual speech processing.
Presenting stimuli again after the presentation of intervening stimuli improves their retention, an effect known as the spacing effect. However, using event-related potentials (ERPs), we had previously observed that immediate, in comparison to spaced, repetition of pictures induced a positive frontal potential at 200-300 ms. This potential appeared to emanate from the left medial temporal lobe (MTL), a structure critical for memory consolidation. In this study, we tested the behavioral relevance of this signal and explored functional connectivity changes during picture repetition. We obtained high-density electroencephalographic recordings from 14 healthy subjects performing a continuous recognition task in which pictures were repeated either immediately or after 9 intervening items. Conventional ERP analysis replicated the positive frontal potential emanating from the left MTL at 250-350 ms in response to immediately repeated stimuli. Connectivity analysis showed that this ERP was associated with increased theta-band (3.5-7 Hz) coherence in the MTL region, more on the left than the right, 200-400 ms following immediate, but not spaced, repetition. This increase was stronger in subjects who better recognized immediately repeated stimuli after 30 min. These findings indicate that transient theta-band synchronization between the MTL and the rest of the brain at 200-400 ms reflects a memory-stabilizing signal. © 2015 Wiley Periodicals, Inc.
Natural speech is processed in the brain as a mixture of auditory and visual features. One example of the importance of visual speech is the McGurk effect and related perceptual illusions that result from mismatching auditory and visual syllables. Although the McGurk effect has been widely applied to the exploration of audiovisual speech processing, it relies on isolated syllables, which severely limits the conclusions that can be drawn from the paradigm. In addition, the extreme variability and uneven quality of the stimuli usually employed prevent comparability across studies. To overcome these limitations, we present an innovative methodology using 3D virtual characters with realistic lip movements synchronized with computer-synthesized speech. We used commercially available and affordable tools to facilitate reproducibility and comparability, and the set-up was validated on 24 participants performing a perception task. Within complete and meaningful French sentences, we paired a labiodental fricative viseme (i.e., /v/) with a bilabial occlusive phoneme (i.e., /b/). This audiovisual mismatch is known to induce the illusion of hearing /v/ in a proportion of trials. We measured the rate of the illusion while varying the level of background noise and the audiovisual lag. Overall, the effect was observed in 40% of trials. The proportion rose to about 50% with added background noise and up to 66% when controlling for phonetic features. Our results demonstrate that computer-generated speech stimuli are a sound choice, and that they can supplement natural speech while affording greater control over stimulus timing and content.