Audiovisual speech perception has frequently been studied at the phoneme, syllable and word processing levels. Here, we examined the constraints that visual speech information might exert during the recognition of words embedded in a natural sentence context. We recorded event-related potentials (ERPs) to words that were either strongly or weakly predictable on the basis of the prior semantic sentential context and whose initial phoneme varied in the degree of visual saliency of the lip movements. When the sentences were presented audiovisually (Experiment 1), words weakly predicted from the semantic context elicited a larger, long-lasting N400 compared to strongly predictable words. This semantic effect interacted with the degree of visual saliency over a late part of the N400. When comparing audiovisual versus auditory-alone presentation (Experiment 2), the typical amplitude-reduction effect over the auditory-evoked N100 response was observed in the audiovisual modality. Interestingly, a specific benefit of high- versus low-visual-saliency constraints occurred over the early N100 response and at the late N400 time window, confirming the result of Experiment 1. Taken together, our results indicate that the saliency of visual speech can influence both auditory processing and word recognition at relatively late stages, and thus suggest strong interactivity between audiovisual integration and other (arguably higher) stages of information processing during natural speech comprehension.
Visual rhythmic stimulation evokes a robust power increase exactly at the stimulation frequency, the so-called steady-state response (SSR). Source localization of visual SSRs typically shows a very focal modulation of power in visual cortex, which has led to SSRs being treated and interpreted as a local phenomenon. Given the network dynamics of the brain, we hypothesized that SSRs have additional large-scale effects on the brain's functional network that can be revealed by means of graph theory. We applied rhythmic visual stimulation at a range of frequencies (4–30 Hz), recorded MEG, and investigated source-level connectivity across the whole brain. Using graph-theoretical measures, we observed a frequency-unspecific reduction of global density in the alpha band, "disconnecting" visual cortex from the rest of the network. In addition, a frequency-specific increase of connectivity between occipital cortex and precuneus was found at the stimulation frequency that exhibited the highest resonance (30 Hz). In conclusion, we showed that SSRs dynamically re-organized the brain's functional network. These large-scale effects should be taken into account not only when attempting to explain the nature of SSRs, but also when SSRs are used in various experimental designs.
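For readers less familiar with the graph-theoretical terminology used in the abstract above, the short sketch below illustrates one common way "global density" can be computed from a thresholded, undirected functional connectivity matrix. It is only an illustration of the measure, not the authors' analysis pipeline; the function name, threshold value and matrix size are illustrative assumptions.

    import numpy as np

    def global_density(conn, threshold):
        # Binarize an undirected connectivity matrix and return the fraction
        # of possible edges that survive the threshold (self-connections excluded).
        n = conn.shape[0]
        adj = (conn > threshold) & ~np.eye(n, dtype=bool)
        n_edges = np.triu(adj, k=1).sum()        # count each undirected edge once
        return n_edges / (n * (n - 1) / 2)

    # Toy example: a random symmetric matrix standing in for source-level connectivity.
    rng = np.random.default_rng(0)
    c = rng.random((68, 68))
    c = (c + c.T) / 2
    print(global_density(c, threshold=0.6))

A drop in this quantity, as reported for the alpha band, means that fewer connections exceed the threshold across the network as a whole.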
Speech perception often benefits from vision of the speaker's lip movements when they are available. One potential mechanism underlying this reported gain in perception arising from audio-visual integration is on-line prediction. In this study we address whether preceding speech context in a single modality can improve audiovisual processing and whether this improvement is based on on-line information transfer across sensory modalities. In the experiments presented here, on each trial a speech fragment (context) presented in a single sensory modality (voice or lips) was immediately continued by an audiovisual target fragment. Participants made speeded judgments about whether voice and lips were in agreement in the target fragment. The leading single-modality context and the subsequent audiovisual target fragment could be continuous in one modality only, in both modalities (context in one modality continues into both modalities in the target fragment), or in neither modality (i.e., discontinuous). The results showed quicker audiovisual matching responses when the context was continuous with the target within either the visual or the auditory channel (Experiment 1). Critically, prior visual context also provided an advantage when it was cross-modally continuous (with the auditory channel in the target), whereas auditory-to-visual cross-modal continuity yielded no advantage (Experiment 2). This suggests that visual speech information can provide an on-line benefit for processing the upcoming auditory input through the use of predictive mechanisms. We hypothesize that this benefit is expressed at an early level of speech analysis.
The sight of a speaker's facial movements during the perception of a spoken message can benefit speech processing through on-line predictive mechanisms. Recent evidence suggests that these predictive mechanisms can operate across sensory modalities, that is, vision and audition. However, to date, behavioral and electrophysiological demonstrations of cross-modal prediction in speech have considered only the speaker's native language. Here, we address a question of current debate, namely whether the level of representation involved in cross-modal prediction is phonological or pre-phonological. We do this by testing participants in an unfamiliar language. If cross-modal prediction is predominantly based on phonological representations tuned to the phonemic categories of the listener's native language, then it should be more effective in the native language than in an unfamiliar one. We tested Spanish and English native speakers in an audiovisual matching paradigm that allowed us to evaluate visual-to-auditory prediction, using sentences in the participant's native language and in an unfamiliar language. The benefits of cross-modal prediction were seen only in the native language, regardless of the particular language or the participants' linguistic background. This pattern of results implies that cross-modal visual-to-auditory prediction during speech processing relies heavily on phonological representations, rather than on low-level spatiotemporal correlations between facial movements and sounds.