“…Recent predictive coding models of perception suggest that rather than passively categorizing the bottom-up signal, observers make active predictions about what they are likely to hear (and see), and that perception is based on the difference between these predictions and the bottom-up signal (Clark, 2013; Friston, 2005; Kumar et al., 2011; Rao & Ballard, 1999; see McMurray & Jongman, 2011, and Kleinschmidt & Jaeger, 2015, for applications to speech perception). Visual speech information could play a crucial role in such predictive processes (Arnal & Giraud, 2012; van Wassenhove, 2013) because in many cases, preparatory gestures (e.g., closing the lips before a word-initial /b/, raising the tongue before a /d/) are visible before any acoustic signal is produced (Chandrasekaran, Trubanova, Stillittano, Caplier, & Ghazanfar, 2009; Schwartz & Savariaux, 2014). Thus, for the listener, the visual speech signal could set up predictions about what is about to be heard.…”
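To make the logic of this passage concrete, the sketch below illustrates one standard way such accounts are formalized: a visually derived prediction is compared against the incoming acoustic signal, and the precision-weighted prediction error determines how far the percept is pulled toward the sensory evidence. This is a minimal Gaussian cue-integration illustration, not the model of any of the cited papers; the function name, the use of voice-onset time as the acoustic cue, and all numerical values are assumptions introduced here for exposition.

```python
# Minimal sketch (illustrative only, not the cited authors' model):
# precision-weighted prediction error along a single acoustic dimension,
# here taken to be voice-onset time (VOT) in milliseconds.

def perceive(prior_mean, prior_prec, signal, signal_prec):
    """Combine a top-down prediction with a bottom-up signal.

    The prediction error (signal - prior_mean) is weighted by the relative
    precision of the bottom-up signal, as in standard Gaussian formulations
    of predictive coding (e.g., Friston, 2005).
    """
    error = signal - prior_mean                       # prediction error
    gain = signal_prec / (prior_prec + signal_prec)   # precision weighting
    return prior_mean + gain * error                  # posterior estimate

# Hypothetical scenario: a visible lip closure predicts a labial /b/ with a
# short VOT; the acoustic signal then arrives with a somewhat longer VOT.
prior_mean, prior_prec = 10.0, 1 / 25.0      # prediction from visual gesture
acoustic_vot, acoustic_prec = 30.0, 1 / 100.0  # bottom-up acoustic evidence

percept = perceive(prior_mean, prior_prec, acoustic_vot, acoustic_prec)
print(f"perceived VOT ~ {percept:.1f} ms")   # pulled toward the visual prediction
```

Because the visual prediction here is more precise than the acoustic evidence, the resulting percept sits closer to the predicted value; with a noisier or absent visual cue, the same computation would weight the acoustic signal more heavily.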