It is a long-standing debate in the field of speech communication whether speech perception relies on auditory or multisensory representations and processing, independently of any procedural knowledge about the production of speech units (for a review, see Diehl et al., 2004), or whether it is instead based on a recoding of the sensory input in terms of articulatory gestures, as posited in the Motor Theory of Speech Perception (Liberman et al., 1962; Liberman & Mattingly, 1985; Liberman & Whalen, 2000). The discovery of mirror neurons over the last 15 years (for reviews, see Rizzolatti et al., 2001; Rizzolatti & Craighero, 2004) has strongly renewed interest in motor theories. However, while these neurophysiological data clearly reinforce the plausibility of a role for motor properties in perception, they could, in our view, lead to an incorrect de-emphasis of the role of perceptual shaping, which is crucial in speech communication. The Perception-for-Action-Control Theory (PACT) aims at defining a theoretical framework connecting, in a principled way, perceptual shaping and motor procedural knowledge in speech multisensory processing in the human brain. In this paper, the theory is presented in detail. We describe how it fits with behavioural and linguistic data, concerning firstly vowel systems in human languages and secondly the perceptual organization of the speech scene. Finally, a neurocomputational framework is presented in connection with recent data on the possible functional role of the motor system in speech perception.
Keywords: perceptuo-motor interaction; speech perception; vowel perception; perceptual organisation; multisensory interactions; neurocomputational model; dorsal route
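To make the general idea of a framework combining perceptual shaping with motor procedural knowledge concrete, the sketch below fuses an auditory likelihood with a motor-derived prediction into a single category posterior. It is a minimal, hypothetical illustration in Python; the categories, distributions, and weighting scheme are assumptions made for the example and do not reproduce the PACT model itself.

```python
import numpy as np

# Hypothetical 1-D "auditory" axis (e.g., a formant-like cue) and two vowel
# categories with assumed prototype positions (arbitrary units).
CATEGORIES = {"i": 0.8, "a": -0.8}

def gaussian(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def posterior(cue, motor_prediction, w_auditory=0.6, sigma_a=0.5, sigma_m=0.7):
    """Combine an auditory likelihood with a motor-based prediction.

    cue: observed auditory value; motor_prediction: value predicted by an
    (assumed) internal articulatory model. Each contributes a Gaussian
    likelihood, weighted by w_auditory vs (1 - w_auditory).
    """
    scores = {}
    for label, proto in CATEGORIES.items():
        like_auditory = gaussian(cue, proto, sigma_a)              # perceptual shaping
        like_motor = gaussian(motor_prediction, proto, sigma_m)    # motor knowledge
        scores[label] = like_auditory ** w_auditory * like_motor ** (1 - w_auditory)
    total = sum(scores.values())
    return {label: value / total for label, value in scores.items()}

if __name__ == "__main__":
    # Ambiguous auditory cue, motor prediction biased toward /i/.
    print(posterior(cue=0.1, motor_prediction=0.6))
```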
Compared with complex coordinated orofacial actions, few neuroimaging studies have attempted to determine the shared and distinct neural substrates of supralaryngeal and laryngeal articulatory movements when performed independently. To determine cortical and subcortical regions associated with supralaryngeal motor control, participants produced lip, tongue and jaw movements while undergoing functional magnetic resonance imaging (fMRI). For laryngeal motor activity, participants produced the steady-state/i/vowel. A sparse temporal sampling acquisition method was used to minimize movement-related artifacts. Three main findings were observed. First, the four tasks activated a set of largely overlapping, common brain areas: the sensorimotor and premotor cortices, the right inferior frontal gyrus, the supplementary motor area, the left parietal operculum and the adjacent inferior parietal lobule, the basal ganglia and the cerebellum. Second, differences between tasks were restricted to the bilateral auditory cortices and to the left ventrolateral sensorimotor cortex, with greater signal intensity for vowel vocalization. Finally, a dorso-ventral somatotopic organization of lip, jaw, vocalic/laryngeal, and tongue movements was observed within the primary motor and somatosensory cortices using individual region-of-interest (ROI) analyses. These results provide evidence for a core neural network involved in laryngeal and supralaryngeal motor control and further refine the sensorimotor somatotopic organization of orofacial articulators.
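One way to summarise a dorso-ventral somatotopy from individual ROI analyses is to compare, across tasks, the height of each participant's peak activation along the dorso-ventral axis within a motor ROI. The sketch below is purely illustrative: the coordinates are invented, and the summary is a generic approach rather than the study's actual analysis pipeline.

```python
import numpy as np

# Hypothetical peak z-coordinates (dorso-ventral axis) within a left precentral
# ROI, one value per task and per participant. Numbers are invented for
# illustration, not data from the study.
peaks_z = {
    "lip":    np.array([58, 56, 60, 57]),
    "jaw":    np.array([52, 50, 53, 51]),
    "vowel":  np.array([46, 44, 47, 45]),
    "tongue": np.array([40, 38, 41, 39]),
}

# Order articulators by mean peak height to summarise the somatotopic layout.
ordering = sorted(peaks_z, key=lambda task: peaks_z[task].mean(), reverse=True)
print("dorsal -> ventral:", " > ".join(ordering))
```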
Lip reading is the ability to partially understand speech by looking at the speaker's lips. It improves the intelligibility of speech in noise when audiovisual perception is compared with audio-only perception. A recent set of experiments showed that seeing the speaker's lips also enhances sensitivity to acoustic information, decreasing the auditory detection threshold of speech embedded in noise (Grant & Seitz, 2000; Grant, 2001). However, detection is different from comprehension, and it remains to be seen whether improved sensitivity also results in an intelligibility gain in audiovisual speech perception. In this work, we use an original paradigm to show that seeing the speaker's lips enables the listener to hear better and hence to understand better. The audiovisual stimuli used here could not be differentiated by lip reading per se, since they contained exactly the same lip gesture matched with different compatible speech sounds. Nevertheless, the noise-masked stimuli were more intelligible in the audiovisual condition than in the audio-only condition, due to the contribution of visual information to the extraction of acoustic cues. Replacing the lip gesture by a non-speech visual input with exactly the same time course, providing the same temporal cues for extraction, removed the intelligibility benefit. This early contribution to audiovisual speech identification is discussed in relation to recent neurophysiological data on audiovisual perception.
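The intelligibility benefit described above amounts to a paired comparison of identification scores in the audio-only and audiovisual conditions across listeners. The sketch below illustrates that comparison with invented per-listener scores and a standard paired test; it is not the analysis reported in the paper.

```python
import numpy as np
from scipy import stats

# Hypothetical per-listener identification scores (% correct) in noise,
# audio-only (AO) vs audiovisual (AV). Values are invented for illustration.
ao = np.array([42, 38, 51, 45, 40, 48, 44, 39])
av = np.array([55, 47, 63, 58, 52, 60, 57, 50])

gain = av - ao                  # intelligibility benefit per listener
t, p = stats.ttest_rel(av, ao)  # paired comparison across listeners
print(f"mean AV gain = {gain.mean():.1f} points, t = {t:.2f}, p = {p:.4f}")
```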
Subjects presented with coherent auditory and visual streams generally fuse them into a single percept. This results in enhanced intelligibility in noise, or in visual modification of the auditory percept in the McGurk effect. It is classically considered that processing is done independently in the auditory and visual systems before interaction occurs at a certain representational stage, resulting in an integrated percept. However, some behavioral and neurophysiological data suggest the existence of a two-stage process. A first stage would bind together the appropriate pieces of audio and video information before fusion per se in a second stage. If so, it should be possible to design experiments leading to unbinding. It is shown here that if a given McGurk stimulus is preceded by an incoherent audiovisual context, the amount of McGurk effect is strongly reduced. Various kinds of incoherent contexts (acoustic syllables dubbed onto video sentences, or phonetic or temporal modifications of the acoustic content of a regular sequence of audiovisual syllables) can significantly reduce the McGurk effect, even when they are short (less than 4 s). The data are interpreted in the framework of a two-stage "binding and fusion" model for audiovisual speech perception.
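The two-stage idea can be sketched computationally: a first stage estimates an audiovisual binding weight from the coherence of the recent context, and a second stage uses that weight to set how much the visual stream influences fusion. The toy model below is a hypothetical illustration; the coherence measure, logistic mapping, and fusion rule are assumptions, not the published model.

```python
import numpy as np

def binding_weight(context_coherence, steepness=5.0, midpoint=0.5):
    """Stage 1: map a context coherence score in [0, 1] to a binding weight.

    High coherence -> weight near 1 (streams bound); low coherence -> near 0.
    The logistic mapping is an arbitrary modelling choice.
    """
    return 1.0 / (1.0 + np.exp(-steepness * (context_coherence - midpoint)))

def fuse(p_audio, p_video, w_bind):
    """Stage 2: weighted multiplicative fusion of per-category probabilities.

    With w_bind = 0 the percept follows audio alone; with w_bind = 1 the visual
    stream contributes fully, so McGurk-like combination percepts can emerge.
    """
    fused = p_audio * (p_video ** w_bind)
    return fused / fused.sum()

# Categories: /ba/, /da/, /ga/. Audio says /ba/; the visual gesture is
# compatible with /da/ or /ga/ (a classic McGurk configuration).
p_a = np.array([0.80, 0.15, 0.05])
p_v = np.array([0.05, 0.45, 0.50])

for coherence in (0.9, 0.2):   # coherent vs incoherent preceding context
    w = binding_weight(coherence)
    print(f"coherence={coherence}: fused={np.round(fuse(p_a, p_v, w), 2)}")
```

With a coherent context the fused percept shifts toward /da/, whereas an incoherent context lowers the binding weight and the response follows the audio /ba/, mirroring the reduction of the McGurk effect reported above.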
Interpersonal touch is of paramount importance in human social bonding and close relationships, providing a unique channel for affect communication. So far, the effect of touch on human physiology has been studied at the individual level. The present study aims to extend the study of affective touch from isolated individuals to truly interacting dyads. We designed an ecological paradigm in which romantic partners interact only via touch while we manipulate their empathic states. Simultaneously, we collected their autonomic activity (skin conductance, pulse, respiration). Fourteen couples participated in the experiment. We found that interpersonal touch increased coupling of electrodermal activity between the interacting partners, regardless of the intensity and valence of the emotion felt. In addition, physical touch induced strong and reliable changes in physiological states within individuals. These results support an instrumental role of interpersonal touch for affective support in close relationships. Furthermore, they suggest that touch alone allows the emergence of a somatovisceral resonance between interacting individuals, which in turn is likely to form the prerequisites for emotional contagion and empathy.
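Coupling of electrodermal activity between partners can be quantified in several ways; one common approach is a windowed correlation between the two skin-conductance time series. The sketch below uses that approach on simulated signals and is offered only as an illustration, not as the measure used in the study.

```python
import numpy as np

def physiological_coupling(sig_a, sig_b, fs=10.0, window_s=15.0):
    """Index coupling between two skin-conductance signals as the mean
    absolute Pearson correlation over non-overlapping sliding windows.

    This is one common coupling index; it is not necessarily the one
    reported in the study.
    """
    n = int(window_s * fs)
    corrs = []
    for start in range(0, min(len(sig_a), len(sig_b)) - n + 1, n):
        a = sig_a[start:start + n]
        b = sig_b[start:start + n]
        if np.std(a) > 0 and np.std(b) > 0:
            corrs.append(abs(np.corrcoef(a, b)[0, 1]))
    return float(np.mean(corrs)) if corrs else np.nan

# Simulated 3-minute recordings at 10 Hz (illustrative data only).
rng = np.random.default_rng(0)
shared = np.cumsum(rng.normal(size=1800)) * 0.01       # slow shared drift
partner_1 = shared + rng.normal(scale=0.2, size=1800)
partner_2 = shared + rng.normal(scale=0.2, size=1800)
print(f"coupling index: {physiological_coupling(partner_1, partner_2):.2f}")
```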
This special issue presents research concerning multistable perception in different sensory modalities. Multistability occurs when a single physical stimulus produces alternations between different subjective percepts. Multistability was first described for vision, where it occurs, for example, when different stimuli are presented to the two eyes or for certain ambiguous figures. It has since been described for other sensory modalities, including audition, touch and olfaction. The key features of multistability are: (i) stimuli have more than one plausible perceptual organization; (ii) these organizations are not compatible with each other. We argue here that most if not all cases of multistability are based on competition in selecting and binding stimulus information. Binding refers to the process whereby the different attributes of objects in the environment, as represented in the sensory array, are bound together within our perceptual systems, to provide a coherent interpretation of the world around us. We argue that multistability can be used as a method for studying binding processes within and across sensory modalities. We emphasize this theme while presenting an outline of the papers in this issue. We end with some thoughts about open directions and avenues for further research.
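The claim that multistability reflects competition between incompatible perceptual organisations is often illustrated with a generic model in which two interpretations mutually inhibit each other and slowly adapt, producing spontaneous alternations. The sketch below is such a textbook-style toy simulation; its parameters and form are assumptions for illustration and do not come from the papers in the issue.

```python
import numpy as np

def simulate(duration_s=60.0, dt=0.01, input_drive=1.0, inhibition=2.0,
             adapt_gain=1.5, tau_r=0.02, tau_a=2.0, noise=0.05):
    """Two competing 'percept' populations with mutual inhibition, adaptation
    and noise; returns the number of dominance switches in the run."""
    steps = int(duration_s / dt)
    r = np.array([0.5, 0.4])   # activity of the two competing interpretations
    a = np.zeros(2)            # slow adaptation variables
    rng = np.random.default_rng(1)
    dominant = np.empty(steps, dtype=int)
    for t in range(steps):
        drive = input_drive - inhibition * r[::-1] - adapt_gain * a
        drive += noise * rng.normal(size=2)
        r += dt / tau_r * (-r + np.clip(drive, 0, None))
        a += dt / tau_a * (-a + r)
        dominant[t] = int(r[1] > r[0])
    return int(np.sum(np.abs(np.diff(dominant))))

print("number of perceptual switches in 60 s:", simulate())
```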