It is a long-standing debate in the field of speech communication whether speech perception relies on auditory or multisensory representations and processing, independently of any procedural knowledge about the production of speech units (for a review, see Diehl et al., 2004), or whether it is instead based on a recoding of the sensory input in terms of articulatory gestures, as posited in the Motor Theory of Speech Perception (Liberman et al., 1962; Liberman & Mattingly, 1985; Liberman & Whalen, 2000). The discovery of mirror neurons over the last 15 years (for reviews, see Rizzolatti et al., 2001; Rizzolatti & Craighero, 2004) has strongly renewed interest in motor theories. However, while these neurophysiological data clearly reinforce the plausibility of a role for motor properties in perception, they could, in our view, lead to an incorrect de-emphasis of the role of perceptual shaping, which is crucial in speech communication. The Perception-for-Action-Control Theory (PACT) aims at defining a theoretical framework connecting, in a principled way, perceptual shaping and motor procedural knowledge in speech multisensory processing in the human brain. In this paper, the theory is presented in detail. We describe how it fits with behavioural and linguistic data, concerning firstly vowel systems in human languages and secondly the perceptual organization of the speech scene. Finally, a neurocomputational framework is presented in connection with recent data on the possible functional role of the motor system in speech perception.
Keywords: perceptuo-motor interaction; speech perception; vowel perception; perceptual organisation; multisensory interactions; neurocomputational model; dorsal route
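To make the general idea of a framework combining perceptual shaping with motor procedural knowledge concrete, the sketch below fuses an auditory likelihood with a motor-derived prediction into a single category posterior. It is a minimal, hypothetical illustration in Python; the categories, distributions, and weighting scheme are assumptions made for the example and do not reproduce the PACT model itself.

```python
import numpy as np

# Hypothetical 1-D "auditory" axis (e.g., a formant-like cue) and two vowel
# categories with assumed prototype positions (arbitrary units).
CATEGORIES = {"i": 0.8, "a": -0.8}

def gaussian(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def posterior(cue, motor_prediction, w_auditory=0.6, sigma_a=0.5, sigma_m=0.7):
    """Combine an auditory likelihood with a motor-based prediction.

    cue: observed auditory value; motor_prediction: value predicted by an
    (assumed) internal articulatory model. Each contributes a Gaussian
    likelihood, weighted by w_auditory vs (1 - w_auditory).
    """
    scores = {}
    for label, proto in CATEGORIES.items():
        like_auditory = gaussian(cue, proto, sigma_a)              # perceptual shaping
        like_motor = gaussian(motor_prediction, proto, sigma_m)    # motor knowledge
        scores[label] = like_auditory ** w_auditory * like_motor ** (1 - w_auditory)
    total = sum(scores.values())
    return {label: value / total for label, value in scores.items()}

if __name__ == "__main__":
    # Ambiguous auditory cue, motor prediction biased toward /i/.
    print(posterior(cue=0.1, motor_prediction=0.6))
```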
Compared with complex coordinated orofacial actions, few neuroimaging studies have attempted to determine the shared and distinct neural substrates of supralaryngeal and laryngeal articulatory movements when performed independently. To determine cortical and subcortical regions associated with supralaryngeal motor control, participants produced lip, tongue and jaw movements while undergoing functional magnetic resonance imaging (fMRI). For laryngeal motor activity, participants produced the steady-state/i/vowel. A sparse temporal sampling acquisition method was used to minimize movement-related artifacts. Three main findings were observed. First, the four tasks activated a set of largely overlapping, common brain areas: the sensorimotor and premotor cortices, the right inferior frontal gyrus, the supplementary motor area, the left parietal operculum and the adjacent inferior parietal lobule, the basal ganglia and the cerebellum. Second, differences between tasks were restricted to the bilateral auditory cortices and to the left ventrolateral sensorimotor cortex, with greater signal intensity for vowel vocalization. Finally, a dorso-ventral somatotopic organization of lip, jaw, vocalic/laryngeal, and tongue movements was observed within the primary motor and somatosensory cortices using individual region-of-interest (ROI) analyses. These results provide evidence for a core neural network involved in laryngeal and supralaryngeal motor control and further refine the sensorimotor somatotopic organization of orofacial articulators.
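One way to summarise a dorso-ventral somatotopy from individual ROI analyses is to compare, across tasks, the height of each participant's peak activation along the dorso-ventral axis within a motor ROI. The sketch below is purely illustrative: the coordinates are invented, and the summary is a generic approach rather than the study's actual analysis pipeline.

```python
import numpy as np

# Hypothetical peak z-coordinates (dorso-ventral axis) within a left precentral
# ROI, one value per task and per participant. Numbers are invented for
# illustration, not data from the study.
peaks_z = {
    "lip":    np.array([58, 56, 60, 57]),
    "jaw":    np.array([52, 50, 53, 51]),
    "vowel":  np.array([46, 44, 47, 45]),
    "tongue": np.array([40, 38, 41, 39]),
}

# Order articulators by mean peak height to summarise the somatotopic layout.
ordering = sorted(peaks_z, key=lambda task: peaks_z[task].mean(), reverse=True)
print("dorsal -> ventral:", " > ".join(ordering))
```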
Lip reading is the ability to partially understand speech by looking at the speaker's lips. It improves the intelligibility of speech in noise when audiovisual perception is compared with audio-only perception. A recent set of experiments showed that seeing the speaker's lips also enhances sensitivity to acoustic information, decreasing the auditory detection threshold of speech embedded in noise (Grant & Seitz, 2000; Grant, 2001). However, detection is different from comprehension, and it remains to be seen whether improved sensitivity also results in an intelligibility gain in audiovisual speech perception. In this work, we use an original paradigm to show that seeing the speaker's lips enables the listener to hear better and hence to understand better. The audiovisual stimuli used here could not be differentiated by lip reading per se, since they contained exactly the same lip gesture matched with different compatible speech sounds. Nevertheless, the noise-masked stimuli were more intelligible in the audiovisual condition than in the audio-only condition, due to the contribution of visual information to the extraction of acoustic cues. Replacing the lip gesture by a non-speech visual input with exactly the same time course, providing the same temporal cues for extraction, removed the intelligibility benefit. This early contribution to audiovisual speech identification is discussed in relation to recent neurophysiological data on audiovisual perception.
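The intelligibility benefit described above amounts to a paired comparison of identification scores in the audio-only and audiovisual conditions across listeners. The sketch below illustrates that comparison with invented per-listener scores and a standard paired test; it is not the analysis reported in the paper.

```python
import numpy as np
from scipy import stats

# Hypothetical per-listener identification scores (% correct) in noise,
# audio-only (AO) vs audiovisual (AV). Values are invented for illustration.
ao = np.array([42, 38, 51, 45, 40, 48, 44, 39])
av = np.array([55, 47, 63, 58, 52, 60, 57, 50])

gain = av - ao                  # intelligibility benefit per listener
t, p = stats.ttest_rel(av, ao)  # paired comparison across listeners
print(f"mean AV gain = {gain.mean():.1f} points, t = {t:.2f}, p = {p:.4f}")
```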
Subjects presented with coherent auditory and visual streams generally fuse them into a single percept. This results in enhanced intelligibility in noise, or in visual modification of the auditory percept in the McGurk effect. It is classically considered that processing is done independently in the auditory and visual systems before interaction occurs at a certain representational stage, resulting in an integrated percept. However, some behavioral and neurophysiological data suggest the existence of a two-stage process. A first stage would bind together the appropriate pieces of audio and video information before fusion per se in a second stage. If so, it should be possible to design experiments leading to unbinding. It is shown here that if a given McGurk stimulus is preceded by an incoherent audiovisual context, the amount of McGurk effect is strongly reduced. Various kinds of incoherent contexts (acoustic syllables dubbed onto video sentences, or phonetic or temporal modifications of the acoustic content of a regular sequence of audiovisual syllables) can significantly reduce the McGurk effect, even when they are short (less than 4 s). The data are interpreted in the framework of a two-stage "binding and fusion" model for audiovisual speech perception.
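The two-stage idea can be sketched computationally: a first stage estimates an audiovisual binding weight from the coherence of the recent context, and a second stage uses that weight to set how much the visual stream influences fusion. The toy model below is a hypothetical illustration; the coherence measure, logistic mapping, and fusion rule are assumptions, not the published model.

```python
import numpy as np

def binding_weight(context_coherence, steepness=5.0, midpoint=0.5):
    """Stage 1: map a context coherence score in [0, 1] to a binding weight.

    High coherence -> weight near 1 (streams bound); low coherence -> near 0.
    The logistic mapping is an arbitrary modelling choice.
    """
    return 1.0 / (1.0 + np.exp(-steepness * (context_coherence - midpoint)))

def fuse(p_audio, p_video, w_bind):
    """Stage 2: weighted multiplicative fusion of per-category probabilities.

    With w_bind = 0 the percept follows audio alone; with w_bind = 1 the visual
    stream contributes fully, so McGurk-like combination percepts can emerge.
    """
    fused = p_audio * (p_video ** w_bind)
    return fused / fused.sum()

# Categories: /ba/, /da/, /ga/. Audio says /ba/; the visual gesture is
# compatible with /da/ or /ga/ (a classic McGurk configuration).
p_a = np.array([0.80, 0.15, 0.05])
p_v = np.array([0.05, 0.45, 0.50])

for coherence in (0.9, 0.2):   # coherent vs incoherent preceding context
    w = binding_weight(coherence)
    print(f"coherence={coherence}: fused={np.round(fuse(p_a, p_v, w), 2)}")
```

With a coherent context the fused percept shifts toward /da/, whereas an incoherent context lowers the binding weight and the response follows the audio /ba/, mirroring the reduction of the McGurk effect reported above.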
Interpersonal touch is of paramount importance in human social bonding and close relationships, providing a unique channel for affect communication. So far, the effect of touch on human physiology has been studied at the individual level. The present study aims to extend the study of affective touch from isolated individuals to truly interacting dyads. We designed an ecological paradigm in which romantic partners interact only via touch while we manipulate their empathic states. Simultaneously, we collected their autonomic activity (skin conductance, pulse, respiration). Fourteen couples participated in the experiment. We found that interpersonal touch increased coupling of electrodermal activity between the interacting partners, regardless of the intensity and valence of the emotion felt. In addition, physical touch induced strong and reliable changes in physiological states within individuals. These results support an instrumental role of interpersonal touch for affective support in close relationships. Furthermore, they suggest that touch alone allows the emergence of a somatovisceral resonance between interacting individuals, which in turn is likely to form the prerequisites for emotional contagion and empathy.
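Coupling of electrodermal activity between partners can be quantified in several ways; one common approach is a windowed correlation between the two skin-conductance time series. The sketch below uses that approach on simulated signals and is offered only as an illustration, not as the measure used in the study.

```python
import numpy as np

def physiological_coupling(sig_a, sig_b, fs=10.0, window_s=15.0):
    """Index coupling between two skin-conductance signals as the mean
    absolute Pearson correlation over non-overlapping sliding windows.

    This is one common coupling index; it is not necessarily the one
    reported in the study.
    """
    n = int(window_s * fs)
    corrs = []
    for start in range(0, min(len(sig_a), len(sig_b)) - n + 1, n):
        a = sig_a[start:start + n]
        b = sig_b[start:start + n]
        if np.std(a) > 0 and np.std(b) > 0:
            corrs.append(abs(np.corrcoef(a, b)[0, 1]))
    return float(np.mean(corrs)) if corrs else np.nan

# Simulated 3-minute recordings at 10 Hz (illustrative data only).
rng = np.random.default_rng(0)
shared = np.cumsum(rng.normal(size=1800)) * 0.01       # slow shared drift
partner_1 = shared + rng.normal(scale=0.2, size=1800)
partner_2 = shared + rng.normal(scale=0.2, size=1800)
print(f"coupling index: {physiological_coupling(partner_1, partner_2):.2f}")
```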
This special issue presents research concerning multistable perception in different sensory modalities. Multistability occurs when a single physical stimulus produces alternations between different subjective percepts. Multistability was first described for vision, where it occurs, for example, when different stimuli are presented to the two eyes or for certain ambiguous figures. It has since been described for other sensory modalities, including audition, touch and olfaction. The key features of multistability are: (i) stimuli have more than one plausible perceptual organization; (ii) these organizations are not compatible with each other. We argue here that most if not all cases of multistability are based on competition in selecting and binding stimulus information. Binding refers to the process whereby the different attributes of objects in the environment, as represented in the sensory array, are bound together within our perceptual systems, to provide a coherent interpretation of the world around us. We argue that multistability can be used as a method for studying binding processes within and across sensory modalities. We emphasize this theme while presenting an outline of the papers in this issue. We end with some thoughts about open directions and avenues for further research.
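The claim that multistability reflects competition between incompatible perceptual organisations is often illustrated with a generic model in which two interpretations mutually inhibit each other and slowly adapt, producing spontaneous alternations. The sketch below is such a textbook-style toy simulation; its parameters and form are assumptions for illustration and do not come from the papers in the issue.

```python
import numpy as np

def simulate(duration_s=60.0, dt=0.01, input_drive=1.0, inhibition=2.0,
             adapt_gain=1.5, tau_r=0.02, tau_a=2.0, noise=0.05):
    """Two competing 'percept' populations with mutual inhibition, adaptation
    and noise; returns the number of dominance switches in the run."""
    steps = int(duration_s / dt)
    r = np.array([0.5, 0.4])   # activity of the two competing interpretations
    a = np.zeros(2)            # slow adaptation variables
    rng = np.random.default_rng(1)
    dominant = np.empty(steps, dtype=int)
    for t in range(steps):
        drive = input_drive - inhibition * r[::-1] - adapt_gain * a
        drive += noise * rng.normal(size=2)
        r += dt / tau_r * (-r + np.clip(drive, 0, None))
        a += dt / tau_a * (-a + r)
        dominant[t] = int(r[1] > r[0])
    return int(np.sum(np.abs(np.diff(dominant))))

print("number of perceptual switches in 60 s:", simulate())
```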