Social interaction involves the active visual perception of facial expressions and communicative gestures. This study examines the distribution of gaze fixations while watching videos of expressive talking faces. The knowledge-driven factors that influence the selective visual processing of facial information were examined by using the same set of stimuli and assigning subjects to either a speech recognition task or an emotion judgment task. For half of the subjects in each task, the intelligibility of the speech was manipulated by the addition of moderate masking noise. Both the task and the intelligibility of the speech signal influenced the spatial distribution of gaze. Gaze was concentrated more on the eyes when emotion was being judged than when words were being identified. When noise was added to the acoustic signal, gaze in both tasks was more centralized on the face. This shows that subjects' gaze is sensitive to the distribution of information on the face, but can also be influenced by strategies aimed at maximizing the amount of visual information processed.
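Analyses like this typically quantify the "spatial distribution of gaze" by assigning each fixation to a facial region of interest (ROI) and comparing, per condition, how often each region is fixated, for how long, and what share of total dwell time it receives. Below is a minimal sketch of such an analysis; the rectangular ROI coordinates, the fixation record format, and all data are hypothetical illustrations, not values from the study.

```python
# Hypothetical ROI-based gaze analysis.
# Fixations are (x, y, duration_ms) tuples in video pixel coordinates.

# Rectangular ROIs as (x_min, y_min, x_max, y_max); the coordinates are
# invented here and would normally come from hand-coded facial landmarks.
ROIS = {
    "eyes":  (120, 80, 360, 160),
    "nose":  (190, 160, 290, 240),
    "mouth": (170, 240, 310, 320),
}

def classify_fixation(x, y):
    """Return the name of the ROI containing (x, y), or 'other'."""
    for name, (x0, y0, x1, y1) in ROIS.items():
        if x0 <= x <= x1 and y0 <= y <= y1:
            return name
    return "other"

def roi_stats(fixations):
    """Per ROI: fixation count, mean duration (ms), and share of dwell time."""
    count, dwell = {}, {}
    for x, y, dur in fixations:
        roi = classify_fixation(x, y)
        count[roi] = count.get(roi, 0) + 1
        dwell[roi] = dwell.get(roi, 0.0) + dur
    total = sum(dwell.values()) or 1.0
    return {roi: {"n": count[roi],
                  "mean_ms": dwell[roi] / count[roi],
                  "dwell_share": dwell[roi] / total}
            for roi in count}

# Example: the same stimuli viewed under two task instructions.
emotion_task = [(200, 120, 350), (230, 130, 400), (240, 280, 200)]
speech_task  = [(240, 280, 500), (210, 300, 450), (230, 120, 150)]
print("emotion:", roi_stats(emotion_task))
print("speech: ", roi_stats(speech_task))
```

Separating fixation count from mean duration matters because the two measures can dissociate: a region can be fixated less often but for longer per fixation, as in the noise manipulations described below.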
During face-to-face conversation the face provides auditory and visual linguistic information and also conveys information about the identity of the speaker. This study investigated the behavioral strategies involved in gathering visual information while watching talking faces. The effects of varying talker identity and of varying the intelligibility of the speech (by adding acoustic noise) on gaze behavior were measured with an eyetracker. Varying the intelligibility of the speech had a noticeable effect on the location and duration of fixations. When noise was present, subjects adopted a vantage point that was more centralized on the face, reducing the frequency of fixations on the eyes and mouth and lengthening the duration of their gaze fixations on the nose and mouth. Varying talker identity resulted in a more modest change in gaze behavior that was modulated by the intelligibility of the speech. Although subjects generally used similar strategies to extract visual information in both talker-variability conditions, when noise was absent there were more fixations on the mouth when viewing a different talker on every trial than when viewing the same talker on every trial. These findings provide a useful baseline for studies examining gaze behavior during audiovisual speech perception and the perception of dynamic faces.

Introduction

We see and process faces every day in a wide variety of contexts, from line drawings of faces and static photographs to dynamic movies and live faces during face-to-face communication. These faces contain a wealth of social, emotional, identity, and linguistic information. Although a great deal of information can be gleaned from static faces, the motion of dynamic faces contains information about identity and emotion not present in static faces (Ambadar, Schooler, & Cohn, 2005; Hill & Johnston, 2001; Knappmeyer, Thornton, & Bülthoff, 2003; Lander & Bruce, 2000; O'Toole, Roark, & Abdi, 2002). Facial motion also contains linguistic information, as evidenced by the fact that silent speechreading is possible (Bernstein, Demorest, & Tucker, 2000). Rarely, though, is this visual speech information present in the complete absence of auditory speech information; audiovisual speech perception is the natural manner of communication. Visual speech information influences the perception of auditory speech even in perfectly audible conditions (McGurk & MacDonald, 1976; MacDonald & McGurk, 1978).
Conflicting visual speech information can influence the perception of acoustic speech, causing an illusory percept of a sound not present in the acoustic signal (the McGurk effect). We examined whether participants could voluntarily attend selectively to either the auditory or the visual modality, by instructing them to pay attention to the information in one modality and to ignore competing information from the other. We also examined how performance under these instructions was affected by weakening the influence of the visual information, by manipulating the temporal offset between the audio and video channels (Experiment 1) and the spatial frequency content of the video (Experiment 2). Gaze behavior was also monitored to examine whether the attentional instructions influenced the gathering of visual information. While task instructions did influence the observed integration of auditory and visual speech information, participants were unable to completely ignore conflicting information, particularly from the visual stream. Manipulating the temporal offset interacted more strongly with task instructions than did manipulating the amount of visual information. Participants' gaze behavior suggests that the attended modality influences the gathering of visual information in audiovisual speech perception.
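Experiment 1's temporal-offset manipulation amounts to shifting the audio track relative to the video before presentation. A minimal sketch of such a shift is below; the sample rate, the pad-with-silence approach, and the function name are assumptions for illustration, not details from the paper.

```python
# Hypothetical audio-video offset: delay (or advance) the audio track
# relative to the video by padding or trimming samples.
import numpy as np

def shift_audio(audio, offset_ms, sample_rate=48000):
    """Positive offset_ms delays the audio (video leads);
    negative offset_ms advances it (audio leads). Length is preserved."""
    n = int(round(abs(offset_ms) * sample_rate / 1000))
    if offset_ms > 0:
        # Prepend silence, trim the end to keep the duration constant.
        return np.concatenate([np.zeros(n, dtype=audio.dtype), audio[:-n or None]])
    if offset_ms < 0:
        # Drop the first n samples, pad the end with silence.
        return np.concatenate([audio[n:], np.zeros(n, dtype=audio.dtype)])
    return audio.copy()

# Example: one second of audio shifted by +200 ms (audio lags the video).
audio = np.random.randn(48000).astype(np.float32)
shifted = shift_audio(audio, offset_ms=200)
assert shifted.shape == audio.shape
```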
Audiovisual speech perception is an everyday instance of multisensory integration. Conflicting visual speech information can influence the perception of acoustic speech (the McGurk effect), and auditory and visual speech are integrated over a rather wide range of temporal offsets. This research examined whether a concurrent cognitive load task would affect audiovisual integration in a McGurk speech task, and whether the load task would cause more interference at larger offsets. The amount of integration was measured as the proportion of responses on incongruent trials that did not correspond to the audio (McGurk responses). An eyetracker was also used to examine whether the temporal offset and the presence of a concurrent cognitive load task influenced gaze behavior. The results showed a modest but statistically significant decrease in the number of McGurk responses when subjects also performed the cognitive load task, an effect that was relatively constant across the various temporal offsets. Participants' gaze behavior was also influenced by the addition of the cognitive load task: gaze was less centralized on the face, less time was spent looking at the mouth, and more time was spent looking at the eyes.
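The integration measure described above reduces to a simple proportion: on audiovisually incongruent trials, the fraction of responses that do not match the auditory token. A sketch of that computation, grouped by temporal offset and load condition, is shown below; the trial record format and field names are hypothetical.

```python
# Hypothetical computation of the McGurk response proportion:
# on incongruent trials, the share of responses not matching the audio.
from collections import defaultdict

def mcgurk_proportions(trials):
    """trials: iterable of dicts with keys 'offset_ms', 'load' (bool),
    'audio_token', 'response', and 'incongruent' (bool).
    Returns {(offset_ms, load): proportion of non-auditory responses}."""
    hits = defaultdict(int)
    total = defaultdict(int)
    for t in trials:
        if not t["incongruent"]:
            continue  # congruent trials do not enter the measure
        key = (t["offset_ms"], t["load"])
        total[key] += 1
        if t["response"] != t["audio_token"]:
            hits[key] += 1  # a McGurk (non-auditory) response
    return {k: hits[k] / total[k] for k in total}

# Illustrative trials: audio /ba/ dubbed onto visual /ga/, often heard as "da".
trials = [
    {"offset_ms": 0,   "load": False, "audio_token": "ba", "response": "da", "incongruent": True},
    {"offset_ms": 0,   "load": True,  "audio_token": "ba", "response": "ba", "incongruent": True},
    {"offset_ms": 200, "load": False, "audio_token": "ba", "response": "da", "incongruent": True},
]
print(mcgurk_proportions(trials))
```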