The human face is an important and complex communication channel. Humans can, however, easily read in a face not only identity information but also facial expressions with high accuracy. Here, we present the results of four psychophysical experiments in which we systematically manipulated certain facial areas in video sequences of nine conversational expressions to investigate recognition performance and its dependency on the motions of different facial parts. The results help to demonstrate what information is perceptually necessary and sufficient to recognize the different facial expressions. Subsequent analyses of the facial movements and correlation with recognition performance show that, for some expressions, one individual facial region can represent the whole expression. In other cases, the interaction of more than one facial area is needed to clarify the expression. The full set of results is used to develop a systematic description of the roles of different facial parts in the visual perception of conversational facial expressions.
Communication is critical for normal, everyday life. During a conversation, information is conveyed in a number of ways, including through body, head, and facial changes. While much research has examined these latter forms of communication, the majority of it has focused on static representations of a few, supposedly universal expressions. Normal conversations, however, contain a very wide variety of expressions and are rarely, if ever, static. Here, we report several experiments that show that expressions that use head, eye, and internal facial motion are recognized more easily and accurately than static versions of those expressions. Moreover, we demonstrate conclusively that this dynamic advantage is due to information that is only available over time, and that the temporal integration window for this information is at least 100 ms long.
The ability to communicate is one of the core aspects of human life. For this, we use not only verbal but also nonverbal signals of remarkable complexity. Among the latter, facial expressions belong to the most important information channels. Despite the large variety of facial expressions we use in daily life, research on facial expressions has so far mostly focused on the emotional aspect. Consequently, most databases of facial expressions available to the research community also include only emotional expressions, neglecting the largely unexplored aspect of conversational expressions. To fill this gap, we present the MPI facial expression database, which contains a large variety of natural emotional and conversational expressions. The database contains 55 different facial expressions performed by 19 German participants. Expressions were elicited with the help of a method-acting protocol, which guarantees both well-defined and natural facial expressions. The method-acting protocol was based on every-day scenarios, which are used to define the necessary context information for each expression. All facial expressions are available in three repetitions, in two intensities, as well as from three different camera angles. A detailed frame annotation is provided, from which a dynamic and a static version of the database have been created. In addition to describing the database in detail, we also present the results of an experiment with two conditions that serve to validate the context scenarios as well as the naturalness and recognizability of the video sequences. Our results provide clear evidence that conversational expressions can be recognized surprisingly well from visual information alone. The MPI facial expression database will enable researchers from different research fields (including the perceptual and cognitive sciences, but also affective computing, as well as computer vision) to investigate the processing of a wider range of natural facial expressions.
Robust and effortless spatial orientation critically relies on "automatic and obligatory spatial updating", a largely automatized and reflex-like process that transforms our mental egocentric representation of the immediate surroundings during ego-motions. A rapid pointing paradigm was used to assess automatic/obligatory spatial updating after visually displayed upright rotations with or without concomitant physical rotations using a motion platform. Visual stimuli displaying a natural, subject-known scene proved sufficient for enabling automatic and obligatory spatial updating, irrespective of concurrent physical motions. This challenges the prevailing notion that visual cues alone are insufficient for enabling such spatial updating of rotations, and that vestibular/proprioceptive cues are both required and sufficient. Displaying optic flow devoid of landmarks during the motion and pointing phase was insufficient for enabling automatic spatial updating, but could not be entirely ignored either. Interestingly, additional physical motion cues hardly improved performance, and were insufficient for affording automatic spatial updating. The results are discussed in the context of the mental transformation hypothesis and the sensorimotor interference hypothesis, which associates difficulties in imagined perspective switches to interference between the sensorimotor and cognitive (to-be-imagined) perspective.
Most events are processed by a number of neural pathways. These pathways often differ considerably in processing speed. Thus, coherent perception requires some form of synchronization mechanism. Moreover, this mechanism must be flexible, because neural processing speed changes over the life of an organism. Here we provide behavioral evidence that humans can adapt to a new intersensory temporal relationship (which was artificially produced by delaying visual feedback). The conflict between these results and previous work that failed to find such improvements can be explained by considering the present results as a form of sensorimotor adaptation.
The human face is capable of producing an astonishing variety of expressions-expressions for which sometimes the smallest difference changes the perceived meaning considerably. Producing realistic-looking facial animations that are able to transmit this degree of complexity continues to be a challenging research topic in computer graphics. One important question that remains to be answered is: When are facial animations good enough? Here we present an integrated framework in which psychophysical experiments are used in a first step to systematically evaluate the perceptual quality of several different computer-generated animations with respect to real-world video sequences. The first experiment provides an evaluation of several animation techniques, exposing specific animation parameters that are important to achieve perceptual fidelity. In a second experiment, we then use these benchmarked animation techniques in the context of perceptual research in order to systematically investigate the spatiotemporal characteristics of expressions. A third and final experiment uses the quality measures that were developed in the first two experiments to examine the perceptual impact of changing facial features to improve the animation techniques. Using such an integrated approach, we are able to provide important insights into facial expressions for both the perceptual and computer graphics community.
Communication plays a central role in everday life. During an average conversation, information is exchanged in a variety of ways, including through facial motion. Here, we employ a custom, model-based image manipulation technique to selectively "freeze" portions of a face in video recordings in order to determine the areas that are sufficient for proper recognition of nine conversational expressions. The results show that most expressions rely primarily on a single facial area to convey meaning with different expressions using different areas. The results also show that the combination of rigid head, eye, eyebrow, and mouth motions is sufficient to produce expressions that are as easy to recognize as the original, unmanipulated recordings. Finally, the results show that the manipulation technique introduced few perceptible artifacts into the altered video sequences. This fusion of psychophysics and computer graphics techniques provides not only fundamental insights into human perception and cognition, but also yields the basis for a systematic description of what needs to move in order to produce realistic, recognizable conversational facial animations.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.