Cultural differences in emotion perception have been reported mainly for facial expressions and to a lesser extent for vocal expressions. However, the way in which the perceiver combines auditory and visual cues may itself be subject to cultural variability. Our study investigated cultural differences between Japanese and Dutch participants in the multisensory perception of emotion. A face and a voice, expressing either congruent or incongruent emotions, were presented on each trial. Participants were instructed to judge the emotion expressed in one of the two sources. The effect of to-be-ignored voice information on facial judgments was larger in Japanese than in Dutch participants, whereas the effect of to-be-ignored face information on vocal judgments was smaller in Japanese than in Dutch participants. This result indicates that Japanese people are more attuned than Dutch people to vocal processing in the multisensory perception of emotion. Our findings provide the first evidence that multisensory integration of affective information is modulated by perceivers' cultural background.
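The abstract does not specify the exact measure, but a common way to operationalize the influence of a to-be-ignored cue in this kind of design is the congruency effect: the drop in accuracy on incongruent relative to congruent trials, computed separately for face and voice judgments. A minimal sketch with a hypothetical trial table:

```python
# Minimal sketch (hypothetical data layout): the congruency effect, i.e. the
# drop in accuracy on incongruent relative to congruent trials, computed
# separately for face judgments and voice judgments.
import pandas as pd

# One row per trial: the attended modality, whether the face and voice
# emotions matched, and whether the response matched the attended emotion.
trials = pd.DataFrame({
    "attended":  ["face", "face", "voice", "voice"],
    "congruent": [True, False, True, False],
    "correct":   [1, 0, 1, 1],
})

acc = trials.groupby(["attended", "congruent"])["correct"].mean()
for modality in ("face", "voice"):
    effect = acc[(modality, True)] - acc[(modality, False)]
    print(f"{modality} judgments: congruency effect = {effect:.2f}")
```

A larger effect for face judgments than for voice judgments would indicate that the to-be-ignored voice intruded more than the to-be-ignored face, which is the pattern reported for the Japanese participants.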
Previous studies have shown that facial and vocal affective expressions interact in perception: facial expressions usually dominate vocal expressions when we perceive the emotion of face–voice stimuli. In most of these studies, participants were instructed to pay attention to the face or the voice; few studies have compared the perceived emotions with and without specific instructions regarding the modality to which attention should be directed. Moreover, these studies used face–voice combinations expressing two opposing emotions, which limits the generalizability of the findings. The purpose of this study was to examine whether emotion perception is modulated by instructions to pay attention to the face or the voice, using the six basic emotions. We also examined the modality dominance between the face and the voice for each emotion category. Before the experiment, we recorded faces and voices expressing the six basic emotions and combined these faces and voices orthogonally, so that the emotional valence of the visual and auditory information was either congruent or incongruent. The experiment comprised unisensory and multisensory sessions. The multisensory session was divided into three blocks according to whether an instruction was given to pay attention to a given modality (face attention, voice attention, and no instruction). Participants judged whether the speaker expressed happiness, sadness, anger, fear, disgust, or surprise. Our results revealed that instructions to pay attention to one modality and the congruency of the emotions between modalities modulated modality dominance, and that modality dominance differed for each emotion category. In particular, the modality dominance for anger changed according to the instruction. Analyses also revealed that the modality dominance suggested by the congruency effect can be explained in terms of a facilitation effect and an interference effect.
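As a reading aid, here is a minimal sketch of how such an orthogonal face–voice crossing can be enumerated. The emotion labels come from the abstract; the data layout itself is an assumption:

```python
# Sketch of an orthogonal (fully crossed) stimulus design: every face emotion
# is paired with every voice emotion, yielding congruent pairs on the
# diagonal and incongruent pairs everywhere else.
from itertools import product

EMOTIONS = ["happiness", "sadness", "anger", "fear", "disgust", "surprise"]

stimuli = [
    {"face": face, "voice": voice, "congruent": face == voice}
    for face, voice in product(EMOTIONS, EMOTIONS)
]

print(len(stimuli))                          # 36 face-voice pairs in total
print(sum(s["congruent"] for s in stimuli))  # 6 congruent pairs
```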
Anxious individuals have been shown to interpret others' emotional states negatively. Since most studies have used facial expressions as emotional cues, we examined whether trait anxiety affects the recognition of emotion in a dynamic face and voice presented in synchrony. The face and voice cues conveyed either matched emotions (e.g., a happy face and a happy voice) or mismatched emotions (e.g., a happy face and an angry voice). Participants with high or low trait anxiety were asked to indicate the perceived emotion using one of the cues while ignoring the other. The results showed that individuals with high trait anxiety were more likely to interpret others' emotions in a negative manner, putting more weight on the to-be-ignored angry cues. This interpretation bias was found regardless of the cue modality (i.e., face or voice). Since trait anxiety did not affect recognition of face or voice cues presented in isolation, the bias appears to reflect altered integration of face and voice cues in anxious individuals.
This study aims to further examine cross-cultural differences in multisensory emotion perception between Western and East Asian people. We recorded audiovisual videos of Japanese and Dutch actors saying a neutral phrase with one of the basic emotions and then conducted a validation experiment on the stimuli. In the first part (facial expression), participants watched silent videos of the actors and judged which emotion each actor was expressing by choosing among six options (i.e., happiness, anger, disgust, sadness, surprise, and fear). In the second part (vocal expression), they listened to the audio tracks of the same videos without the images, performing the same task. We analyzed the categorization responses in terms of accuracy and confusion matrices and created a controlled audiovisual stimulus set.
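The abstract does not detail the analysis, but a plain sketch of computing accuracy and a confusion matrix from categorization responses (toy data, hypothetical layout) might look like this:

```python
# Minimal sketch (hypothetical response data): accuracy and a 6x6 confusion
# matrix over the basic-emotion categories, as used to validate the stimuli.
import numpy as np
import pandas as pd

EMOTIONS = ["happiness", "anger", "disgust", "sadness", "surprise", "fear"]

# intended: emotion the actor expressed; responses: participants' choices
intended  = ["happiness", "anger", "fear", "fear", "disgust"]
responses = ["happiness", "anger", "fear", "surprise", "anger"]

accuracy = np.mean([i == r for i, r in zip(intended, responses)])

# dropna=False keeps all six categories even if some were never chosen,
# so the matrix is always 6x6.
confusion = pd.crosstab(
    pd.Categorical(intended, categories=EMOTIONS),
    pd.Categorical(responses, categories=EMOTIONS),
    rownames=["intended"], colnames=["response"], dropna=False,
)
print(f"accuracy = {accuracy:.2f}")
print(confusion)
```

Rows of the matrix show the intended emotion and columns the chosen one, so off-diagonal cells reveal which expressions were systematically confused.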