Decoding verbal and nonverbal emotional expressions is an important part of speech communication. Although various studies have tried to specify the brain regions that underlie different emotions conveyed in speech, few studies have aimed to specify the time course of emotional speech decoding. We used event-related potentials to determine when emotional speech is first differentiated from neutral speech. Participants engaged in an implicit emotional processing task (probe verification) while listening to emotional sentences spoken by a female and a male speaker. Independent of speaker voice, emotional sentences could be differentiated from neutral sentences as early as 200 ms after sentence onset (P200), suggesting rapid emotional decoding.
Expressions of basic emotions (joy, sadness, anger, fear, disgust) can be recognized pan-culturally from the face, and it is assumed that these emotions can be recognized from a speaker's voice, regardless of an individual's culture or linguistic ability. Here, we compared how monolingual speakers of Argentine Spanish recognize basic emotions from pseudo-utterances ("nonsense speech") produced in their native language and in three foreign languages (English, German, Arabic). Results indicated that vocal expressions of basic emotions could be decoded in each language condition at accuracy levels exceeding chance, although Spanish listeners performed significantly better overall in their native language ("in-group advantage"). Our findings argue that the ability to understand vocally expressed emotions in speech is partly independent of linguistic ability and involves universal principles, although this ability is also shaped by linguistic and cultural variables.
In social interactions, humans can express how they feel in what they say (verbal) and in how they say it (non-verbal). Although decoding of vocal emotion expressions occurs rapidly, accumulating electrophysiological evidence suggests that this process is multilayered and involves temporally and functionally distinct processing steps. Neuroimaging and lesion data confirm that these processing steps, which support emotional speech and language comprehension, are anchored in a functionally differentiated brain network. The present review on emotional speech and language processing discusses concepts and empirical clinical and neuroscientific evidence on the basis of behavioral, event-related brain potential, and functional magnetic resonance imaging data. These data shape our understanding of how we communicate emotions to others through speech and language, and they lead to a multistep processing model of vocal and visual emotion expressions.
Previous research suggests that emotional prosody processing is a highly rapid and complex process. In particular, it has been shown that different basic emotions can be differentiated in an early event-related brain potential (ERP) component, the P200. Often, the P200 is followed by later long-lasting ERPs such as the late positive complex. The current experiment set out to explore to what extent emotionality and arousal can modulate these previously reported ERP components. In addition, we investigated the influence of task demands (implicit vs. explicit evaluation of stimuli). Participants listened to pseudo-sentences (sentences with no lexical content) spoken in six different emotions or in a neutral tone of voice while they rated either the arousal level of the speaker or their own arousal level. Results confirm that different emotional intonations can first be differentiated in the P200 component, reflecting a first emotional encoding of the stimulus, possibly including a valence tagging process. A marginally significant arousal effect was also found in this time window, with high-arousing stimuli eliciting a stronger P200 than low-arousing stimuli. The P200 component was followed by a long-lasting positive ERP between 400 and 750 ms. In this late time window, both emotion and arousal effects were found. No effects of task were observed in either time window. Taken together, results suggest that emotion-relevant details are robustly decoded during both early and late processing stages, while arousal information is only reliably taken into consideration at a later stage of processing.
This study used event-related brain potentials (ERPs) to compare the time course of emotion processing from non-linguistic vocalizations versus speech prosody, to test whether vocalizations are treated preferentially by the neurocognitive system. Participants passively listened to vocalizations or pseudo-utterances conveying anger, sadness, or happiness as the EEG was recorded. Simultaneous effects of vocal expression type and emotion were analyzed for three ERP components (N100, P200, Late Positive Component). Emotional vocalizations and speech were differentiated very early (N100), and vocalizations elicited stronger, earlier, and more differentiated P200 responses than speech. At later stages (450-700 ms), anger vocalizations evoked a stronger late positivity (LPC) than other vocal expressions, which was similar but delayed for angry speech. Individuals with high trait anxiety exhibited early, heightened sensitivity to vocal emotions (particularly vocalizations). These data provide new neurophysiological evidence that vocalizations, as evolutionarily primitive signals, are accorded precedence over speech-embedded emotions in the human voice.