The capability to differentiate between various emotional states in speech represents a crucial prerequisite for successful social interactions. The aim of the present study was to investigate the neural processes underlying this ability by applying a simultaneous neuroscientific approach capturing both electrophysiological (via electroencephalography, EEG) and vascular (via functional near-infrared spectroscopy, fNIRS) responses. Pseudowords spoken with angry, happy, and neutral prosody were presented acoustically to participants in a passive listening paradigm in order to capture implicit mechanisms of emotional prosody processing. Event-related brain potentials (ERPs) revealed a larger P200 and an increased late positive potential (LPP) for happy prosody, as well as larger negativities for angry and neutral prosody compared to happy prosody around 500 ms. fNIRS results showed increased activations for angry prosody in right fronto-temporal areas. A correlation between the negativity in the EEG and the fNIRS activation for angry prosody suggests analogous underlying processes resembling a negativity bias. Overall, the results indicate that mechanisms of emotional and phonological encoding (P200), emotional evaluation (increased negativities), and emotional arousal and relevance (LPP) are at work during implicit processing of emotional prosody.

For successful interpersonal communication, correct identification and processing of the emotional states of one's counterpart are necessary. Emotions can be conveyed via multiple modalities, such as facial expressions and hand gestures, but also via the prosody of speech. Prosody, that is, the melodic contour of a word or sentence, arises from the interaction of pitch, loudness, rhythm, intensity, and frequency of specific verbalizations 1. The terms emotional or affective prosody refer to those intonational patterns that carry emotional states such as happiness or anger 2,3. 
While emotional processing in the visual domain (e.g., facial expressions, pictures with emotional content) has already been investigated intensively, emotional prosody processing has received far less attention. Identifying emotions through the voice is highly relevant in everyday life, even more so when visual information is not available, since acoustic signals can travel longer distances whereas visual cues require close proximity to the target 4. It is known that emotion identification improves when several modalities deliver emotional information in combination; when unimodal information is contrasted, however, emotions seem to be easier to recognize from speech than from faces 5,6. Furthermore, emotional auditory input can direct attention to relevant emotional stimuli in the environment and delivers additional crucial information on how to react to facial expressions (e.g., facilitated recognition of visual cues) 7-9. Neuroscientific studies bear the potential to provide a deeper understanding of the mechanisms underlying emotional identification...