For cochlear implant (CI) users, degraded spectral input hampers the
understanding of prosodic vocal emotion, especially in difficult listening
conditions. Using a vocoder simulation of CI hearing, we examined the extent to
which informative multimodal cues in a talker’s spoken expressions improve
normal-hearing (NH) adults’ speech and emotion perception under different levels
of spectral degradation (two, three, four, and eight spectral bands).
Participants repeated the words verbatim and identified emotions (among four
alternative options: happy, sad, angry, and neutral) in meaningful sentences
that were semantically congruent with the expression of the intended emotion.
Sentences were presented in their natural speech form and in speech sampled
through a noise-band vocoder in sound (auditory-only) and video
(auditory–visual) recordings of a female talker. Visual information had a more
pronounced benefit in enhancing speech recognition in the lower spectral band
conditions. Spectral degradation, however, did not interfere with emotion
recognition performance when dynamic visual cues in a talker’s expression were
provided, as participants scored at ceiling levels across all spectral band
conditions. Our use of familiar sentences that contained congruent semantic and
prosodic information has high ecological validity, which likely optimized
listener performance under simulated CI hearing and may better predict CI users’
outcomes in everyday listening contexts.