The present study investigated the ability of normal-hearing listeners and cochlear implant users to recognize vocal emotions. Sentences were produced by 1 male and 1 female talker according to 5 target emotions: angry, anxious, happy, sad, and neutral. Overall amplitude differences between the stimuli were either preserved or normalized. In experiment 1, vocal emotion recognition was measured in normal-hearing and cochlear implant listeners; cochlear implant subjects were tested using their clinically assigned processors. When overall amplitude cues were preserved, normal-hearing listeners achieved near-perfect performance, whereas cochlear implant listeners recognized less than half of the target emotions. Removing the overall amplitude cues significantly worsened mean normal-hearing and cochlear implant performance. In experiment 2, vocal emotion recognition was measured in cochlear implant listeners as a function of the number of channels (from 1 to 8) and envelope filter cutoff frequency (50 vs 400 Hz) in experimental speech processors. In experiment 3, vocal emotion recognition was measured in normal-hearing listeners as a function of the number of channels (from 1 to 16) and envelope filter cutoff frequency (50 vs 500 Hz) in acoustic cochlear implant simulations. Results from experiments 2 and 3 showed that both cochlear implant and normal-hearing performance significantly improved as the number of channels or the envelope filter cutoff frequency was increased. The results suggest that spectral, temporal, and overall amplitude cues each contribute to vocal emotion recognition. The poorer cochlear implant performance is most likely attributable to the lack of salient pitch cues and the limited functional spectral resolution.
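As a rough illustration of the acoustic cochlear implant simulations varied in experiment 3, the Python sketch below implements a generic noise-band vocoder: the signal is split into a small number of analysis bands, each band's temporal envelope is extracted with a low-pass filter (e.g., 50 or 500 Hz cutoff), and the envelopes modulate band-limited noise carriers. The band edges, filter orders, and function names are illustrative assumptions, not the exact processor parameters used in the study.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def band_edges(n_channels, lo=200.0, hi=7000.0):
    """Log-spaced corner frequencies for the analysis bands (an assumption)."""
    return np.geomspace(lo, hi, n_channels + 1)

def noise_vocode(signal, fs, n_channels=8, env_cutoff=50.0):
    """Noise-band vocoder: each band's envelope modulates band-limited noise."""
    signal = np.asarray(signal, dtype=float)
    edges = band_edges(n_channels)
    env_sos = butter(2, env_cutoff, btype="low", fs=fs, output="sos")
    out = np.zeros_like(signal)
    for lo, hi in zip(edges[:-1], edges[1:]):
        band_sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
        band = sosfiltfilt(band_sos, signal)
        env = np.clip(sosfiltfilt(env_sos, np.abs(band)), 0.0, None)   # smoothed envelope
        carrier = sosfiltfilt(band_sos, np.random.randn(len(signal)))  # noise carrier
        out += env * carrier
    # Rescale so overall RMS matches the input (keeps overall amplitude cues).
    out *= np.sqrt(np.mean(signal**2) / (np.mean(out**2) + 1e-12))
    return out
```

In such a simulation, reducing n_channels coarsens spectral resolution and lowering env_cutoff removes the faster envelope (periodicity) fluctuations, which is the trade-off the experiments probed.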
OBJECTIVE: Fundamental frequency (F0) information is important to Chinese tone and speech recognition. Cochlear implant (CI) speech processors typically provide limited F0 information via temporal envelopes delivered to stimulating electrodes. Previous studies have shown that English-speaking CI users’ speech performance is correlated with amplitude modulation detection thresholds (AMDTs). The present study investigated whether Chinese-speaking CI users’ speech performance (especially tone recognition) is correlated with temporal processing capabilities. DESIGN: Chinese tone, vowel, consonant, and sentence recognition were measured in 10 native Mandarin-speaking CI users via clinically assigned speech processors. AMDTs were measured in the same subjects for 20- and 100-Hz AM presented to a middle electrode at 5 stimulation levels that spanned the dynamic range (DR). To further investigate the CI users’ sensitivity to temporal envelope cues, AM frequency discrimination thresholds (AMFDTs) were measured for 2 standard AM frequencies (50 and 100 Hz), presented to the same middle electrode at 30% and 70% DR with a fixed modulation depth (50%). RESULTS: AMDTs significantly improved with increasing stimulation level, and individual subjects exhibited markedly different AMDT functions. AMFDTs also improved with increasing stimulation level, and were better with the 100-Hz standard AM frequency than with the 50-Hz standard AM frequency. Statistical analyses revealed that both mean AMDTs (averaged for 20- or 100-Hz AM across all stimulation levels) and mean AMFDTs (averaged for the 50-Hz standard AM frequency across both stimulation levels) were significantly correlated with tone, consonant, and sentence recognition scores, but not with vowel recognition scores. Mean AMDTs were also significantly correlated with mean AMFDTs. CONCLUSIONS: These preliminary results, obtained from a limited number of subjects, demonstrate the importance of temporal processing to CI speech recognition. The results further suggest that CI users’ Chinese tone and speech recognition may be improved by enhancing temporal envelope cues delivered by speech processing algorithms.
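For context on the AM stimuli used in such measurements, the sketch below shows how a sinusoidally amplitude-modulated envelope and its modulation depth are commonly defined. The sampling rate, duration, and function names are assumptions for illustration only; they do not describe the research interface actually used to drive the electrodes.

```python
import numpy as np

def sam_envelope(fm_hz, depth, dur_s=0.5, fs=44100, level=1.0):
    """Sinusoidally amplitude-modulated envelope: level * (1 + depth*sin(2*pi*fm*t))."""
    t = np.arange(int(dur_s * fs)) / fs
    return level * (1.0 + depth * np.sin(2.0 * np.pi * fm_hz * t))

# 20-Hz and 100-Hz AM; a fixed 50% depth corresponds to the AMFDT conditions.
env_20 = sam_envelope(20.0, 0.5)
env_100 = sam_envelope(100.0, 0.5)

# Modulation depth is often reported in dB re: 100% modulation.
depth_db = 20.0 * np.log10(0.5)  # about -6 dB for m = 0.5
```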
Tone recognition is important for speech understanding in tonal languages such as Mandarin Chinese. Cochlear implant patients are able to perceive some tonal information by using temporal cues such as periodicity-related amplitude fluctuations and similarities between the fundamental frequency (F0) contour and the amplitude envelope. The present study investigates whether modifying the amplitude envelope to better resemble the F0 contour can further improve tone recognition in multichannel cochlear implants. Chinese tone and vowel recognition were measured for six native Chinese normal-hearing subjects listening to a simulation of a four-channel cochlear implant speech processor with and without amplitude envelope enhancement. Two algorithms were proposed to modify the amplitude envelope to more closely resemble the F0 contour. In the first algorithm, the amplitude envelope as well as the modulation depth of periodicity fluctuations was adjusted for each spectral channel. In the second algorithm, the overall amplitude envelope was adjusted before multichannel speech processing, thus reducing any local distortions to the speech spectral envelope. The results showed that both algorithms significantly improved Chinese tone recognition. By adjusting the overall amplitude envelope to match the F0 contour before multichannel processing, vowel recognition was better preserved and less speech-processing computation was required. The results suggest that modifying the amplitude envelope to more closely resemble the F0 contour may be a useful approach toward improving Chinese-speaking cochlear implant patients' tone recognition.
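The core idea of the second algorithm, adjusting the overall amplitude envelope to track the F0 contour before multichannel processing, can be sketched as a frame-wise gain applied to the broadband signal. The F0-to-gain mapping, gain range, and function names below are assumed for illustration and do not reproduce the paper's exact algorithm.

```python
import numpy as np

def follow_f0_envelope(signal, f0_contour, frame_len):
    """Scale each frame of the broadband signal by a gain derived from its F0.

    f0_contour holds one F0 estimate per frame of frame_len samples (Hz; 0 = unvoiced).
    """
    out = np.asarray(signal, dtype=float).copy()
    f0 = np.asarray(f0_contour, dtype=float)
    voiced = f0 > 0
    if not np.any(voiced):
        return out
    # Normalize voiced F0 to [0, 1] within the utterance.
    f0_norm = np.zeros_like(f0)
    f0_norm[voiced] = (f0[voiced] - f0[voiced].min()) / (np.ptp(f0[voiced]) + 1e-9)
    # Assumed mapping: higher F0 -> larger gain; unvoiced frames are left untouched.
    gain = np.where(voiced, 0.5 + 0.5 * f0_norm, 1.0)
    for i, g in enumerate(gain):
        out[i * frame_len:(i + 1) * frame_len] *= g
    return out
```

Because the gain is applied to the broadband signal before band-splitting, the relative levels across channels (the spectral envelope) stay largely intact, which is consistent with the finding that vowel recognition was better preserved by this approach.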
Our recent study found that cochlear implant (CI) users’ quality of life in auditory, psychological, and social functioning was predicted by vocal emotion rather than sentence recognition scores. To eventually improve vocal emotion recognition with CIs, this study investigated the acoustic cues for vocal emotion recognition by CI users with CI alone or bimodal fitting, as compared to normal-hearing (NH) listeners. Sentence duration, mean fundamental frequency (F0), and F0 range were individually normalized for emotional utterances of each talker and sentence. In two other conditions, emotional utterances were presented backward in time and with upside-down F0 contours, respectively. Perceptual results showed significant effects of subject group, cue condition, talker, and emotion. Time-reversed utterances worsened NH listeners’ recognition of all emotions except sad, while upside-down F0 contours worsened that of angry and happy. Vocal emotion recognition with CI alone degraded only with time-reversed utterances. Time-reversed utterances worsened bimodal CI users’ recognition of angry and neutral, while upside-down F0 contours worsened that of angry and happy. Bimodal CI users and NH listeners were also affected by mean F0 and F0 range normalization when recognizing happy. We conclude that natural F0 contours should be faithfully encoded with CIs for better vocal emotion recognition.
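The stimulus manipulations described above (time reversal, "upside-down" F0 contours, and mean-F0/F0-range normalization) can be sketched at the contour level as follows. Resynthesis of the waveform (e.g., PSOLA-style pitch manipulation) is not shown, and the function names and the mirror-about-the-mean inversion are illustrative assumptions rather than the study's exact procedure.

```python
import numpy as np

def reverse_in_time(signal):
    """Time-reversed utterance (temporal cues disrupted, long-term spectrum kept)."""
    return np.asarray(signal)[::-1]

def invert_f0(f0_contour):
    """'Upside-down' contour: mirror voiced F0 values about their own mean."""
    f0 = np.asarray(f0_contour, dtype=float).copy()
    voiced = f0 > 0
    f0[voiced] = 2.0 * f0[voiced].mean() - f0[voiced]
    return f0

def normalize_f0(f0_contour, target_mean, target_range):
    """Shift and rescale voiced F0 values to a target mean and range."""
    f0 = np.asarray(f0_contour, dtype=float).copy()
    voiced = f0 > 0
    f0[voiced] = target_mean + (f0[voiced] - f0[voiced].mean()) * (
        target_range / (np.ptp(f0[voiced]) + 1e-9))
    return f0
```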