The cochlea plays a key role in converting acoustic vibration into the neural stimulation from which the brain perceives sound. A cochlear implant (CI) is an auditory prosthesis that replaces the function of damaged cochlear hair cells to achieve this acoustic-to-neural conversion. However, a CI is only a coarse bionic imitation of the normal cochlea: it cannot deliver the finely resolved time-frequency-intensity information that the normal cochlea transmits, which is vital to high-quality auditory perception such as speech understanding in challenging environments. Although recipients of state-of-the-art commercial CI devices achieve good speech perception in quiet, they usually perform poorly in noisy environments. Noise suppression, or speech enhancement (SE), is therefore one of the most important technologies for CIs. In this study, we review recent progress in deep learning (DL)-based, mostly neural network (NN)-based, SE front ends for CIs and discuss how the hearing properties of CI recipients can be exploited to optimize DL-based SE. In particular, different loss functions for supervising NN training are introduced, and a set of objective and subjective experiments is presented. The results confirm the common finding in CI research that CI recipients are more sensitive to residual noise than to SE-induced speech distortion. Furthermore, speech reception threshold (SRT) tests in noise demonstrate that the intelligibility of denoised speech improves significantly when the NN is trained with a loss function biased toward noise suppression rather than one that weights noise residue and speech distortion equally.
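The idea of biasing the training loss toward noise suppression can be illustrated with a perceptually weighted objective that splits the enhancement error into a speech-distortion term and a residual-noise term. The sketch below is a minimal illustration, not the paper's actual loss; the function name, the spectral-mask formulation, and the weighting parameter `alpha` are assumptions, with `alpha < 0.5` corresponding to a bias toward stronger noise suppression.

```python
import numpy as np

def weighted_se_loss(mask, clean_spec, noise_spec, alpha=0.3):
    """Perceptually weighted SE loss (illustrative sketch).

    The enhancement error is split into two terms:
      - speech distortion: how much the estimated mask attenuates
        the clean speech spectrum,
      - residual noise: how much noise the mask lets through.
    alpha weights distortion, (1 - alpha) weights residual noise;
    alpha < 0.5 biases training toward noise suppression.
    """
    distortion = np.mean((mask * clean_spec - clean_spec) ** 2)
    residual_noise = np.mean((mask * noise_spec) ** 2)
    return alpha * distortion + (1.0 - alpha) * residual_noise

# An all-pass mask incurs no distortion but passes all the noise:
mask = np.ones(4)
loss = weighted_se_loss(mask, np.ones(4), np.ones(4), alpha=0.5)
```

Setting `alpha` below 0.5 makes the network pay more for residual noise than for speech distortion, matching the reported sensitivity of CI recipients.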
Signal processing strategies in most clinical cochlear implants (CIs) extract and transmit speech envelopes to stimulate the auditory neurons. This incomplete representation of the rich fine structures in speech significantly degrades CI recipients' high-level perception, including their speech understanding in noise. This paper presents a noise-robust signal processing strategy to address this problem. Neural networks (NNs) are built and trained to simulate the advanced combination encoder (ACE, a strategy used in CI products of Cochlear Corporation). The NN-based ACE (NNACE) is trained with a carefully designed loss function to output envelope-like signals that 1) are compatible with ACE-based CI systems and can serve as modulators to generate the electric stimuli, 2) are more noise-robust, and 3) may carry a certain degree of the temporal fine structure of speech. Subjective and objective evaluations with vocoder-simulated speech show that NNACE outperforms the comparison methods, warranting further experiments with actual CI recipients.
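The envelope-based stimulation that NNACE imitates rests on ACE's n-of-m channel selection: per analysis frame, only the channels with the largest envelopes are stimulated. The following is a simplified sketch of that selection step under assumed inputs (per-frame channel envelope magnitudes); the function name and parameters are illustrative and not taken from the paper or from Cochlear's implementation.

```python
import numpy as np

def ace_like_selection(frame_envelopes, n_select=8):
    """n-of-m channel selection as in ACE-style strategies (sketch).

    frame_envelopes: per-channel envelope magnitudes for one frame
                     (m channels).
    n_select:        number of maxima to keep (n of m).
    Returns a vector where only the n_select largest envelopes are
    retained; all other channels are zeroed (not stimulated).
    """
    env = np.asarray(frame_envelopes, dtype=float)
    out = np.zeros_like(env)
    keep = np.argsort(env)[-n_select:]  # indices of the n largest
    out[keep] = env[keep]
    return out

# Keep the 2 strongest of 4 channels in this frame:
selected = ace_like_selection([3.0, 1.0, 2.0, 5.0], n_select=2)
```

A network trained to reproduce these selected envelopes from noisy input can slot into the existing stimulation pipeline, which is what makes the NNACE output "compatible with ACE-based CI systems."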