Severe hearing loss can be treated by providing the affected person with a surgically implanted electrical device called a cochlear implant (CI). CI users struggle to perceive complex audio signals such as music; however, previous studies show that CI recipients find music more enjoyable when the vocals are enhanced with respect to the background music. In this manuscript, source separation (SS) algorithms are used to remix pop songs by applying gain to the lead singing voice. Deep convolutional auto-encoders, a deep recurrent neural network, a multilayer perceptron (MLP), and non-negative matrix factorization are evaluated objectively and subjectively through two perceptual experiments involving normal-hearing subjects and CI recipients. The evaluation assesses the relevance of the artifacts introduced by the SS algorithms while considering their computation time, as this study aims at proposing one of the algorithms for real-time implementation. Results show that the MLP performs robustly across the tested data while producing levels of distortion and artifacts that are not perceived by CI users. Thus, an MLP is proposed for real-time monaural audio SS to remix music for CI users.
A cochlear implant (CI) is a surgically implanted electronic device that partially restores hearing to people suffering from profound hearing loss. Although CI users, in general, obtain very good reception of continuous speech in the absence of background noise, they face severe limitations in the context of music perception and appreciation. The main reasons for these limitations are related to channel interactions created by the broad spread of electrical fields in the cochlea and to the low number of electrodes that stimulate it. Moreover, CIs have severe limitations when it comes to transmitting the temporal fine structure of acoustic signals, and hence, these devices elicit poor pitch and timbre perception. For these reasons, several signal processing algorithms have been proposed to make music more accessible to CI users, trying to reduce the complexity of music signals or remixing them to enhance certain components, such as the lead singing voice. In this work, a deep neural network that performs real-time audio source separation to remix music for CI users is presented. The implementation is based on a multilayer perceptron (MLP) and has been evaluated using objective instrumental measurements to ensure clean source estimation. Furthermore, experiments were conducted in 10 normal hearing (NH) and 13 CI users to investigate how the vocals-to-instruments ratio (VIR) set by the tested listeners was affected in realistic environments with and without visual information. The objective instrumental results fulfill the benchmark reported in previous studies by introducing distortions that are shown to not be perceived by CI users. Moreover, the implemented model was optimized to perform real-time source separation. The experimental results show that CI users prefer vocals enhanced by 8 dB with respect to the instruments, independent of acoustic sound scenario and visual information. In contrast, NH listeners did not prefer a VIR different from 0 dB.
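The remixing step described above reduces to applying a linear gain to the separated vocal stem before summing it back with the accompaniment. The following is a minimal sketch of that operation; the function name `remix_with_vir` and the peak-normalization step are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def remix_with_vir(vocals, instruments, vir_db):
    """Remix separated stems at a given vocals-to-instruments ratio (dB).

    A positive VIR boosts the vocals relative to the accompaniment;
    e.g., the CI users in the study preferred roughly +8 dB.
    """
    gain = 10.0 ** (vir_db / 20.0)  # dB -> linear amplitude gain
    mix = gain * vocals + instruments
    # Renormalize only if the boost pushed the mix past full scale
    peak = np.max(np.abs(mix))
    return mix / peak if peak > 1.0 else mix

# Example: boost a synthetic vocal stem by 8 dB relative to the instruments
t = np.linspace(0.0, 1.0, 16000)
vocals = 0.3 * np.sin(2 * np.pi * 220 * t)
instruments = 0.3 * np.sin(2 * np.pi * 110 * t)
remixed = remix_with_vir(vocals, instruments, 8.0)
```

In a real pipeline the two stems would come from the source separation front end rather than being synthesized; the gain formula itself is the standard dB-to-amplitude conversion.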
Cochlear implants (CIs) have proven to be successful at restoring the sensation of hearing in people who suffer from profound sensorineural hearing loss. CI users generally achieve good speech understanding in quiet acoustic conditions. However, their ability to understand speech degrades drastically when interfering background noise is present. To address this problem, current CI systems are delivered with front-end speech enhancement modules that can aid the listener in noisy environments. However, these only perform well under certain noisy conditions, leaving considerable room for improvement in more challenging circumstances. In this work, we propose replacing the CI sound coding strategy with a deep neural network (DNN) that performs end-to-end speech denoising by taking the raw audio as input and providing a denoised electrodogram, i.e., the electrical stimulation patterns applied to the electrodes across time. We specifically introduce a DNN that emulates a common CI sound coding strategy, the advanced combination encoder (ACE). We refer to the proposed algorithm as 'Deep ACE'. Deep ACE is designed not only to accurately code the acoustic signals in the same way that ACE would but also to automatically remove unwanted interfering noises, without sacrificing processing latency. The model was optimized using a CI-specific loss function and evaluated using objective measures as well as listening tests in CI participants. Results show that, based on objective measures, the proposed model achieved higher scores than the baseline algorithms. Moreover, the proposed deep learning-based sound coding strategy yielded the highest speech intelligibility results in eight CI users.
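The ACE strategy that Deep ACE emulates is an "n-of-m" coding scheme: in each stimulation frame, only the n bands with the largest envelopes (commonly 8 of 22) are selected for stimulation, producing one column of the electrodogram. The sketch below illustrates only that maxima-selection step; the function name and the omission of filterbank analysis and loudness mapping are simplifying assumptions, not the clinical algorithm.

```python
import numpy as np

def n_of_m_frame(band_envelopes, n_maxima=8):
    """Select the n largest of m band envelopes for one stimulation frame.

    band_envelopes: 1-D array of per-channel envelope magnitudes for one
    analysis frame. Returns an array of the same shape in which only the
    n_maxima largest entries are kept (one electrodogram column); all
    other channels are zeroed, i.e., not stimulated in this frame.
    """
    frame = np.zeros_like(band_envelopes)
    idx = np.argsort(band_envelopes)[-n_maxima:]  # indices of the n largest bands
    frame[idx] = band_envelopes[idx]
    return frame

# Example: keep the 2 strongest of 5 illustrative band envelopes
envelopes = np.array([0.1, 0.9, 0.3, 0.7, 0.5])
column = n_of_m_frame(envelopes, n_maxima=2)
```

Stacking such columns over time yields the electrodogram that Deep ACE learns to produce directly, and denoised, from raw audio.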
Wireless transmission of audio from or to the signal processors of cochlear implants (CIs) is used to improve speech understanding of CI users. This transmission requires wireless communication to exchange the necessary data. Because CIs are battery-powered devices, their energy consumption must be kept low, making bitrate reduction of the audio signals necessary. Additionally, low latency is essential. Previously, a codec for the electrodograms of CIs, called the Electrocodec, was proposed. In this work, a subjective evaluation of the Electrocodec is presented, which investigates the impact of the codec on monaural speech performance. The Electrocodec is evaluated with respect to speech recognition and quality in ten CI users and compared to the Opus audio codec. Opus is a low-latency, low-bitrate audio codec that best met the CI requirements in terms of bandwidth, bitrate, and latency. While achieving speech recognition and quality equal to Opus, the Electrocodec reaches lower mean bitrates. Actual rates vary from 24.3 up to 53.5 kbit/s, depending on the codec settings. While Opus has a minimum algorithmic latency of 5 ms, the Electrocodec has an algorithmic latency of 0 ms.
Normal hearing listeners have the ability to exploit the audio input perceived by each ear to extract target information in challenging listening scenarios. Bilateral cochlear implant (BiCI) users, however, do not benefit as much as normal hearing listeners do from a bilateral input. In this study, we investigate the effect that bilaterally linked band selection, bilaterally synchronized electrical stimulation, and ideal binary masks (IdBMs) have on the ability of 10 BiCI users to understand speech in background noise. Performance was assessed through a sentence-based speech intelligibility test in a scenario where the speech signal was presented from the front and the interfering noise from one side. The linked band selection relies on the ear with the more favorable signal-to-noise ratio (SNR), which selects the bands to be stimulated in both CIs. Results show that no benefit from adding a second CI to the more favorable SNR side was achieved for any of the tested bilateral conditions. However, when using both devices, speech perception results show that performing linked band selection, in addition to delivering bilaterally synchronized electrical stimulation, leads to an improvement compared to standard clinical setups. Moreover, the outcomes of this work show that by applying IdBMs, subjects achieve speech intelligibility scores similar to those obtained without background noise.
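An ideal binary mask of the kind used above keeps a time-frequency unit when its local SNR exceeds a local criterion and discards it otherwise; it is "ideal" because it requires oracle access to the separate speech and noise signals. The following is a minimal sketch of that computation on magnitude spectrograms; the function name and the default 0 dB criterion are illustrative assumptions.

```python
import numpy as np

def ideal_binary_mask(speech_tf, noise_tf, lc_db=0.0):
    """Compute an ideal binary mask from time-frequency magnitudes.

    speech_tf, noise_tf: arrays of equal shape holding the magnitude
    spectrograms of the clean speech and the noise alone.
    A unit is retained (mask = 1) when its local SNR exceeds the local
    criterion lc_db; otherwise it is discarded (mask = 0).
    """
    eps = 1e-12  # guard against division by zero / log of zero
    local_snr_db = 20.0 * np.log10((np.abs(speech_tf) + eps)
                                   / (np.abs(noise_tf) + eps))
    return (local_snr_db > lc_db).astype(float)

# Example: one frame with a speech-dominated and a noise-dominated unit
speech = np.array([[1.0, 0.1]])
noise = np.array([[0.1, 1.0]])
mask = ideal_binary_mask(speech, noise)
```

Applying the mask (element-wise multiplication with the noisy mixture's spectrogram) before band selection is what allows the near noise-free intelligibility scores reported above.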