Objectives: Speech intelligibility in the presence of a competing talker (speech-on-speech; SoS) presents more difficulties for cochlear implant (CI) users compared with normal-hearing listeners. A recent study implied that these difficulties may be related to CI users’ low sensitivity to two fundamental voice cues, namely, the fundamental frequency (F0) and the vocal tract length (VTL) of the speaker. Because of the limited spectral resolution in the implant, important spectral cues carrying F0 and VTL information are expected to be distorted. This study aims to address two questions: (1) whether spectral contrast enhancement (SCE), previously shown to enhance CI users’ speech intelligibility in the presence of steady state background noise, could also improve CI users’ SoS intelligibility, and (2) whether such improvements in SoS from SCE processing are due to enhancements in CI users’ sensitivity to F0 and VTL differences between the competing talkers. Design: The effect of SCE on SoS intelligibility and comprehension was measured in two separate tasks in a sample of 14 CI users with Cochlear devices. In the first task, the CI users were asked to repeat the sentence spoken by the target speaker in the presence of a single competing talker. The competing talker was the same target speaker whose F0 and VTL were parametrically manipulated to obtain the different experimental conditions. SoS intelligibility, in terms of the percentage of correctly repeated words from the target sentence, was assessed using the standard advanced combination encoder (ACE) strategy and SCE for each voice condition. In the second task, SoS comprehension accuracy and response times were measured using the same experimental setup as in the first task, but with a different corpus. In the final task, CI users’ sensitivity to F0 and VTL differences were measured for the ACE and SCE strategies. The benefit in F0 and VTL discrimination from SCE processing was evaluated with respect to the improvement in SoS perception from SCE. Results: While SCE demonstrated the potential of improving SoS intelligibility in CI users, this effect appeared to stem from SCE improving the overall signal to noise ratio in SoS rather than improving the sensitivity to the underlying F0 and VTL differences. A second key finding of this study was that, contrary to what has been observed in a previous study for childlike voice manipulations, F0 and VTL manipulations of a reference female speaker (target speaker) toward male-like voices provided a small but significant release from masking for the CI users tested. Conclusions: The present findings, together with those previously reported in the literature, indicate that SCE could serve as a possible background-noise-reduction strategy in commercial CI speech processors that could enhance speech intelligibility especially in the presence of background talkers that have longer VTLs compared with the target speaker.
Frequency selectivity can be quantified using masking paradigms, such as psychophysical tuning curves (PTCs). Normal-hearing (NH) listeners show sharp PTCs that are level- and frequency-dependent, whereas frequency selectivity is strongly reduced in cochlear implant (CI) users. This study aims at (a) assessing individual shapes of PTCs in CI users, (b) comparing these shapes to those of simulated CI listeners (NH listeners hearing through a CI simulation), and (c) increasing the sharpness of PTCs using a biologically inspired dynamic compression algorithm, BioAid, which has been shown to sharpen the PTC shape in hearing-impaired listeners. A three-alternative-forced-choice forward-masking technique was used to assess PTCs in 8 CI users (with their own speech processor) and 11 NH listeners (with and without listening through a vocoder to simulate electric hearing). CI users showed flat PTCs with large interindividual variability in shape, whereas simulated CI listeners had PTCs of the same average flatness, but more homogeneous shapes across listeners. The algorithm BioAid was used to process the stimuli before entering the CI users’ speech processor or the vocoder simulation. This algorithm was able to partially restore frequency selectivity in both groups, particularly in seven out of eight CI users, meaning significantly sharper PTCs than in the unprocessed condition. The results indicate that algorithms can improve the large-scale sharpness of frequency selectivity in some CI users. This finding may be useful for the design of sound coding strategies particularly for situations in which high frequency selectivity is desired, such as for music perception.
Objectives The relationship between electrode-nerve interface (ENI) estimates and inter-subject differences in speech performance with sequential and simultaneous channel stimulation in adult cochlear implant listeners were explored. We investigated the hypothesis that individuals with good ENIs would perform better with simultaneous compared to sequential channel stimulation speech processing strategies than those estimated to have poor ENIs. Methods Fourteen postlingually deaf implanted cochlear implant users participated in the study. Speech understanding was assessed with a sentence test at signal-to-noise ratios that resulted in 50% performance for each user with the baseline strategy F120 Sequential. Two simultaneous stimulation strategies with either two (Paired) or three sets of virtual channels (Triplet) were tested at the same signal-to-noise ratio. ENI measures were estimated through: (I) voltage spread with electrical field imaging, (II) behavioral detection thresholds with focused stimulation, and (III) slope (IPG slope effect) and 50%-point differences (dB offset effect) of amplitude growth functions from electrically evoked compound action potentials with two interphase gaps. Results A significant effect of strategy on speech understanding performance was found, with Triplets showing a trend towards worse speech understanding performance than sequential stimulation. Focused thresholds correlated positively with the difference required to reach most comfortable level (MCL) between Sequential and Triplet strategies, an indirect measure of channel interaction. A significant offset effect (difference in dB between 50%-point for higher eCAP growth function slopes with two IPGs) was observed. No significant correlation was observed between the slopes for the two IPGs tested. None of the measures used in this study correlated with the differences in speech understanding scores between strategies. Conclusions The ENI measure based on behavioral focused thresholds could explain some of the difference in MCLs, but none of the ENI measures could explain the decrease in speech understanding with increasing pairs of simultaneously stimulated electrodes in processing strategies.
Cochlear implant (CI) sound processing typically uses a front-end automatic gain control (AGC), reducing the acoustic dynamic range (DR) to control the output level and protect the signal processing against large amplitude changes. It can also introduce distortions into the signal and does not allow a direct mapping between acoustic input and electric output. For speech in noise, a reduction in DR can result in lower speech intelligibility due to compressed modulations of speech. This study proposes to implement a CI signal processing scheme consisting of a full acoustic DR with adaptive properties to improve the signal-to-noise ratio and overall speech intelligibility. Measurements based on the Short-Time Objective Intelligibility measure and an electrodogram analysis, as well as behavioral tests in up to 10 CI users, were used to compare performance with a single-channel, dual-loop, front-end AGC and with an adaptive back-end multiband dynamic compensation system (Voice Guard [VG]). Speech intelligibility in quiet and at a +10 dB signal-to-noise ratio was assessed with the Hochmair–Schulz–Moser sentence test. A logatome discrimination task with different consonants was performed in quiet. Speech intelligibility was significantly higher in quiet for VG than for AGC, but intelligibility was similar in noise. Participants obtained significantly better scores with VG than AGC in the logatome discrimination task. The objective measurements predicted significantly better performance estimates for VG. Overall, a dynamic compensation system can outperform a single-stage compression (AGC + linear compression) for speech perception in quiet.
Speech intelligibility in multitalker settings is challenging for most cochlear implant (CI) users. One possibility for this limitation is the suboptimal representation of vocal cues in implant processing, such as the fundamental frequency (F0), and the vocal tract length (VTL). Previous studies suggested that while F0 perception depends on spectrotemporal cues, VTL perception relies largely on spectral cues. To investigate how spectral smearing in CIs affects vocal cue perception in speech-on-speech (SoS) settings, adjacent electrodes were simultaneously stimulated using current steering in 12 Advanced Bionics users to simulate channel interaction. In current steering, two adjacent electrodes are simultaneously stimulated forming a channel of parallel stimulation. Three such stimulation patterns were used: Sequential (one current steering channel), Paired (two channels), and Triplet stimulation (three channels). F0 and VTL just-noticeable differences (JNDs; Task 1), in addition to SoS intelligibility (Task 2) and comprehension (Task 3), were measured for each stimulation strategy. In Tasks 2 and 3, four maskers were used: the same female talker, a male voice obtained by manipulating both F0 and VTL (F0+VTL) of the original female speaker, a voice where only F0 was manipulated, and a voice where only VTL was manipulated. JNDs were measured relative to the original voice for the F0, VTL, and F0+VTL manipulations. When spectral smearing was increased from Sequential to Triplet, a significant deterioration in performance was observed for Tasks 1 and 2, with no differences between Sequential and Paired stimulation. Data from Task 3 were inconclusive. These results imply that CI users may tolerate certain amounts of channel interaction without significant reduction in performance on tasks relying on voice perception. This points to possibilities for using parallel stimulation in CIs for reducing power consumption.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.