The potential contribution of the peripheral auditory efferent system to our understanding of speech in a background of competing noise was studied using a computer model of the auditory periphery and assessed using an automatic speech recognition system. A previous study had shown that a fixed efferent attenuation applied to all channels of a multi-channel model could improve the recognition of connected digit triplets in noise [G. J. Brown, R. T. Ferry, and R. Meddis, J. Acoust. Soc. Am. 127, 943-954 (2010)]. In the current study an anatomically justified feedback loop was used to automatically regulate separate attenuation values for each auditory channel. This arrangement resulted in a further enhancement of speech recognition over fixed-attenuation conditions. Comparisons between multi-talker babble and pink noise interference conditions suggest that the benefit originates from the model's ability to modify the amount of suppression in each channel separately according to the spectral shape of the interfering sounds.
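The per-channel efferent feedback loop described in this abstract can be sketched as a slow, level-dependent attenuation applied independently in each auditory channel. The function below is an illustrative assumption, not the published model; the target level, loop gain, and time constant are hypothetical parameters:

```python
import numpy as np

def efferent_attenuation(channel_env, target_db=40.0, gain=0.5,
                         tau=0.05, fs=100.0):
    """Per-channel efferent feedback sketch: attenuate each channel in
    proportion to how far its smoothed level exceeds a target level.

    channel_env : (n_channels, n_frames) envelope magnitudes
    Returns attenuation in dB per channel and frame (always >= 0).
    """
    alpha = np.exp(-1.0 / (tau * fs))          # first-order smoothing
    n_ch, n_fr = channel_env.shape
    atten = np.zeros((n_ch, n_fr))
    smoothed = np.zeros(n_ch)
    for t in range(n_fr):
        level_db = 20 * np.log10(np.maximum(channel_env[:, t], 1e-12))
        smoothed = alpha * smoothed + (1 - alpha) * level_db
        # attenuation grows with excess level; quiet channels are untouched
        atten[:, t] = gain * np.maximum(smoothed - target_db, 0.0)
    return atten
```

Because the attenuation is computed per channel, channels dominated by an intense interferer are suppressed while channels carrying weaker speech energy are left alone, which is the behaviour the abstract credits for the recognition benefit.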
This study compares the phoneme recognition performance in speech-shaped noise of a microscopic model for speech recognition with the performance of normal-hearing listeners. "Microscopic" is defined twofold in the context of this model. First, the speech recognition rate is predicted on a phoneme-by-phoneme basis. Second, microscopic modeling means that the signal waveforms to be recognized are processed by mimicking elementary parts of human auditory processing. The model is based on an approach by Holube and Kollmeier [J. Acoust. Soc. Am. 100, 1703-1716 (1996)] and consists of a psychoacoustically and physiologically motivated preprocessing stage and a simple dynamic-time-warp speech recognizer. The model is evaluated by presenting nonsense speech in a closed-set paradigm. Averaged phoneme recognition rates, specific phoneme recognition rates, and phoneme confusions are analyzed. The influence of different perceptual distance measures and of the model's a priori knowledge is investigated. The results show that human performance can be predicted by this model using an optimal detector, i.e., identical speech waveforms for both training of the recognizer and testing. The best model performance is yielded by distance measures that focus mainly on small perceptual distances and neglect outliers.
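The dynamic-time-warp recognizer and the preference for distance measures that "neglect outliers" can be illustrated with a minimal DTW implementation whose local distance is capped. This is a generic sketch, not the published model's distance measure; the cap value is a hypothetical parameter:

```python
import numpy as np

def dtw_distance(a, b, cap=1.0):
    """Dynamic time warping between two feature sequences.

    a, b : (n_frames, n_features) arrays
    cap  : local frame distances above `cap` are clipped, so a few
           large (outlier) mismatches cannot dominate the path cost,
           mimicking a measure that focuses on small distances.
    """
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)   # accumulated-cost matrix
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = min(np.linalg.norm(a[i - 1] - b[j - 1]), cap)
            D[i, j] = d + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

In a closed-set paradigm, the recognized phoneme sequence is simply the template with the smallest DTW distance to the test representation.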
This study compared spatial speech-in-noise performance in two cochlear implant (CI) patient groups: bimodal listeners, who use a hearing aid contralaterally to support their impaired acoustic hearing, and listeners with contralateral normal hearing, i.e., who were single-sided deaf before implantation. Using a laboratory setting that controls for head movements and that simulates spatial acoustic scenes, speech reception thresholds were measured for frontal speech-in-stationary noise from the front, the left, or the right side. Spatial release from masking (SRM) was then extracted from speech reception thresholds for monaural and binaural listening. SRM was found to be significantly lower in bimodal CI than in CI single-sided deaf listeners. Within each listener group, the SRM extracted from monaural listening did not differ from the SRM extracted from binaural listening. In contrast, a normal-hearing control group showed a significant improvement in SRM when using two ears in comparison to one. Neither CI group showed a binaural summation effect; that is, their performance was not improved by using two devices instead of the best monaural device in each spatial scenario. The results confirm a “listening with the better ear” strategy in the two CI patient groups, where patients benefited from using two ears/devices instead of one by selectively attending to the better one. Which one is the better ear, however, depends on the spatial scenario and on the individual configuration of hearing loss.
The influence of different sources of speech-intrinsic variation (speaking rate, effort, style, and dialect or accent) on human speech perception was investigated. In listening experiments with 16 listeners, confusions of consonant-vowel-consonant (CVC) and vowel-consonant-vowel (VCV) sounds in speech-weighted noise were analyzed. Experiments were based on the OLLO logatome speech database, which was designed for man-machine comparisons. It contains utterances spoken by 50 speakers from five dialect/accent regions and covers several intrinsic variations. By comparing recognition results across intrinsic and extrinsic variations (i.e., different levels of masking noise), the degradation induced by these variabilities can be expressed in terms of SNR. The spectral level distance between the respective speech segment and the long-term spectrum of the masking noise was found to be a good predictor of recognition rates, while phoneme confusions were influenced by the distance to spectrally close phonemes. An analysis based on the transmitted information of articulatory features showed that voicing and manner of articulation are comparatively robust cues in the presence of intrinsic variations, whereas the coding of place is more degraded. The database and detailed results have been made available for comparisons between human speech recognition (HSR) and automatic speech recognizers (ASR).
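The "transmitted information of articulatory features" analysis mentioned above is conventionally computed as the mutual information between presented and perceived feature categories, estimated from a confusion-count matrix (the Miller-and-Nicely style of analysis). A minimal sketch, assuming confusion counts are available:

```python
import numpy as np

def transmitted_information(confusions):
    """Mutual information (in bits) between presented and reported
    feature categories, estimated from a confusion-count matrix.

    confusions[i, j] = number of trials in which category i was
    presented and category j was reported.
    """
    joint = confusions / confusions.sum()          # joint probabilities
    px = joint.sum(axis=1, keepdims=True)          # presented marginals
    py = joint.sum(axis=0, keepdims=True)          # reported marginals
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = joint * np.log2(joint / (px * py))
    return np.nansum(terms)                        # 0*log(0) treated as 0
```

A robust feature (e.g., voicing) yields transmitted information close to the feature's entropy, while a degraded feature (e.g., place) yields a value near zero.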
This study investigated the speech intelligibility benefit of using two different spatial noise reduction algorithms in cochlear implant (CI) users who use a hearing aid (HA) on the contralateral side (bimodal CI users). The study controlled for head movements by using head-related impulse responses to simulate a realistic cafeteria scenario and controlled for HA and CI manufacturer differences by using the master hearing aid platform (MHA) to apply both hearing loss compensation and the noise reduction algorithms (beamformers). Ten bimodal CI users with moderate to severe hearing loss contralateral to their CI participated in the study, and data from nine listeners were included in the data analysis. The beamformers evaluated were the adaptive differential microphones (ADM) implemented independently on each side of the listener and the (binaurally implemented) minimum variance distortionless response (MVDR). For frontal speech and stationary noise from either left or right, an improvement (reduction) of the speech reception threshold of 5.4 dB and 5.5 dB was observed using the ADM, and 6.4 dB and 7.0 dB using the MVDR, respectively. As expected, no improvement was observed for either algorithm for colocated speech and noise. In a 20-talker babble noise scenario, the benefit observed was 3.5 dB for ADM and 7.5 dB for MVDR. The binaural MVDR algorithm outperformed the bilaterally applied monaural ADM. These results encourage the use of beamformer algorithms such as the ADM and MVDR by bimodal CI users in everyday life scenarios.
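The ADM evaluated in this study is, in essence, a two-microphone differential beamformer that adapts a single coefficient to steer a null toward the dominant rear/side noise source. The sketch below is a generic single-channel ADM in the style of Elko's design, not the MHA implementation; it assumes, for simplicity, that one sample of delay matches the acoustic travel time between the two mics, and the step size is a hypothetical parameter:

```python
import numpy as np

def adaptive_differential_mic(front, back, mu=0.01):
    """Generic adaptive differential microphone (ADM) sketch.

    front, back : signals from a closely spaced two-mic pair, with the
    inter-mic travel time assumed equal to one sample period.
    beta steers the null of the combined beam and is adapted by a
    normalized gradient step to minimize the output power, which
    suppresses rear/side noise while passing frontal speech.
    """
    n = len(front)
    y = np.zeros(n)
    beta = 0.0
    for t in range(1, n):
        cf = front[t] - back[t - 1]     # forward-facing cardioid
        cb = back[t] - front[t - 1]     # backward-facing cardioid
        y[t] = cf - beta * cb
        beta += mu * y[t] * cb / (cb * cb + 1e-8)   # NLMS-style update
        beta = min(max(beta, 0.0), 1.0)             # keep null behind
    return y
```

The binaural MVDR, by contrast, combines all microphones of both devices with weights that minimize noise power under a distortionless constraint toward the front, which is consistent with its larger benefit in the babble scenario reported above.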
Bilateral cochlear implant (BCI) users have only very limited spatial hearing abilities. Speech coding strategies transmit interaural level differences (ILDs), but in a distorted manner. Interaural time difference (ITD) information transmission is even more limited. With these cues, most BCI users can coarsely localize a single source in quiet, but performance quickly declines in the presence of other sound sources. This proof-of-concept study presents a novel signal processing algorithm specific to BCIs, with the aim of improving sound localization in noise. The core part of the BCI algorithm duplicates a monophonic electrode pulse pattern and applies quasistationary natural or artificial ITDs or ILDs based on the estimated direction of the dominant source. Three experiments were conducted to evaluate different algorithm variants: Experiment 1 tested whether ITD transmission alone enables BCI subjects to lateralize speech. Results showed that six out of nine BCI subjects were able to lateralize intelligible speech in quiet solely on the basis of ITDs. Experiments 2 and 3 assessed azimuthal angle discrimination in noise with natural or modified ILDs and ITDs. Angle discrimination for frontal locations was possible with all variants, including the pure ITD case, but for lateral reference angles it was only possible with a linearized ILD mapping. Speech intelligibility in noise, limitations, and challenges of this interaural cue transmission approach are discussed, alongside suggestions for modifying and further improving the BCI algorithm.
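The core idea, duplicating a monophonic pulse pattern and imposing direction-dependent ITDs and a linearized ILD mapping, can be sketched as follows. All constants (maximum ITD, ILD slope) are illustrative assumptions, not the published algorithm's values:

```python
import numpy as np

def lateralize_pulses(times_s, amps, azimuth_deg,
                      max_itd_s=700e-6, ild_slope_db_per_deg=0.1):
    """Duplicate a monophonic electrode pulse pattern for both ears,
    imposing a quasistationary ITD and a linearized ILD that both
    grow with the estimated source azimuth (0 deg = front,
    positive = right). Returns (left, right) pulse patterns.
    """
    itd = max_itd_s * np.sin(np.radians(azimuth_deg))   # natural-like ITD
    ild_db = ild_slope_db_per_deg * azimuth_deg         # linearized ILD map
    # split the delay and the level difference symmetrically across ears
    right = {"times": times_s - itd / 2, "amps": amps * 10 ** (ild_db / 40)}
    left = {"times": times_s + itd / 2, "amps": amps * 10 ** (-ild_db / 40)}
    return left, right
```

A linear azimuth-to-ILD mapping of this kind avoids the compression and distortion of device-processed ILDs, which the abstract identifies as necessary for angle discrimination at lateral reference angles.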
Computer models of the auditory periphery provide a tool for formulating theories concerning the relationship between the physiology of the auditory system and the perception of sounds both in normal and impaired hearing. However, the time-consuming nature of their construction constitutes a major impediment to their use, and it is important that transparent models be available on an 'off-the-shelf' basis to researchers. The MATLAB Auditory Periphery (MAP) model aims to meet these requirements and be freely available. The model can be used to simulate simple psychophysical tasks such as absolute threshold, pitch matching and forward masking and those used to measure compression and frequency selectivity. It can be used as a front end to automatic speech recognisers for the study of speech in quiet and in noise. The model can also simulate theories of hearing impairment and be used to make predictions about the efficacy of hearing aids. The use of the software will be described along with illustrations of its application in the study of the psychology of hearing.
The relation of individual speech-in-noise performance differences in cochlear implant (CI) users to underlying physiological factors is currently poorly understood. This study approached this research question through step-wise individualization of a computer model of speech intelligibility that mimics the details of CI signal processing and some details of the physiology present in CI users. Two factors, the spatial spread of the electrical field and internal noise (as a coarse model of individual cognitive performance), were incorporated. Internal representations of speech-in-noise mixtures calculated by the model were classified using an automatic speech recognizer backend employing Hidden Markov Models with Gaussian probability distributions. One-dimensional electric field spatial spread functions were inferred from electrical field imaging data of 14 CI users. The model assumed homogeneously distributed auditory nerve fibers along the cochlear array and equal distance between the electrode array and the nerve tissue. Internal noise, whose standard deviation was adjusted based on anamnesis data, text-reception-threshold data, or a combination thereof, was applied to the internal representations before classification. A systematic model evaluation showed that predicted speech reception thresholds (SRTs) in stationary noise improved (decreased) with decreasing internal noise standard deviation and with narrower electric field spatial spreads. The model version that was individualized to actual listeners using internal noise alone (with average spatial spread) showed significant correlations with measured SRTs, reflecting the high correlation of the text-reception-threshold data with SRTs. However, neither individualization to spatial spread functions alone, nor a combined individualization based on spatial spread functions and internal noise standard deviation, produced significant correlations with measured SRTs.
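The two individualization factors can be sketched as two operations on an electrodogram-like internal representation: smearing each frame with a current-spread kernel, then adding Gaussian internal noise before classification. The exponential kernel shape, electrode pitch, and noise level below are illustrative assumptions, not the study's fitted values:

```python
import numpy as np

def internal_representation(electrode_pattern, spread_const_mm=1.0,
                            pitch_mm=1.0, noise_sd=0.1, rng=None):
    """Sketch of the two individualized model factors.

    electrode_pattern : (n_electrodes, n_frames) stimulation levels
    spread_const_mm   : decay constant of the exponential current
                        spread (wider spread -> more channel smearing)
    noise_sd          : standard deviation of additive 'internal noise'
    """
    rng = np.random.default_rng() if rng is None else rng
    n_el = electrode_pattern.shape[0]
    dist_mm = np.abs(np.arange(n_el)[:, None]
                     - np.arange(n_el)[None, :]) * pitch_mm
    kernel = np.exp(-dist_mm / spread_const_mm)   # spatial spread matrix
    kernel /= kernel.sum(axis=1, keepdims=True)   # preserve total level
    smeared = kernel @ electrode_pattern
    return smeared + rng.normal(0.0, noise_sd, smeared.shape)
```

Under this sketch, a narrow spread keeps channels distinct and a small noise standard deviation keeps the representation clean, which matches the reported direction of both effects on predicted SRTs.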