Identification of target speech was studied under masked conditions consisting of two or four independent speech maskers. In the reference conditions, the maskers were colocated with the target, the masker talkers were the same sex as the target, and the masker speech was intelligible. The comparison conditions, intended to provide release from masking, included different-sex target and masker talkers, time-reversal of the masker speech, and spatial separation of the maskers from the target. Significant release from masking was found for all comparison conditions. To determine whether these reductions in masking could be attributed to differences in energetic masking, ideal time-frequency segregation (ITFS) processing was applied so that the time-frequency units where the masker energy dominated the target energy were removed. The remaining target-dominated “glimpses” were reassembled as the stimulus. Speech reception thresholds measured using these resynthesized ITFS-processed stimuli were the same for the reference and comparison conditions, supporting the conclusion that the amount of energetic masking was the same across conditions. These results indicated that the large release from masking found under all comparison conditions was due primarily to a reduction in informational masking. Furthermore, the large individual differences observed were generally correlated across the three masking release conditions.
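The selection step of ITFS processing described above can be illustrated with a minimal sketch. It assumes the target and masker energies are already available per time-frequency unit as nested lists (rows are time frames, columns are frequency channels). The function names, the 0 dB criterion, the treatment of exact ties, and the toy energy values are illustrative assumptions, not details taken from the study, and the analysis filterbank and resynthesis stages are omitted.

```python
import math


def ideal_binary_mask(target_energy, masker_energy, criterion_db=0.0):
    """Return a 0/1 mask over time-frequency units.

    A unit is kept (1) when the target-to-masker ratio in dB exceeds
    criterion_db, and removed (0) otherwise. Keeping only strict
    exceedance, so that ties are removed, is an assumption of this sketch.
    """
    mask = []
    for t_row, m_row in zip(target_energy, masker_energy):
        row = []
        for t, m in zip(t_row, m_row):
            if t <= 0.0:
                keep = False   # no target energy: nothing to glimpse
            elif m <= 0.0:
                keep = True    # no masker energy: target trivially dominates
            else:
                keep = 10.0 * math.log10(t / m) > criterion_db
            row.append(1 if keep else 0)
        mask.append(row)
    return mask


def apply_mask(mixture_energy, mask):
    """Zero out the mixture units that the mask marks as masker-dominated."""
    return [[x * k for x, k in zip(x_row, k_row)]
            for x_row, k_row in zip(mixture_energy, mask)]


# Toy example (hypothetical values): 2 time frames x 2 frequency channels.
target = [[4.0, 1.0], [0.5, 9.0]]
masker = [[1.0, 2.0], [0.5, 3.0]]
# Assumes energies add in the mixture, which ignores phase interactions.
mixture = [[t + m for t, m in zip(tr, mr)] for tr, mr in zip(target, masker)]

mask = ideal_binary_mask(target, masker)
glimpsed = apply_mask(mixture, mask)
```

In this toy case only the two units where the target energy exceeds the masker energy survive. A full implementation would additionally need a time-frequency analysis of the waveforms and a resynthesis step, neither of which is specified in the abstract.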
In a previous study it was shown that an on-frequency precursor intended to activate the medial olivocochlear reflex (MOCR) at the signal frequency reduces the gain estimated from growth-of-masking (GOM) functions. This is called the temporal effect (TE). In Expt. 1 a shorter method of measuring this change in gain is established. GOM functions were measured with an on- and off-frequency precursor presented before the masker and signal, and used to estimate input/output functions. The change in gain estimated in this way was very similar to that estimated from comparing two points measured with a single fixed masker level on the lower legs of the GOM functions. In Expt. 2, the TE was measured as a function of precursor duration and signal delay. For short precursor durations and short delays the TE increased (buildup) or remained constant as delay increased, then decreased. The TE also increased with precursor duration for the shortest delay. The results were fitted with a model based on the time course of the MOCR. The model fitted the data well, and predicted the buildup. This buildup is not consistent with exponential decay predicted by neural adaptation or persistence of excitation.
Speech perception in complex sound fields can greatly benefit from different unmasking cues to segregate the target from interfering voices. This study investigated the role of three unmasking cues (spatial separation, gender differences, and masker time reversal) on speech intelligibility and perceived listening effort in normal-hearing listeners. Speech intelligibility and categorically scaled listening effort were measured for a female target talker masked by two competing talkers with no unmasking cues or one to three unmasking cues. In addition to natural stimuli, all measurements were also conducted with glimpsed speech, which was created by removing the time-frequency tiles of the speech mixture in which the maskers dominated the mixture, to estimate the relative amounts of informational and energetic masking as well as the effort associated with source segregation. The results showed that all unmasking cues as well as glimpsing improved intelligibility and reduced listening effort and that providing more than one cue was beneficial in overcoming informational masking. The reduction in listening effort due to glimpsing corresponded to increases in signal-to-noise ratio of 8 to 18 dB, indicating that a significant amount of listening effort was devoted to segregating the target from the maskers. Furthermore, the benefit in listening effort for all unmasking cues extended well into the range of positive signal-to-noise ratios at which speech intelligibility was at ceiling, suggesting that listening effort is a useful tool for evaluating speech-on-speech masking conditions at typical conversational levels.
The aim of this study was to evaluate the performance of a visually guided hearing aid (VGHA) under conditions designed to capture some aspects of “real-world” communication settings. The VGHA uses eye gaze to steer the acoustic look direction of a highly directional beamforming microphone array. Although the VGHA has been shown to enhance speech intelligibility for fixed-location, frontal targets, it is currently not known whether these benefits persist in the face of frequent changes in location of the target talker that are typical of conversational turn-taking. Participants were 14 young adults, 7 with normal hearing and 7 with bilateral sensorineural hearing impairment. Target stimuli were sequences of 12 question–answer pairs that were embedded in a mixture of competing conversations. The participant’s task was to respond via a key press after each answer indicating whether it was correct or not. Spatialization of the stimuli and microphone array processing were done offline using recorded impulse responses, before presentation over headphones. The look direction of the array was steered according to the eye movements of the participant as they followed a visual cue presented on a widescreen monitor. Performance was compared for a “dynamic” condition in which the target stimulus moved between three locations, and a “fixed” condition with a single target location. The benefits of the VGHA over natural binaural listening observed in the fixed condition were reduced in the dynamic condition, largely because visual fixation was less accurate.
In many situations, listeners with sensorineural hearing loss demonstrate reduced spatial release from masking compared to listeners with normal hearing. This deficit is particularly evident in the "symmetric masker" paradigm in which competing talkers are located to either side of a central target talker. However, there is some evidence that reduced target audibility (rather than a spatial deficit per se) under conditions of spatial separation may contribute to the observed deficit. In this study a simple "glimpsing" model (applied separately to each ear) was used to isolate the target information that is potentially available in binaural speech mixtures. Intelligibility of these glimpsed stimuli was then measured directly. Differences between normally hearing and hearing-impaired listeners observed in the natural binaural condition persisted for the glimpsed condition, despite the fact that the task no longer required segregation or spatial processing. This result is consistent with the idea that the performance of listeners with hearing loss in the spatialized mixture was limited by their ability to identify the target speech based on sparse glimpses, possibly as a result of some of those glimpses being inaudible.