In the presence of competing speech or noise, reverberation degrades speech intelligibility not only by its direct effect on the target but also by affecting the interferer. Two experiments were designed to validate a method for predicting the loss of intelligibility associated with this latter effect. Speech reception thresholds were measured under headphones, using spatially separated target sentences and speech-shaped noise interferers simulated in virtual rooms. To investigate the effect of reverberation on the interferer unambiguously, the target was always anechoic. The interferer was placed in rooms with different sizes and absorptions, and at different distances and azimuths from the listener. The interaural coherence of the interferer did not fully predict the effect of reverberation. The azimuth separation of the sources and the coloration introduced by the room also had to be taken into account. The binaural effects were modeled by computing the binaural masking level differences in the studied configurations, the monaural effects were predicted from the excitation pattern of the noises, and speech intelligibility index weightings were applied to both. These parameters were all calculated from the room impulse responses convolved with noise. A 0.95-0.97 correlation was obtained between the speech reception thresholds and their predicted values.
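The key binaural cue mentioned above, the interaural coherence of the reverberant interferer, can be estimated from a room impulse response convolved with noise. The following is a minimal sketch (not the authors' code), assuming NumPy/SciPy, a two-channel binaural room impulse response `brir` (samples x 2 ears), and a sampling rate `fs`; the function name and defaults are illustrative.

```python
import numpy as np
from scipy.signal import fftconvolve

def interaural_coherence(brir, fs, noise_duration=2.0, max_lag_ms=1.0, seed=0):
    """Maximum of the normalized interaural cross-correlation over +/-1 ms lags."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(int(noise_duration * fs))
    left = fftconvolve(noise, brir[:, 0])   # reverberant interferer at the left ear
    right = fftconvolve(noise, brir[:, 1])  # reverberant interferer at the right ear
    max_lag = int(max_lag_ms * 1e-3 * fs)
    denom = np.sqrt(np.sum(left ** 2) * np.sum(right ** 2))
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            c = np.sum(left[lag:] * right[:len(right) - lag])
        else:
            c = np.sum(left[:lag] * right[-lag:])
        best = max(best, abs(c))
    return best / denom
```

As the abstract notes, this single number does not fully predict the reverberation effect; azimuth separation and room coloration also have to be taken into account.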
When speech is in competition with interfering sources in rooms, monaural indicators of intelligibility fail to take account of the listener's abilities to separate target speech from interfering sounds using the binaural system. In order to incorporate these segregation abilities and their susceptibility to reverberation, Lavandier and Culling [J. Acoust. Soc. Am. 127, 387-399 (2010)] proposed a model that combines the effects of better-ear listening and binaural unmasking. A computationally efficient version of this model is evaluated here under more realistic conditions that include head shadow, multiple stationary noise sources, and real-room acoustics. Three experiments are presented in which speech reception thresholds were measured in the presence of one to three interferers using real-room listening over headphones, simulated by convolving anechoic stimuli with binaural room impulse responses measured with dummy-head transducers in five rooms. Without fitting any parameter of the model, there was close correspondence between measured and predicted differences in threshold across all tested conditions. The model's components of better-ear listening and binaural unmasking were validated both in isolation and in combination. The computational efficiency of this prediction method allows the generation of complex "intelligibility maps" from room designs.
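The overall structure of such a model, per-band better-ear listening combined with binaural unmasking and weighted by speech intelligibility index (SII) band importance, can be sketched as below. This is not the published implementation: the per-band ear levels, binaural masking level differences (BMLDs), and weights are assumed to have been computed elsewhere (e.g., from the binaural room impulse responses), and all names are illustrative.

```python
import numpy as np

def predicted_effective_snr(target_left_db, target_right_db,
                            interf_left_db, interf_right_db,
                            bmld_db, sii_weights):
    """SII-weighted combination of per-band better-ear SNR and binaural unmasking.

    All arguments are per-frequency-band arrays (levels and BMLDs in dB).
    """
    better_ear_snr = np.maximum(target_left_db - interf_left_db,
                                target_right_db - interf_right_db)
    band_benefit = better_ear_snr + bmld_db        # combine the two components per band
    weights = np.asarray(sii_weights, dtype=float)
    weights = weights / weights.sum()              # normalize band-importance weights
    return float(np.sum(weights * band_benefit))
```

Because the inputs can be derived once per source position from impulse responses, a combination rule of this kind is cheap enough to evaluate on a grid of positions, which is what makes room-wide "intelligibility maps" practical.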
Sound externalization, or the perception that a sound source is outside of the head, is an intriguing phenomenon that has long interested psychoacousticians. While previous reviews are available, the past few decades have produced a substantial amount of new data. In this review, we aim to synthesize those data and to summarize advances in our understanding of the phenomenon. We also discuss issues related to the definition and measurement of sound externalization and describe quantitative approaches that have been taken to predict the outcomes of externalization experiments. Finally, sound externalization is of practical importance for many kinds of hearing technologies. Here, we touch on two examples, discussing the role of sound externalization in augmented/virtual reality systems and bringing attention to the somewhat overlooked issue of sound externalization in wearers of hearing aids.
This study investigated the dimensions underlying perceived differences between loudspeakers. Listeners compared the sound reproduction of 12 loudspeakers in a room, using three musical excerpts. So that the loudspeakers could be compared one immediately after another under identical conditions, the sounds they radiated were recorded in a listening room, and the recordings were presented over headphones for paired comparisons. The resulting perceptual dissimilarities were analyzed using a multidimensional scaling technique, revealing two main perceptual dimensions used by listeners to discriminate the loudspeakers. These dimensions were identical for the three musical excerpts. As the signals heard by the listeners were directly accessible, they were used to define acoustical attributes describing the perceptual dimensions. Instead of arbitrarily choosing one acoustical analysis to define these attributes, several analyses were compared. The temporal, spectral, and time-frequency domains were investigated, and different auditory models were tested. These auditory models gave the best description of the differences perceived by listeners and were used to define two acoustical attributes describing the perceptual dimensions: the bass/treble balance and the medium emergence.
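As an illustration of the analysis path described above, a matrix of paired-comparison dissimilarities can be embedded in a low-dimensional space with multidimensional scaling. The sketch below uses scikit-learn's non-metric MDS on a random placeholder matrix; it stands in for the study's data and exact procedure, which are not reproduced here.

```python
import numpy as np
from sklearn.manifold import MDS

# Placeholder 12 x 12 symmetric dissimilarity matrix standing in for the
# averaged paired-comparison ratings (one row/column per loudspeaker).
n_loudspeakers = 12
rng = np.random.default_rng(0)
d = rng.random((n_loudspeakers, n_loudspeakers))
dissimilarities = (d + d.T) / 2
np.fill_diagonal(dissimilarities, 0.0)

# Non-metric MDS recovers a low-dimensional perceptual space; two dimensions
# were sufficient in the study summarized above.
mds = MDS(n_components=2, metric=False, dissimilarity="precomputed",
          random_state=0)
positions = mds.fit_transform(dissimilarities)
print(positions.shape)  # (12, 2): coordinates of each loudspeaker in the space
```

The recovered coordinates can then be correlated with candidate acoustical attributes (e.g., a bass/treble balance measure) to give the perceptual dimensions a physical interpretation.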
Cubick and Dau [(2016). Acta Acust. Acust. 102, 547–557] showed that speech reception thresholds (SRTs) in noise, obtained with normal-hearing listeners, were significantly higher with hearing aids (HAs) than without. Some listeners reported a change in their spatial perception of the stimuli due to the HA processing, with auditory images often being broader and closer to the head or even internalized. The current study investigated whether worse speech intelligibility with HAs might be explained by distorted spatial perception and the resulting reduced ability to spatially segregate the target speech from the interferers. SRTs were measured in normal-hearing listeners with or without HAs in the presence of three interfering talkers or speech-shaped noises. Furthermore, listeners were asked to sketch their spatial perception of the acoustic scene. Consistent with the previous study, SRTs increased with HAs. Spatial release from masking was lower with HAs than without. The effects were similar for noise and speech maskers and appeared to be accounted for by changes to energetic masking. This interpretation was supported by results from a binaural speech intelligibility model. Even though the sketches indicated a change of spatial perception with HAs, no direct link between spatial perception and segregation of talkers could be shown.
Four experiments investigated the effects of reverberation, sound source locations, and amplitude modulation of the interferers on speech intelligibility. Speech reception thresholds (SRTs) were measured using headphones and stimuli that simulated real-room listening, with one or two interferers that were either stationary or speech-modulated noises. In experiment 1, SRTs for modulated noises showed little variation with increasing interferer reverberation. Reverberation might have increased masking by filling in the gaps of the modulated noises, but it simultaneously changed the noise spectra, making them less effective maskers. In experiment 2, SRTs were lower when a single one-voice modulated interferer was used for every target sentence than when a different interferer was used for each sentence, suggesting that listeners could take advantage of the predictability of the interferer gaps. In experiment 3, increasing speech reverberation did not significantly affect the difference between SRTs measured with stationary and modulated noises, indicating that the ability to exploit noise modulations remained useful for temporally smeared speech. In experiment 4, spatial unmasking remained constant when modulations were applied to the interferers, suggesting that the ability to exploit these modulations is independent of the ability to exploit the spatial separation of the sources. Finally, a model predicting binaural intelligibility in the presence of modulated noises was developed and provided a good fit to the experimental data.
Sounds presented over headphones are generally perceived as internalized, i.e., originating from a source inside the head. Prior filtering by binaural room impulse responses (BRIRs) can create externalized sources. Previous studies concluded that these BRIRs need to be listener-specific to produce good externalization; however, listeners were generally facing a loudspeaker and asked to rate externalization relative to that loudspeaker, meaning that the source had to be perceived outside the head and also at the right distance. The present study investigated externalization when there is no visual source to match. Overall, lateral sources were perceived as more externalized than frontal sources. Experiment 1 showed that the perceived externalization obtained with non-individualized BRIRs measured in three different rooms was similar to that obtained with a state-of-the-art simulation using individualized BRIRs. Experiment 2 indicated that when there is no real source spectrum to match, headphone equalization does not improve externalization. Experiment 3 further showed that reverberation improved externalization only when it introduced interaural differences. Finally, correlation analyses showed a close correspondence between perceived externalization and binaural cues (especially interaural coherence).
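The final correlation analysis can be outlined as follows: one binaural-cue value per listening condition is compared against the mean externalization rating for that condition. The sketch below uses placeholder numbers (not the study's data) and SciPy's Pearson correlation; the expected direction, lower interaural coherence accompanying higher externalization, is only an assumption consistent with the summary above.

```python
import numpy as np
from scipy.stats import pearsonr

# One value per listening condition; illustrative placeholders only.
coherence = np.array([0.35, 0.55, 0.70, 0.85, 0.95])       # interaural coherence
externalization = np.array([2.8, 2.5, 2.1, 1.6, 1.2])      # e.g., mean ratings on a 0-3 scale

r, p = pearsonr(coherence, externalization)
print(f"Pearson r = {r:.2f}, p = {p:.3f}")
```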