Although most recent multitalker research has emphasized the importance of binaural cues, monaural cues can play an equally important role in the perception of multiple simultaneous speech signals. In this experiment, the intelligibility of a target phrase masked by a single competing masker phrase was measured as a function of signal-to-noise ratio (SNR) with same-talker, same-sex, and different-sex target and masker voices. The results indicate that informational masking, rather than energetic masking, dominated performance in this experiment. The amount of masking was highly dependent on the similarity of the target and masker voices: performance was best when different-sex talkers were used and worst when the same talker was used for target and masker. Performance did not, however, improve monotonically with increasing SNR. Intelligibility generally plateaued at SNRs below 0 dB and, in some cases, intensity differences between the target and masking voices produced substantial improvements in performance with decreasing SNR. The results indicate that informational and energetic masking play substantially different roles in the perception of competing speech messages.
Although many researchers have examined the role that binaural cues play in the perception of spatially separated speech signals, relatively little is known about the cues that listeners use to segregate competing speech messages in a monaural or diotic stimulus. This series of experiments examined how variations in the relative levels and voice characteristics of the target and masking talkers influence a listener's ability to extract information from a target phrase in a 3-talker or 4-talker diotic stimulus. Performance in this speech perception task decreased systematically when the level of the target talker was reduced relative to the masking talkers. Performance also generally decreased when the target and masking talkers had similar voice characteristics: the target phrase was most intelligible when the target and masking phrases were spoken by different-sex talkers, and least intelligible when the target and masking phrases were spoken by the same talker. However, when the target-to-masker ratio was less than 3 dB, overall performance was usually lower with one different-sex masker than with all same-sex maskers. In most of the conditions tested, the listeners performed better when they were exposed to the characteristics of the target voice prior to the presentation of the stimulus. The results of these experiments demonstrate how monaural factors may play an important role in the segregation of speech signals in multitalker environments.
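The target-to-masker ratio manipulation described above can be illustrated with a short sketch. This is not the authors' stimulus-generation code; the function names and the use of RMS level as the reference are illustrative assumptions.

```python
import math

def rms(signal):
    """Root-mean-square level of a sampled signal."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))

def scale_to_tmr(target, masker, tmr_db):
    """Scale the masker so the target-to-masker ratio, defined here as
    20*log10(rms(target) / rms(masker)), equals tmr_db."""
    gain = rms(target) / (rms(masker) * 10 ** (tmr_db / 20.0))
    return [x * gain for x in masker]
```

Reducing `tmr_db` below 0 makes the masker more intense than the target, the regime in which the abstracts above report the largest informational-masking effects.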
When a target speech signal is obscured by an interfering speech wave form, comprehension of the target message depends both on the successful detection of the energy from the target speech wave form and on the successful extraction and recognition of the spectro-temporal energy pattern of the target out of a background of acoustically similar masker sounds. This study attempted to isolate the effects that energetic masking, defined as the loss of detectable target information due to the spectral overlap of the target and masking signals, has on multitalker speech perception. This was achieved through the use of ideal time-frequency binary masks that retained those spectro-temporal regions of the acoustic mixture that were dominated by the target speech but eliminated those regions that were dominated by the interfering speech. The results suggest that energetic masking plays a relatively small role in the overall masking that occurs when speech is masked by interfering speech but a much more significant role when speech is masked by interfering noise.
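The ideal time-frequency binary mask described above can be sketched as follows: given separate time-frequency representations of the target and the interferer, keep each cell where the target dominates and zero the rest. The function names and the 0 dB local criterion are illustrative assumptions, not the specific parameters used in the study.

```python
import numpy as np

def ideal_binary_mask(target_tf, masker_tf, lc_db=0.0):
    """Ideal binary mask: 1 where the target's local energy exceeds the
    masker's by at least lc_db (the local criterion), 0 elsewhere."""
    eps = np.finfo(float).tiny  # avoid log(0) in silent cells
    local_snr_db = 10.0 * np.log10((np.abs(target_tf) ** 2 + eps) /
                                   (np.abs(masker_tf) ** 2 + eps))
    return (local_snr_db >= lc_db).astype(float)

def apply_mask(mixture_tf, mask):
    """Retain target-dominated regions of the mixture; eliminate the rest."""
    return mixture_tf * mask
```

Resynthesizing speech from the masked mixture removes the target information lost to spectral overlap, which is how the study isolates energetic from informational masking.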
Although researchers have long recognized the unique properties of the head-related transfer function (HRTF) for nearby sources (within 1 m of the listener's head), virtually all of the HRTF measurements described in the literature have focused on source locations 1 m or farther from the listener. In this study, HRTFs for sources at distances from 0.12 to 1 m were calculated using a rigid-sphere model of the head and measured using a Knowles Electronic Manikin for Acoustic Research (KEMAR) and an acoustic point source. Both the calculations and the measurements indicate that the interaural level difference (ILD) increases substantially for lateral sources as distance decreases below 1 m, even at low frequencies where the ILD is small for distant sources. In contrast, the interaural time delay (ITD) is roughly independent of distance even when the source is close. The KEMAR measurements indicate that the direction of the source relative to the outer ear plays an important role in determining the high-frequency response of the HRTF in the horizontal plane. However, the elevation-dependent characteristics of the HRTFs are not strongly dependent on distance, and the contribution of the pinna to the HRTF is independent of distance beyond a few centimeters from the ear. Overall, the results suggest that binaural cues play an important role in auditory distance perception for nearby sources.
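The distance dependence of the two binaural cues can be illustrated with a deliberately crude geometric sketch: treat the ears as two points on the interaural axis, ignore diffraction around the head, and take level from the inverse-distance law. This is far simpler than the rigid-sphere model used in the study, but it reproduces the qualitative result that the ILD grows as a lateral source approaches the head while the ITD stays roughly constant. All names and the head radius are illustrative assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s
HEAD_RADIUS = 0.0875    # m, roughly KEMAR-sized

def itd_ild(source_dist, azimuth_deg, a=HEAD_RADIUS):
    """Straight-path ITD (s) and inverse-distance ILD (dB) for a point
    source; azimuth is measured from straight ahead, ears at +/-90 deg."""
    az = math.radians(azimuth_deg)
    sx, sy = source_dist * math.sin(az), source_dist * math.cos(az)
    d_near = math.hypot(sx - a, sy)  # ear on the source side
    d_far = math.hypot(sx + a, sy)
    itd = (d_far - d_near) / SPEED_OF_SOUND
    ild = 20.0 * math.log10(d_far / d_near)  # 1/r level difference
    return itd, ild
```

For a source directly to the side, this model gives the same ITD at 0.25 m and at 1 m, but a much larger ILD at the nearer distance, consistent with the measurements summarized above.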
Three experiments used the Coordinated Response Measure task to examine the roles that differences in F0 and differences in vocal-tract length play in the ability to attend to one of two simultaneous speech signals. The first experiment asked how increases in the natural F0 difference between two sentences (originally spoken by the same talker) affected listeners' ability to attend to one of the sentences. The second experiment used differences in vocal-tract length, and the third used both F0 and vocal-tract length differences. Differences in F0 greater than 2 semitones produced systematic improvements in performance. Differences in vocal-tract length produced systematic improvements in performance when the ratio of lengths was 1.08 or greater, particularly when the shorter vocal tract belonged to the target talker. Neither of these manipulations produced improvements in performance as great as those produced by a different-sex talker. Systematic changes in both F0 and vocal-tract length that simulated an incremental shift in gender produced substantially larger improvements in performance than did differences in F0 or vocal-tract length alone. In general, shifting one of two utterances spoken by a female voice towards a male voice produced a greater improvement in performance than shifting male towards female. The increase in performance varied with the intonation patterns of individual talkers, being smallest for those talkers who showed the most variability in their intonation patterns between different utterances.
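The units used above are easy to relate: an F0 difference of n semitones corresponds to a frequency ratio of 2**(n/12), and because formant frequencies scale roughly inversely with vocal-tract length, a length ratio can be expressed on the same semitone scale. The conversion functions below are standard arithmetic, not the authors' analysis code.

```python
import math

def semitones_to_ratio(n):
    """F0 difference of n semitones as a frequency ratio."""
    return 2.0 ** (n / 12.0)

def ratio_to_semitones(r):
    """Frequency (or formant-scale) ratio r expressed in semitones."""
    return 12.0 * math.log2(r)
```

On this scale, the 2-semitone F0 threshold reported above is a frequency ratio of about 1.12, and the 1.08 vocal-tract-length ratio corresponds to a formant shift of roughly 1.3 semitones.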
Although many researchers have examined auditory localization for relatively distant sound sources, little is known about the spatial perception of nearby sources. In the region within 1 m of a listener's head, defined as the "proximal region," the interaural level difference increases dramatically as the source approaches the head, while the interaural time delay is roughly independent of distance. An experiment has been performed to evaluate proximal-region localization performance. An auditory point source was moved to a random position within 1 m of the subject's head, and the subject responded by pointing to the perceived location of the sound with an electromagnetic position sensor. The overall angular error (17 degrees) was roughly comparable to previously measured results in distal-region experiments. Azimuth error increased slightly as the sound source approached the head, but elevation performance was essentially independent of source distance. Distance localization performance was generally better than has been reported in distal-region experiments and was strongly dependent on azimuth, with the stimulus-response correlation ranging from 0.85 at the side of the head to less than 0.4 in the median plane. The results suggest that the enlarged binaural difference cues found in the head-related transfer function (HRTF) for nearby sources are important to auditory distance perception in the proximal region.
The focus of this study was the release from informational masking that could be obtained in a speech task by viewing a video of the target talker. A closed-set speech recognition paradigm was used to measure informational masking in 23 children (ages 6-16 years) and 10 adults. An audio-only condition required attention to a monaural target speech message that was presented to the same ear with a time-synchronized distracter message. In an audiovisual condition, a synchronized video of the target talker was also presented to assess the release from informational masking that could be achieved by speechreading. Children required higher target/distracter ratios than adults to reach comparable performance levels in the audio-only condition, reflecting a greater extent of informational masking in these listeners. There was a monotonic age effect, such that even the children in the oldest age group (12-16.9 years) demonstrated performance somewhat poorer than adults. Older children and adults improved significantly in the audiovisual condition, producing a release from informational masking of 15 dB or more in some adult listeners. Audiovisual presentation produced no informational masking release for the youngest children. Across all ages, the benefit of a synchronized video was strongly associated with speechreading ability.
Objectives: Listening to speech with multiple competing talkers requires the perceptual separation of the target voice from the interfering background. Normal-hearing (NH) listeners are able to take advantage of perceived differences in the spatial locations of competing sound sources to facilitate this process. Previous research suggests that bilateral (BI) cochlear-implant (CI) listeners cannot do so, and it is unknown whether single-sided deaf CI users (SSD-CI; one acoustic and one CI ear) have this ability. This study investigated whether providing a second ear via cochlear implantation can facilitate the perceptual separation of targets and interferers in a listening situation involving multiple competing talkers. Design: BI-CI and SSD-CI listeners were required to identify speech from a target talker mixed with one or two interfering talkers. In the baseline monaural condition, the target speech and the interferers were presented to one of the CIs (for the BI-CI listeners) or to the acoustic ear (for the SSD-CI listeners). In the bilateral condition, the target was still presented to the first ear but the interferers were presented to both the target ear and the listener's second ear (always a CI), thereby testing whether CI listeners could use information about the interferer obtained from a second ear to facilitate perceptual separation of the target and interferer. Results: Presenting a copy of the interfering signals to the second ear improved performance, up to 4-5 dB (12-18 percentage points), but the amount of improvement depended on the type of interferer. For BI-CI listeners, the improvement occurred mainly in conditions involving one interfering talker, regardless of gender. For SSD-CI listeners, the improvement occurred in conditions involving one or two interfering talkers of the same gender as the target. This interaction is consistent with the idea that the SSD-CI listeners had access to pitch cues in their NH ear to separate the opposite-gender target and interferers, while the BI-CI listeners did not. Conclusions: These results suggest that a second auditory input via a CI can facilitate the perceptual separation of competing talkers in situations where monaural cues are insufficient to do so, thus partially restoring a key advantage of having two ears that was previously thought to be inaccessible to CI users.