Several studies have demonstrated that extended high frequencies (EHFs; >8 kHz) in speech are not only audible but also have some utility for speech recognition, including for speech-in-speech recognition when maskers are facing away from the listener. However, the relative contributions of EHF spectral versus temporal information to speech recognition are unknown. Here, we show that access to EHF temporal information improved speech-in-speech recognition relative to speech bandlimited at 8 kHz, and that access to EHF spectral detail provided a further small but significant benefit. Results suggest that both EHF spectral structure and the EHF temporal envelope contribute to the observed EHF benefit. Speech recognition performance was quite sensitive to masker head orientation, with a rotation of only 15° providing a highly significant benefit. An exploratory analysis indicated that pure-tone thresholds at EHFs are better predictors of speech recognition performance than low-frequency pure-tone thresholds.
Recent work has demonstrated that high-frequency (>6 kHz) and extended high-frequency (EHF; >8 kHz) hearing is valuable for speech-in-noise recognition. Several studies also indicate that EHF pure-tone thresholds predict speech-in-noise performance. These findings contradict the broadly accepted "speech bandwidth" that has historically been limited to below 8 kHz. This growing body of research is a tribute to Pat Stelmachowicz, whose work was instrumental in revealing the limitations of prior speech bandwidth studies, particularly for female talkers and child listeners. Here, we provide a historical review that demonstrates how the work of Stelmachowicz and her colleagues paved the way for subsequent research to measure the effects of extended bandwidths and EHF hearing. We also present a reanalysis of previous data collected in our lab, the results of which suggest that 16-kHz pure-tone thresholds are consistent predictors of speech-in-noise performance, regardless of whether EHF cues are present in the speech signal. Based on the work of Stelmachowicz, her colleagues, and those who have come after, we argue that it is time to retire the notion of a limited speech bandwidth for speech perception for both children and adults.
Human talkers are directional sound sources—a phenomenon that has consequences for speech perception in multi-talker environments. Directivity patterns for speech showing frequency- and angle-dependent radiation reveal that speech generally becomes more directional toward the front of the talker as frequency increases. Differences in physical attributes can lead to individual variability in directivity patterns across talkers. Here, we examine individual variability in speech directivity using frequency-dependent directivity indices and directivity maps. Speech directivity was examined in the horizontal plane using a corpus of simultaneous multi-channel, full-bandwidth (48-kHz sampling rate) recordings of the Bamford-Kowal-Bench (BKB) sentences recorded in an anechoic chamber. Thirty subjects (15 female) were recorded. The long-term average speech spectrum was used to calculate directivity indices in 1-ERB (equivalent rectangular bandwidth) bands. Gender differences in directivity indices were evaluated using a linear mixed-effects model. There was no main effect of gender. There was a main effect of ERB band, with higher-frequency bands tending to have higher (i.e., more directional) directivity indices; however, the relationship between average directivity indices and frequency was nonmonotonic. Directivity maps demonstrated individual differences in speech radiation. [Work supported by NIH under Grant No. R01-DC019745.]
Recent work has demonstrated that extended high-frequency (EHF; >8 kHz) hearing is valuable for speech-in-noise recognition. These findings contradict the broadly accepted "speech bandwidth" that has historically been limited to below 8 kHz. Several studies also indicate that EHF pure-tone thresholds predict speech-in-noise performance. One open question is whether the observed association between EHF pure-tone thresholds and speech-in-noise recognition is causal—that is, whether loss of audibility of EHF cues in speech degrades speech-in-noise recognition. Indeed, this effect has been demonstrated using low-pass filtering, but whether elevated EHF thresholds would produce a similar effect is not certain. Another possibility is that elevated EHF thresholds are a marker for subclinical dysfunction at lower frequencies that degrades speech recognition. These two possibilities are not mutually exclusive (nor exhaustive), and each could contribute to the observed relationship. Here, we present a reanalysis of previous data collected in our lab, the results of which suggest that 16-kHz pure-tone thresholds are consistent predictors of speech-in-speech recognition, regardless of whether EHF cues are present in the speech signal. These findings suggest that elevated EHF thresholds may indicate subclinical auditory dysfunction that impairs speech-in-speech recognition.