Identification of target speech was studied under masked conditions consisting of two or four independent speech maskers. In the reference conditions, the maskers were colocated with the target, the masker talkers were the same sex as the target, and the masker speech was intelligible. The comparison conditions, intended to provide release from masking, included different-sex target and masker talkers, time-reversal of the masker speech, and spatial separation of the maskers from the target. Significant release from masking was found for all comparison conditions. To determine whether these reductions in masking could be attributed to differences in energetic masking, ideal time-frequency segregation (ITFS) processing was applied so that the time-frequency units where the masker energy dominated the target energy were removed. The remaining target-dominated "glimpses" were reassembled as the stimulus. Speech reception thresholds measured using these resynthesized ITFS-processed stimuli were the same for the reference and comparison conditions, supporting the conclusion that the amount of energetic masking across conditions was the same. These results indicated that the large release from masking found under all comparison conditions was due primarily to a reduction in informational masking. Furthermore, the large individual differences observed were generally correlated across the three masking release conditions.
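The core of ITFS processing is a binary time-frequency mask: units where the target dominates the masker (above some local SNR criterion) are kept, and the rest are discarded before resynthesis. The sketch below illustrates the idea with a simple STFT front end; the study's actual implementation (e.g., its filterbank and local criterion) may differ, and `lc_db` is an assumed parameter name.

```python
import numpy as np
from scipy.signal import stft, istft

def itfs_glimpse(target, masker, fs, lc_db=0.0):
    """Minimal ITFS sketch: keep time-frequency units where the
    local target-to-masker ratio exceeds lc_db, zero the rest,
    and resynthesize the 'glimpsed' stimulus from the mixture."""
    nperseg = 512
    _, _, T = stft(target, fs, nperseg=nperseg)
    _, _, M = stft(masker, fs, nperseg=nperseg)
    # Local SNR in dB for each time-frequency unit
    local_snr = 20 * np.log10((np.abs(T) + 1e-12) / (np.abs(M) + 1e-12))
    mask = (local_snr > lc_db).astype(float)  # ideal binary mask
    # Apply the mask to the mixture and invert back to a waveform
    _, _, X = stft(target + masker, fs, nperseg=nperseg)
    _, glimpsed = istft(X * mask, fs, nperseg=nperseg)
    return glimpsed
```

Raising `lc_db` discards more units, so the resynthesized stimulus retains only the strongest target-dominated glimpses.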
Are musicians better able to understand speech in noise than non-musicians? Recent findings have produced contradictory results. Here we addressed this question by asking musicians and non-musicians to understand target sentences masked by other sentences presented from different spatial locations, the classical 'cocktail party problem' in speech science. We found that musicians obtained a substantial benefit in this situation, with thresholds ~6 dB better than non-musicians. Large individual differences in performance were noted, particularly for the non-musically trained group. Furthermore, in different conditions we manipulated the spatial location and intelligibility of the masking sentences, thus changing the amount of 'informational masking' (IM) while keeping the amount of 'energetic masking' (EM) relatively constant. When the maskers were unintelligible and spatially separated from the target (low in IM), musicians and non-musicians performed comparably. These results suggest that the characteristics of speech maskers and the amount of IM can influence the magnitude of the differences found between musicians and non-musicians in multiple-talker "cocktail party" environments. Furthermore, considering the task in terms of the EM-IM distinction provides a conceptual framework for future behavioral and neuroscientific studies that explore the underlying sensory and cognitive mechanisms contributing to enhanced "speech-in-noise" perception by musicians.
Neural representation of pitch is influenced by lifelong experiences with music and language at both cortical and subcortical levels of processing. The aim of this article is to determine whether neural plasticity for pitch representation at the level of the brainstem is dependent upon specific dimensions of pitch contours that commonly occur as part of a native listener's language experience. Brainstem frequency-following responses (FFRs) were recorded from Chinese and English participants in response to four Mandarin tonal contours presented in a nonspeech context in the form of iterated rippled noise. Pitch strength (whole contour, 250 msec; 40-msec segments) and pitch-tracking accuracy (whole contour) were extracted from the FFRs using autocorrelation algorithms. Narrowband spectrograms were used to extract spectral information. Results showed that the Chinese group exhibits smoother pitch tracking than the English group in three out of the four tones. Moreover, cross-language comparisons of pitch strength of 40-msec segments revealed that the Chinese group exhibits more robust pitch representation of those segments containing rapidly changing pitch movements across all four tones. FFR spectral data were complementary, showing that the Chinese group exhibits stronger representation of multiple pitch-relevant harmonics relative to the English group across all four tones. These findings support the view that at early preattentive stages of subcortical processing, neural mechanisms underlying pitch representation are shaped by particular dimensions of the auditory stream rather than speech per se. Adopting a temporal correlation analysis scheme for pitch encoding, we propose that long-term experience sharpens the tuning characteristics of neurons along the pitch axis with enhanced sensitivity to linguistically relevant variations in pitch.
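Autocorrelation-based pitch extraction of the kind described above can be sketched compactly: the height of the normalized autocorrelation peak within a plausible pitch-lag range serves as a pitch-strength measure, and the lag of that peak gives the pitch estimate used for tracking. This is a generic illustration, not the authors' exact algorithm or parameter choices.

```python
import numpy as np

def pitch_strength(x, fs, fmin=50.0, fmax=500.0):
    """Pitch strength and f0 estimate from the normalized
    autocorrelation function (ACF) of a signal segment.
    Strength is the ACF peak height in [0, 1] over lags
    corresponding to the fmin..fmax pitch range."""
    x = x - np.mean(x)
    acf = np.correlate(x, x, mode='full')[len(x) - 1:]
    acf = acf / acf[0]                 # normalize: zero-lag peak = 1
    lo, hi = int(fs / fmax), int(fs / fmin)
    seg = acf[lo:hi + 1]
    k = int(np.argmax(seg))            # lag of the strongest peak
    return seg[k], fs / (lo + k)       # (pitch strength, f0 in Hz)
```

Applying this in sliding 40-msec windows, as in the study, yields a pitch track whose deviation from the stimulus f0 contour quantifies pitch-tracking accuracy.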
Any sound can be separated mathematically into a slowly varying envelope and a rapidly varying fine-structure component. This property has motivated numerous perceptual studies to understand the relative importance of each component for speech and music perception. Specialized acoustic stimuli, such as auditory chimaeras with the envelope of one sound and the fine structure of another, have been used to separate the perceptual roles of envelope and fine structure. Cochlear narrowband filtering limits the ability to isolate fine structure from envelope; however, envelope recovery from fine structure has been difficult to evaluate physiologically. To evaluate envelope recovery at the output of the cochlea, neural cross-correlation coefficients were developed that quantify the similarity between two sets of spike-train responses. Shuffled auto- and cross-correlogram analyses were used to compute separate correlations for responses to envelope and fine structure based on both model and recorded spike trains from auditory nerve fibers. Previous correlogram analyses were extended to isolate envelope coding more effectively in auditory nerve fibers with low center frequencies, which are particularly important for speech coding. Recovered speech envelopes were present in both model and recorded responses to one- and 16-band speech fine-structure chimaeras and were significantly greater for the one-band case, consistent with perceptual studies. Model predictions suggest that cochlear recovered envelopes are reduced following sensorineural hearing loss due to broadened tuning associated with outer hair cell dysfunction. In addition to the within-fiber cross-stimulus cases considered here, these neural cross-correlation coefficients can also be used to evaluate spatiotemporal coding by applying them to cross-fiber within-stimulus conditions. Thus, these neural metrics can be used to quantitatively evaluate a wide range of perceptually significant temporal coding issues relevant to normal and impaired hearing.
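The envelope/fine-structure decomposition underlying auditory chimaeras is conventionally computed with the Hilbert transform: the magnitude of the analytic signal gives the envelope, and the cosine of its phase gives a unit-amplitude fine structure. The sketch below shows this decomposition for a single band; actual chimaera synthesis applies it within each band of a filterbank and recombines the envelope of one sound with the fine structure of another.

```python
import numpy as np
from scipy.signal import hilbert

def env_tfs(x):
    """Hilbert decomposition of a (single-band) signal into a
    slowly varying envelope (ENV) and a unit-amplitude temporal
    fine structure (TFS). Their product reconstructs the signal."""
    analytic = hilbert(x)
    env = np.abs(analytic)             # instantaneous amplitude
    tfs = np.cos(np.angle(analytic))   # instantaneous phase carrier
    return env, tfs
```

A one-band chimaera of sounds A and B is then simply `env_A * tfs_B`; multi-band chimaeras repeat this per band, which is why increasing the band count weakens envelope recovery from fine structure, as the abstract notes.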
Brainstem frequency-following responses were recorded from Chinese and English participants in response to an iterated rippled noise homologue of Mandarin Tone 2 (T2) plus linear and inverted curvilinear variants. Pitch-tracking accuracy and pitch strength analyses showed advantages for the Chinese group over the English group in response to T2 only. Pitch strength was larger for the Chinese group in rapidly changing sections of T2 compared to corresponding sections of a linear ramp. We conclude that experience-dependent neural plasticity at subcortical levels of representation is highly sensitive to specific features of pitch patterns in one's native language. Such experience-dependent effects suggest that subcortical sensory encoding interacts with cognitive processing in the cerebral cortex to shape the perceptual system's response to pitch patterns.
Understanding speech in noisy environments is often taken for granted; however, this task is particularly challenging for people with cochlear hearing loss, even with hearing aids or cochlear implants. A significant limitation to improving auditory prostheses is our lack of understanding of the neural basis for robust speech perception in noise. Perceptual studies suggest the slowly varying component of the acoustic waveform (envelope, ENV) is sufficient for understanding speech in quiet, but the rapidly varying temporal fine structure (TFS) is important in noise. These perceptual findings have important implications for cochlear implants, which currently only provide ENV; however, neural correlates have been difficult to evaluate due to cochlear transformations between acoustic TFS and recovered neural ENV. Here, we demonstrate the relative contributions of neural ENV and TFS by quantitatively linking neural coding, predicted from a computational auditory-nerve model, with perception of vocoded speech in noise measured from normal-hearing human listeners. Regression models with ENV and TFS coding as independent variables predicted speech identification and phonetic-feature reception at both positive and negative signal-to-noise ratios. We found that 1) neural ENV coding was a primary contributor to speech perception, even in noise, and 2) neural TFS contributed in noise mainly in the presence of neural ENV, but rarely as the primary cue itself. These results suggest neural TFS has less perceptual salience than previously thought due to cochlear signal-processing transformations between TFS and ENV. Because these transformations differ between normal and impaired ears, these findings have important translational implications for auditory prostheses.
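The regression framework described above treats neural ENV and TFS coding strengths as independent variables predicting perceptual scores, so the fitted weights indicate each cue's relative contribution. The sketch below illustrates the approach with ordinary least squares on synthetic placeholder data (the coding metrics and weights are fabricated for illustration, not the study's measurements).

```python
import numpy as np

# Synthetic stand-ins for per-condition neural coding metrics
rng = np.random.default_rng(1)
n = 40
env_coding = rng.uniform(0, 1, n)   # hypothetical neural ENV metric
tfs_coding = rng.uniform(0, 1, n)   # hypothetical neural TFS metric
# Simulated perceptual scores dominated by ENV, with a smaller
# TFS contribution, mirroring the study's qualitative finding
score = 0.8 * env_coding + 0.2 * tfs_coding + rng.normal(0, 0.02, n)

# Ordinary least squares: score ~ intercept + ENV + TFS
X = np.column_stack([np.ones(n), env_coding, tfs_coding])
beta, *_ = np.linalg.lstsq(X, score, rcond=None)
# beta[1] (ENV weight) should exceed beta[2] (TFS weight) here
```

In the actual study, model comparison across signal-to-noise ratios and phonetic features, rather than a single fit like this, supported ENV as the primary cue.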
Frequency-following responses were recorded from Chinese and English participants at the level of the brainstem in response to four Mandarin tonal contours presented in a speech and non-speech context. Pitch strength analysis of these preattentive brainstem responses showed that the Chinese group exhibited stronger pitch representation than the English group regardless of context. Moreover, the Chinese group exhibited relatively more robust pitch representation of rapidly changing pitch segments. These findings support the view that at early preattentive stages of subcortical processing, neural mechanisms underlying pitch representation are shaped by particular features of the auditory stream rather than speech per se. These findings have implications for optimizing signal-processing strategies for cochlear implant design for speakers of tonal languages.