Speech is crucial for communication in everyday life. Speech-brain entrainment, the alignment of neural activity to the slow temporal fluctuations (envelope) of acoustic speech input, is a ubiquitous element of current theories of speech processing. Associations between speech-brain entrainment and acoustic speech signal, listening task, and speech intelligibility have been observed repeatedly. However, a methodological bottleneck has prevented so far clarifying whether speech-brain entrainment contributes functionally to (i.e., causes) speech intelligibility or is merely an epiphenomenon of it. To address this long-standing issue, we experimentally manipulated speech-brain entrainment without concomitant acoustic and task-related variations, using a brain stimulation approach that enables modulating listeners' neural activity with transcranial currents carrying speech-envelope information. Results from two experiments involving a cocktail-party-like scenario and a listening situation devoid of aural speech-amplitude envelope input reveal consistent effects on listeners' speech-recognition performance, demonstrating a causal role of speech-brain entrainment in speech intelligibility. Our findings imply that speech-brain entrainment is critical for auditory speech comprehension and suggest that transcranial stimulation with speech-envelope-shaped currents can be utilized to modulate speech comprehension in impaired listening conditions.
In normal hearing (NH), the perception of the gender of a speaker is strongly affected by two anatomically related vocal characteristics: the fundamental frequency (F0), related to vocal pitch, and the vocal tract length (VTL), related to the height of the speaker. Previous studies on gender categorization in cochlear implant (CI) users found that performance was variable, with few CI users performing at the level of NH listeners. Data collected with recorded speech produced by multiple talkers suggests that CI users might rely more on F0 and less on VTL than NH listeners. However, because VTL cannot be accurately estimated from recordings, it is difficult to know how VTL contributes to gender categorization. In the present study, speech was synthesized to systematically vary F0, VTL, or both. Gender categorization was measured in CI users, as well as in NH participants listening to unprocessed (only synthesized) and vocoded (and synthesized) speech. Perceptual weights for F0 and VTL were derived from the performance data. With unprocessed speech, NH listeners used both cues (normalized perceptual weight: F0=3.76, VTL=5.56). With vocoded speech, NH listeners still made use of both cues but less efficiently (normalized perceptual weight: F0=1.68, VTL=0.63). CI users relied almost exclusively on F0 while VTL perception was profoundly impaired (normalized perceptual weight: F0=6.88, VTL=0.59). As a result, CI users' gender categorization was abnormal compared to NH listeners. Future CI signal processing should aim to improve the transmission of both F0 cues and VTL cues, as a normal gender categorization may benefit speech understanding in competing talker situations.
Objectives:When listening to two competing speakers, normal-hearing (NH) listeners can take advantage of voice differences between the speakers. Users of cochlear implants (CIs) have difficulty in perceiving speech on speech. Previous literature has indicated sensitivity to voice pitch (related to fundamental frequency, F0) to be poor among implant users, while sensitivity to vocal-tract length (VTL; related to the height of the speaker and formant frequencies), the other principal voice characteristic, has not been directly investigated in CIs. A few recent studies evaluated F0 and VTL perception indirectly, through voice gender categorization, which relies on perception of both voice cues. These studies revealed that, contrary to prior literature, CI users seem to rely exclusively on F0 while not utilizing VTL to perform this task. The objective of the present study was to directly and systematically assess raw sensitivity to F0 and VTL differences in CI users to define the extent of the deficit in voice perception.Design:The just-noticeable differences (JNDs) for F0 and VTL were measured in 11 CI listeners using triplets of consonant–vowel syllables in an adaptive three-alternative forced choice method.Results:The results showed that while NH listeners had average JNDs of 1.95 and 1.73 semitones (st) for F0 and VTL, respectively, CI listeners showed JNDs of 9.19 and 7.19 st. These JNDs correspond to differences of 70% in F0 and 52% in VTL. For comparison to the natural range of voices in the population, the F0 JND in CIs remains smaller than the typical male–female F0 difference. However, the average VTL JND in CIs is about twice as large as the typical male–female VTL difference.Conclusions:These findings, thus, directly confirm that CI listeners do not seem to have sufficient access to VTL cues, likely as a result of limited spectral resolution, and, hence, that CI listeners’ voice perception deficit goes beyond poor perception of F0. These results provide a potential common explanation not only for a number of deficits observed in CI listeners, such as voice identification and gender categorization, but also for competing speech perception.
Perception of voice characteristics allows normal hearing listeners to identify the gender of a speaker, and to better segregate speakers from each other in cocktail party situations. This benefit is largely driven by the perception of two vocal characteristics of the speaker: The fundamental frequency (F0) and the vocal-tract length (VTL). Previous studies have suggested that cochlear implant (CI) users have difficulties in perceiving these cues. The aim of the present study was to investigate possible causes for limited sensitivity to VTL differences in CI users. Different acoustic simulations of CI stimulation were implemented to characterize the role of spectral resolution on VTL, both in terms of number of channels and amount of channel interaction. The results indicate that with 12 channels, channel interaction caused by current spread is likely to prevent CI users from perceiving VTL differences typically found between male and female speakers.
In noisy listening conditions, intelligibility of degraded speech can be enhanced by top-down restoration. Cochlear implant (CI) users have difficulty understanding speech in noisy environments. This could partially be due to reduced top-down restoration of speech, which may be related to the changes that the electrical stimulation imposes on the bottom-up cues. We tested this hypothesis using the phonemic restoration (PhR) paradigm in which speech interrupted with periodic silent intervals is perceived illusorily continuous (continuity illusion or CoI) and becomes more intelligible (PhR benefit) when the interruptions are filled with noise bursts. Using meaningful sentences, both CoI and PhR benefit were measured in CI users, and compared with those of normal-hearing (NH) listeners presented with normal speech and 8-channel noise-band vocoded speech, acoustically simulating CIs. CI users showed different patterns in both PhR benefit and CoI, compared to NH results with or without the noise-band vocoding. However, they were able to use top-down restoration under certain test conditions. This observation supports the idea that changes in bottom-up cues can impose changes to the top-down processes needed to enhance intelligibility of degraded speech. The knowledge that CI users seem to be able to do restoration under the right circumstances could be exploited in patient rehabilitation and product development.
Evidence for transfer of musical training to better perception of speech in noise has been mixed. Unlike speech-in-noise, speech-on-speech perception utilizes many of the skills that musical training improves, such as better pitch perception and stream segregation, as well as use of higher-level auditory cognitive functions, such as attention. Indeed, despite the few non-musicians who performed as well as musicians, on a group level, there was a strong musician benefit for speech perception in a speech masker. This benefit does not seem to result from better voice processing and could instead be related to better stream segregation or enhanced cognitive functions.
Although segregation of both simultaneous and sequential speech items may be involved in the reception of speech in noisy environments, research on the latter is relatively sparse. Further, previous studies examining the ability of hearing-impaired listeners to form distinct auditory streams have produced mixed results. Finally, there is little work investigating streaming in cochlear implant recipients, who also have poor frequency resolution. The present study focused on the mechanisms involved in the segregation of vowel sequences and potential limitations to segregation associated with poor frequency resolution. An objective temporal-order paradigm was employed in which listeners reported the order of constituent vowels within a sequence. In Experiment 1, it was found that fundamental frequency based mechanisms contribute to segregation. In Experiment 2, reduced frequency tuning often associated with hearing impairment was simulated in normal-hearing listeners. In that experiment, it was found that spectral smearing of the vowels increased accurate identification of their order, presumably by reducing the tendency to form separate auditory streams. These experiments suggest that a reduction in spectral resolution may result in a reduced ability to form separate auditory streams, which may contribute to the difficulties of hearing-impaired listeners, and probably cochlear implant recipients as well, in multi-talker cocktail-party situations.
External degradations in incoming speech reduce understanding, and hearing impairment further compounds the problem. While cognitive mechanisms alleviate some of the difficulties, their effectiveness may change with age. In our research, reviewed here, we investigated cognitive compensation with hearing impairment, cochlear implants, and aging, via (a) phonemic restoration as a measure of top-down filling of missing speech, (b) listening effort and response times as a measure of increased cognitive processing, and (c) visual world paradigm and eye gazing as a measure of the use of context and its time course. Our results indicate that between speech degradations and their cognitive compensation, there is a fine balance that seems to vary greatly across individuals. Hearing impairment or inadequate hearing device settings may limit compensation benefits. Cochlear implants seem to allow the effective use of sentential context, but likely at the cost of delayed processing. Linguistic and lexical knowledge, which play an important role in compensation, may be successfully employed in advanced age, as some compensatory mechanisms seem to be preserved. These findings indicate that cognitive compensation in hearing impairment can be highly complicated—not always absent, but also not easily predicted by speech intelligibility tests only.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.