Highlights:
- Objective EEG-based measure of speech intelligibility
- Improved prediction of speech intelligibility by combining speech representations
- Cortical tracking of speech in the delta EEG band monotonically increased with SNR
- Cortical responses in the theta EEG band best predicted the speech reception threshold

Disclosure: The authors report no disclosures relevant to the manuscript.
Search terms: cortical speech tracking, objective measure, speech intelligibility, auditory processing, speech representations.

ABSTRACT

Objective: To objectively measure the speech intelligibility of individual subjects from the EEG, based on cortical tracking of different representations of speech: low-level acoustical, higher-level discrete, or a combination, and to compare each model's prediction of the speech reception threshold (SRT) for each individual with the behaviorally measured SRT.

Methods: Nineteen participants listened to Flemish Matrix sentences presented at different signal-to-noise ratios (SNRs), corresponding to different levels of speech understanding. For each EEG frequency band (delta, theta, alpha, beta, or low-gamma), a model was built to predict the EEG signal from various speech representations: envelope, spectrogram, phonemes, phonetic features, or a combination of phonetic features and spectrogram (FS). The same model was used for all subjects. The model predictions were then compared to the actual EEG of each subject at the different SNRs, and the prediction accuracy as a function of SNR was used to predict the SRT.

Results: The model based on the FS speech representation and the theta EEG band yielded the best SRT predictions, with a difference between the behavioral and objective SRT below 1 dB for 53% and below 2 dB for 89% of the subjects.

Conclusion: A model including low- and higher-level speech features makes it possible to predict the speech reception threshold from the EEG of people listening to natural speech. It has potential applications in diagnostics of the auditory system.
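The final step of the approach above, turning prediction accuracy as a function of SNR into an objective SRT, can be sketched as fitting a sigmoid and taking its midpoint as the SRT estimate. This is a minimal sketch with made-up accuracy values; the sigmoid parameterization and the midpoint criterion are illustrative assumptions, not necessarily the authors' exact procedure.

```python
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(snr, floor, ceil, midpoint, slope):
    """Logistic function of SNR, from a floor to a ceiling accuracy."""
    return floor + (ceil - floor) / (1.0 + np.exp(-slope * (snr - midpoint)))

def estimate_srt(snrs, accuracies):
    """Fit a sigmoid to EEG prediction accuracy vs. SNR and return the
    midpoint (in dB SNR) as the objective SRT estimate."""
    p0 = [min(accuracies), max(accuracies), float(np.median(snrs)), 1.0]
    params, _ = curve_fit(sigmoid, snrs, accuracies, p0=p0, maxfev=10000)
    return params[2]

# Illustrative (synthetic) data: prediction accuracy rises with SNR
snrs = np.array([-12.5, -9.5, -6.5, -3.5, -0.5, 2.5])
acc = np.array([0.01, 0.02, 0.05, 0.10, 0.12, 0.13])
srt = estimate_srt(snrs, acc)  # midpoint of the fitted curve, in dB SNR
```

With these synthetic points the fitted midpoint lands roughly halfway up the accuracy curve, around -5 dB SNR, which would then be compared against the behaviorally measured SRT.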
Objectives: In recent years, there has been significant interest in recovering the temporal envelope of a speech signal from the neural response to investigate neural speech processing. The research focus is now broadening from neural speech processing in normal-hearing listeners towards hearing-impaired listeners. When testing hearing-impaired listeners, speech has to be amplified to resemble the effect of a hearing aid and to compensate for peripheral hearing loss. It is currently not known with certainty whether, or how, neural speech tracking is influenced by sound amplification. As these higher intensities could influence the outcome, we investigated the influence of stimulus intensity on neural speech tracking.

Design: We recorded the electroencephalogram (EEG) of 20 normal-hearing participants while they listened to a narrated story presented at intensities from 10 to 80 dB A. To investigate the brain responses, we analyzed neural tracking of the speech envelope by reconstructing the envelope from the EEG using a linear decoder and correlating the reconstructed with the actual envelope. We investigated the delta (0.5-4 Hz) and theta (4-8 Hz) bands for each intensity. We also investigated the latencies and amplitudes of the responses in more detail using temporal response functions (TRFs), the estimated linear response functions between the stimulus envelope and the EEG.

Results: Neural envelope tracking depends on stimulus intensity in both the TRF and the envelope reconstruction analysis. However, provided that the decoder is applied to the same stimulus intensity it was trained on, envelope reconstruction is robust to stimulus intensity. In addition, neural envelope tracking in the delta (but not theta) band seems to relate to speech intelligibility. Similar to the linear decoder analysis, TRF amplitudes and latencies depend on stimulus intensity: the amplitude of peak 1 (30-50 ms) increases and the latency of peak 2 (140-160 ms) decreases with increasing stimulus intensity.

Conclusion: Although brain responses are influenced by stimulus intensity, neural envelope tracking is robust to stimulus intensity when the decoder is trained and tested at the same intensity. Therefore, intensity should not be a confounder when testing hearing-impaired participants with amplified speech using the linear decoder approach. In addition, neural envelope tracking in the delta band appears to be correlated with speech intelligibility, showing the potential of neural envelope tracking as an objective measure of speech intelligibility.
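The linear-decoder analysis used in these studies, reconstructing the envelope from time-lagged EEG and correlating it with the actual envelope, can be sketched as a ridge regression. The lag range, regularization strength, and the simulated data below are illustrative assumptions, not the authors' exact settings; real analyses also use cross-validation rather than training and testing on the same data.

```python
import numpy as np

def lagged(eeg, n_lags):
    """Stack copies of each EEG channel shifted so that post-stimulus
    samples can be used to reconstruct the envelope at time t."""
    n, ch = eeg.shape
    X = np.zeros((n, ch * n_lags))
    for lag in range(n_lags):
        X[:n - lag, lag * ch:(lag + 1) * ch] = eeg[lag:]
    return X

def train_decoder(eeg, envelope, n_lags=16, lam=100.0):
    """Ridge-regularized least squares mapping lagged EEG to the envelope."""
    X = lagged(eeg, n_lags)
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ envelope)

def envelope_tracking(eeg, envelope, decoder, n_lags=16):
    """Correlate reconstructed and actual envelope: the tracking measure."""
    reconstructed = lagged(eeg, n_lags) @ decoder
    return np.corrcoef(reconstructed, envelope)[0, 1]

# Synthetic example: each "EEG channel" carries a noisy, delayed copy
# of the envelope, mimicking brain responses that lag the stimulus.
rng = np.random.default_rng(0)
env = rng.standard_normal(2000)
eeg = np.stack([np.roll(env, k) + 0.5 * rng.standard_normal(2000)
                for k in range(1, 9)], axis=1)
w = train_decoder(eeg, env)
r = envelope_tracking(eeg, env, w)  # high correlation for this clean simulation
```

The correlation `r` plays the role of the neural envelope tracking measure reported in these abstracts; in noisy real EEG it is far smaller (typically on the order of 0.1) but still increases with speech understanding.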
Objectives: Recently, an objective measure of speech intelligibility based on brain responses derived from the electroencephalogram (EEG) has been developed using isolated Matrix sentences as a stimulus. We investigated whether this objective measure of speech intelligibility can also be used with natural speech as a stimulus, which would be beneficial for clinical applications.

Design: We recorded the EEG of 19 normal-hearing participants while they listened to two types of stimuli: Matrix sentences and a natural story. Each stimulus was presented at different levels of speech intelligibility by adding speech-weighted noise. Speech intelligibility was assessed in two ways for both stimuli: (1) behaviorally and (2) objectively, by reconstructing the speech envelope from the EEG using a linear decoder and correlating it with the acoustic envelope. We also calculated temporal response functions (TRFs) to investigate the temporal characteristics of the brain responses in EEG channels covering different brain areas.

Results: For both stimulus types, the correlation between the speech envelope and the reconstructed envelope increased with increasing speech intelligibility. In addition, correlations were higher for the natural story than for the Matrix sentences. Similar to the linear decoder analysis, TRF amplitudes increased with increasing speech intelligibility for both stimuli. Remarkably, although speech intelligibility remained unchanged, neural speech processing was affected by the addition of a small amount of noise: TRF amplitudes across the entire scalp decreased between 0 and 150 ms, while amplitudes between 150 and 200 ms increased. TRF latency changes as a function of speech intelligibility appeared to be stimulus specific: the latency of the prominent negative peak in the early responses (50-300 ms) increased with increasing speech intelligibility for the Matrix sentences, but remained unchanged for the natural story.
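The forward (TRF) analysis referred to above can be sketched as a ridge regression from the time-lagged stimulus envelope to a single EEG channel, from which peak amplitudes and latencies are read off. The sampling rate, lag window, regularization, and simulated data are illustrative assumptions, not the authors' exact pipeline.

```python
import numpy as np

def lagged_stim(env, n_lags):
    """Design matrix holding the envelope at lags 0..n_lags-1 samples."""
    n = len(env)
    X = np.zeros((n, n_lags))
    for lag in range(n_lags):
        X[lag:, lag] = env[:n - lag]
    return X

def fit_trf(env, eeg_ch, n_lags, lam=1.0):
    """Ridge regression from lagged envelope to one EEG channel; the
    weight vector over lags is the temporal response function (TRF)."""
    X = lagged_stim(env, n_lags)
    return np.linalg.solve(X.T @ X + lam * np.eye(n_lags), X.T @ eeg_ch)

fs = 128                                  # Hz, assumed EEG sampling rate
rng = np.random.default_rng(1)
env = rng.standard_normal(4000)
true_lag = int(0.1 * fs)                  # simulated response peak near 100 ms
eeg_ch = np.roll(env, true_lag) + rng.standard_normal(4000)
trf = fit_trf(env, eeg_ch, n_lags=int(0.4 * fs))   # 0-400 ms lag window
peak_ms = 1000 * np.argmax(np.abs(trf)) / fs       # recovered peak latency
```

Latency and amplitude comparisons like those in the abstract then reduce to tracking how `peak_ms` and the corresponding TRF weight change across conditions.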
In clinical practice and research, speech intelligibility is generally measured by instructing the participant to recall sentences. Although this is a reliable and highly repeatable measure, it cannot be used to measure the intelligibility of connected discourse. Therefore, we developed a new method, the self-assessed Békesy procedure, an adaptive procedure that uses intelligibility ratings to converge to a person's speech reception threshold. In this study, we describe the new procedure and its validation in young, normal-hearing listeners. First, we compared the results of the self-assessed Békesy procedure to a recall procedure for standardized sentences. Next, we evaluated the inter- and intra-subject variability of our procedure. Furthermore, we compared the thresholds for sentences in three masker types between the self-assessed Békesy and a recall procedure to verify whether these procedures led to similar conclusions. Finally, we compared the thresholds for two types of sentences and commercial recordings of stories. In general, the self-assessed Békesy procedure is shown to be valid and reliable, as similar thresholds (difference < 1 dB) and test-retest reliability (< 1.5 dB) were observed compared with standard speech audiometry tests. In addition, its time efficiency and the similar differences between maskers compared to a recall procedure support the potential of this procedure for implementation in research. Finally, significant differences between the thresholds of sentences and connected-discourse materials were found, indicating the importance of controlling for differences in intelligibility when presenting these materials at the same signal-to-noise ratios or when comparing studies.
The speech envelope is essential for speech understanding and can be reconstructed from the electroencephalogram (EEG) recorded while listening to running speech. This so-called neural envelope tracking has been shown to relate to speech understanding in normal-hearing listeners, but has barely been investigated in persons wearing cochlear implants (CIs). We investigated the relation between speech understanding and neural envelope tracking in CI users. EEG was recorded in 8 CI users while they listened to a story. Speech understanding was varied by changing the intensity of the presented speech. The speech envelope was reconstructed from the EEG using a linear decoder and then correlated with the envelope of the speech stimulus as a measure of neural envelope tracking, which was compared to actual speech understanding. This study showed that neural envelope tracking increased with increasing speech understanding in every participant. Furthermore, behaviorally measured speech understanding was correlated with participant-specific neural envelope tracking results, indicating the potential of neural envelope tracking as an objective measure of speech understanding in CI users. This could enable objective and automatic fitting of CIs and pave the way towards closed-loop CIs that adjust continuously and automatically to individual CI users.

Speech is characterized by fast and slow modulations. The slow modulations are also called the envelope of speech, reflecting the syllable, word, and sentence boundaries known to be essential for speech understanding (Shannon et al., 1995). Previous studies have shown that the brain tracks the speech envelope and that it is possible to reconstruct the envelope from brain responses in normal-hearing listeners using electroencephalography (EEG) or magnetoencephalography (Aiken and Picton, 2008; Luo and Poeppel, 2007; Ding and Simon, 2011; Ding et al., 2015; Meyer et al., 2017). The correlation between this reconstructed envelope and the real speech envelope provides a measure of neural envelope tracking. Recently, researchers were able to establish a link between increasing neural envelope tracking and increasing speech understanding using speech versus non-speech stimuli (Molinaro and Lizarazu, 2017), priming and vocoders (Di Liberto et al., 2018), or by adding background noise to the speech signal (Ding and Simon, 2013; Ding et al., 2014; Vanthornhout et al., 2018), underlining the application potential of neural envelope tracking as an objective measure of speech understanding. Besides the promising results in normal-hearing listeners, neural envelope tracking has been measured in listeners with a hearing impairment by Petersen et al. (2017). They showed that the amount of hearing loss could be related to neural tracking of the to-be-ignored speech, diminishing the difference between the attended and unattended speech streams in persons with increasing hearing loss...
Objective: Measuring the cortical tracking of continuous speech from electroencephalography (EEG) recordings using a forward model is an important tool in auditory neuroscience. Usually, the stimulus is represented by its temporal envelope; recently, a phonetic representation of speech was successfully introduced for English. We aim to show that EEG prediction from phoneme-related speech features is also possible in Dutch. The method normally requires a manual channel selection, based on visual inspection or prior knowledge, to obtain a summary measure of cortical tracking. We evaluate a method to (1) remove non-stimulus-related activity from the EEG signals to be predicted and (2) automatically select the channels of interest.

Approach: Eighteen participants listened to a Flemish story while their EEG was recorded. Subject-specific and grand-average temporal response functions were determined between the EEG activity in different frequency bands and several stimulus features: the envelope, spectrogram, phonemes, phonetic features, or a combination. The temporal response functions were used to predict the EEG from the stimulus, and the predicted EEG was compared with the recorded EEG, yielding a measure of cortical tracking of stimulus features. A spatial filter was calculated based on the generalized eigenvalue decomposition (GEVD), and its effect on EEG prediction accuracy was determined.

Main results: A model including both low- and high-level speech representations predicted the brain responses to speech better than a model including only low-level features. The inclusion of a GEVD-based spatial filter in the model increased the prediction accuracy of cortical responses to each speech feature at both the single-subject level (270% improvement) and the group level (310%).

Significance: We showed that including acoustical and phonetic speech information and adding a data-driven spatial filter improve the modelling of the relationship between the speech and its brain responses, and offer an automatic channel selection.
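A GEVD-based spatial filter of the kind described above can be sketched as a generalized eigenvalue problem between a "signal" covariance (the stimulus-related part of the EEG) and the overall EEG covariance. The simulated data and the choice of covariance matrices below are illustrative assumptions; the authors' actual pipeline estimates the signal covariance from the stimulus-predicted EEG.

```python
import numpy as np
from scipy.linalg import eigh

def gevd_filter(signal_cov, noise_cov, n_components=1):
    """Solve signal_cov @ v = lam * noise_cov @ v and keep the filters
    with the largest signal-to-noise variance ratio."""
    eigvals, eigvecs = eigh(signal_cov, noise_cov)
    order = np.argsort(eigvals)[::-1]      # largest generalized eigenvalue first
    return eigvecs[:, order[:n_components]]

# Synthetic scalp data: one stimulus-following source mixed into 16 channels
rng = np.random.default_rng(2)
n_ch, n = 16, 5000
source = rng.standard_normal(n)                 # stimulus-following component
mix = rng.standard_normal(n_ch)                 # its projection onto the channels
clean = 2.0 * np.outer(mix, source)             # stimulus-related part of the EEG
eeg = clean + rng.standard_normal((n_ch, n))    # plus unrelated background activity

# In practice the signal covariance comes from the stimulus-predicted EEG;
# here the known clean part is used for illustration.
w = gevd_filter(np.cov(clean), np.cov(eeg))
filtered = w[:, 0] @ eeg
r = abs(np.corrcoef(filtered, source)[0, 1])    # recovery of the source
```

Applying the data-driven filter `w` replaces manual channel selection: the filtered signal concentrates the stimulus-related variance into a few components, which is what improves prediction accuracy in the abstract above.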
When listening to continuous speech, the human brain can track features of the presented speech signal. It has been shown that neural tracking of acoustic features is a prerequisite for speech understanding and can predict speech understanding in controlled circumstances. However, the brain also tracks linguistic features of speech, which may be more directly related to speech understanding. We investigated acoustic and linguistic speech processing as a function of varying speech understanding by manipulating the speech rate. In this paradigm, acoustic and linguistic speech processing are affected simultaneously but in opposite directions: When the speech rate increases, more acoustic information per second is present. In contrast, the tracking of linguistic information becomes more challenging when speech is less intelligible at higher speech rates. We measured the EEG of 18 participants (4 male) who listened to speech at various speech rates. As expected and confirmed by the behavioral results, speech understanding decreased with increasing speech rate. Accordingly, linguistic neural tracking decreased with increasing speech rate, but acoustic neural tracking increased. This indicates that neural tracking of linguistic representations can capture the gradual effect of decreasing speech understanding. In addition, increased acoustic neural tracking does not necessarily imply better speech understanding. This suggests that, although more challenging to measure due to the low signal-to-noise ratio, linguistic neural tracking may be a more direct predictor of speech understanding.