This paper considers the auditory attention detection (AAD) paradigm, where the goal is to determine which of two simultaneous speakers a person is attending to. The paradigm relies on recordings of the listener's brain activity, e.g., from electroencephalography (EEG). To perform AAD, decoded EEG signals are typically correlated with the temporal envelopes of the speech signals of the separate speakers. In this paper, we study how the inclusion of various degrees of auditory modelling in this speech envelope extraction process affects AAD performance; the best performance is found for an auditory-inspired linear filter bank followed by power-law compression. These two modelling stages are computationally cheap, which is important for implementation in wearable devices, such as future neuro-steered auditory prostheses. We also introduce a more natural way to combine recordings (over trials and subjects) to train the decoder, which reduces the dependence of the algorithm on regularization parameters. Finally, we investigate the simultaneous design of the EEG decoder and the audio subband envelope recombination weight vector, using either norm-constrained least squares or canonical correlation analysis, but conclude that this increases computational complexity without improving AAD performance.
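The following is a minimal sketch of the kind of auditory-inspired envelope extraction the abstract describes: a gammatone filter bank followed by power-law compression. The compression exponent (0.6), the band spacing, the uniform subband recombination weights, and the output sampling rate are illustrative assumptions, not the paper's exact settings.

```python
# Sketch: auditory-inspired subband envelope extraction
# (gammatone filter bank + power-law compression); parameters assumed.
import numpy as np
from scipy.signal import gammatone, lfilter, resample_poly

def subband_envelope(audio, fs, center_freqs, exponent=0.6, fs_out=64):
    """Power-law-compressed subband envelope of a speech signal.

    audio: 1-D speech waveform; fs: integer sampling rate in Hz.
    """
    envelopes = []
    for fc in center_freqs:
        b, a = gammatone(fc, 'iir', fs=fs)       # 4th-order gammatone band
        band = lfilter(b, a, audio)              # filter into subband
        env = np.abs(band) ** exponent           # power-law compression
        env = resample_poly(env, fs_out, fs)     # downsample to EEG rate
        envelopes.append(env)
    # The abstract recombines subbands; uniform weights are assumed here.
    return np.mean(envelopes, axis=0)

# Example: 15 bands log-spaced between 150 Hz and 4 kHz (assumed spacing)
# env = subband_envelope(x, 16000, np.geomspace(150, 4000, 15))
```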
When multiple people talk simultaneously, the healthy human auditory system is able to attend to one particular speaker of interest. Recently, it has been demonstrated that it is possible to infer to which speaker someone is attending by relating the neural activity, recorded by electroencephalography (EEG), to the speech signals. This is relevant for effective noise suppression in hearing devices, in order to detect the target speaker in a multi-speaker scenario. Most auditory attention detection algorithms use a linear EEG decoder to reconstruct the attended stimulus envelope, which is then compared to the original stimulus envelopes to determine the attended speaker. Classifying attention within a short time interval remains the main challenge. We present two different convolutional neural network (CNN)-based approaches to solve this problem. One aims to select the attended speaker from a given set of individual speaker envelopes, and the other extracts the locus of auditory attention (left or right) without knowledge of the speech envelopes. Our results show that it is possible to decode attention within 1-2 seconds, with a median accuracy around 80%, without access to the speech envelopes. This is promising for neuro-steered noise suppression in hearing aids, which requires fast and accurate attention detection. Furthermore, the possibility of detecting the locus of auditory attention without access to the speech envelopes is promising for scenarios in which per-speaker envelopes are unavailable. It will also enable establishing a fast and objective attention measure in future studies.
Index Terms: Convolutional neural networks (CNN), auditory attention detection (AAD), electroencephalography (EEG), neuro-steered auditory prosthesis, brain-computer interface (BCI)
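For context, below is a minimal sketch of the classical correlation-based baseline that such CNN approaches are compared against: a pre-trained linear decoder reconstructs the attended envelope from time-lagged EEG, and the speaker whose envelope correlates best with the reconstruction is selected. The variable layout and lag range are illustrative assumptions.

```python
# Sketch: linear stimulus-reconstruction baseline for AAD (assumed setup).
import numpy as np

def lag_matrix(eeg, n_lags):
    """Stack time-lagged copies of the EEG (samples x channels)."""
    T, C = eeg.shape
    X = np.zeros((T, C * n_lags))
    for k in range(n_lags):
        X[k:, k * C:(k + 1) * C] = eeg[:T - k]
    return X

def decode_attention(eeg, env_a, env_b, decoder, n_lags=32):
    """Return 0 if speaker A appears attended, 1 if speaker B."""
    recon = lag_matrix(eeg, n_lags) @ decoder    # reconstructed envelope
    r_a = np.corrcoef(recon, env_a)[0, 1]
    r_b = np.corrcoef(recon, env_b)[0, 1]
    return 0 if r_a > r_b else 1
```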
This work shows the importance of using realistic binaural listening conditions and of training on a balanced set of experimental conditions to obtain results that are more representative of the true AAD performance in practical applications.
Objective. A listener's neural responses can be decoded to identify the speaker the person is attending to in a cocktail party environment. Such auditory attention detection methods have the potential to provide noise suppression algorithms in hearing devices with information about the listener's attention. A challenge is the effect of noise and other acoustic conditions, which can reduce attention detection accuracy. Specifically, noise can impair both the listener's ability to segregate the sound sources and perform selective attention, and the external signal processing needed to decode attention effectively. The aim of this work is to systematically analyze the effect of noise level and speaker position on attention decoding accuracy. Approach. 28 subjects participated in the experiment. Auditory stimuli consisted of stories narrated by different speakers from two different locations, together with surrounding multitalker background babble. EEG signals were recorded while the subjects focused on one story and ignored the other. The strength of the babble noise and the spatial separation between the two speakers were varied between presentations. Spatio-temporal decoders were trained for each subject and applied to decode the subjects' attention from every 30 s segment of data. Behavioral speech recognition thresholds were obtained for the different speaker separations. Main results. Both the background noise level and the angular separation between speakers affected attention decoding accuracy. Remarkably, attention decoding performance increased when moderate background noise was included (compared with no noise), whereas across the different noise conditions performance dropped significantly with increasing noise level. We also observed that decoding accuracy improved with increasing speaker separation, demonstrating the benefit of spatial release from masking. Furthermore, the effect of speaker separation on decoding accuracy became stronger as the background noise level increased. A significant correlation between speech intelligibility and attention decoding accuracy was found across conditions. Significance. This work shows how the background noise level and the relative positions of competing talkers impact attention decoding accuracy. It indicates in which circumstances a neuro-steered noise suppression system may need to operate, as a function of the acoustic conditions. It also indicates the boundary conditions for the operation of EEG-based attention detection systems in neuro-steered hearing prostheses.
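A per-subject spatio-temporal decoder of the kind outlined in the Approach is typically fitted with regularized least squares, mapping time-lagged EEG to the attended envelope. The sketch below assumes a ridge penalty; the lag span and regularization weight are illustrative, not the study's exact settings.

```python
# Sketch: ridge-regularized least-squares training of a spatio-temporal
# EEG decoder (parameters assumed).
import numpy as np

def train_decoder(X, y, lam=1e3):
    """X: time-lagged EEG (samples x features), y: attended envelope."""
    R = X.T @ X                      # feature covariance
    r = X.T @ y                      # cross-correlation with the envelope
    return np.linalg.solve(R + lam * np.eye(R.shape[0]), r)

# Decoding then proceeds per 30 s segment: reconstruct the envelope with
# the trained decoder and pick the speaker with the higher correlation.
```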
Hearing prostheses have built-in algorithms to perform acoustic noise reduction and improve speech intelligibility. However, in a multi-speaker scenario the noise reduction algorithm has to determine which speaker the listener is focusing on, in order to enhance that speaker while suppressing the other interfering sources. Recently, it has been demonstrated that it is possible to detect auditory attention using electroencephalography (EEG). In this paper, we use multi-channel Wiener filters (MWFs) to filter out each speech stream from the speech mixtures in the microphones of a binaural hearing aid, while also reducing background noise. From the demixed and denoised speech streams, we extract envelopes for an EEG-based auditory attention detection (AAD) algorithm. The AAD module can then select the output of the MWF corresponding to the attended speaker. We evaluate our algorithm in a two-speaker scenario in the presence of babble noise and compare it to a previously proposed algorithm. Our algorithm is observed to provide speech envelopes that yield better AAD accuracies, and it is more robust to variations in speaker positions and diffuse background noise.
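As a conceptual illustration of a per-speaker multi-channel Wiener filter, the sketch below computes a speech-distortion-weighted MWF from speech-plus-noise and noise-only microphone covariance matrices. Estimating these covariances (e.g., from voice-activity labels) is assumed, and this is a textbook formulation rather than the paper's exact algorithm.

```python
# Sketch: speech-distortion-weighted multi-channel Wiener filter
# for extracting one speech source (covariance estimation assumed).
import numpy as np

def mwf(R_yy, R_nn, ref_mic=0, mu=1.0):
    """R_yy: (M, M) speech-plus-noise covariance of the M microphones,
    R_nn: (M, M) noise-only covariance; mu trades distortion vs. noise."""
    R_ss = R_yy - R_nn                       # speech covariance estimate
    M = R_yy.shape[0]
    e = np.zeros(M)
    e[ref_mic] = 1.0                         # select a reference microphone
    # W = (R_ss + mu * R_nn)^{-1} R_ss e
    W = np.linalg.solve(R_ss + mu * R_nn, R_ss @ e)
    return W                                 # apply as s_hat = W.conj() @ y
```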
In a multi-speaker scenario, the human auditory system is able to attend to one particular speaker of interest and ignore the others. It has been demonstrated that it is possible to use electroencephalography (EEG) signals to infer to which speaker someone is attending by relating the neural activity to the speech signals. However, classifying auditory attention within a short time interval remains the main challenge. We present a convolutional neural network-based approach to extract the locus of auditory attention (left/right) without knowledge of the speech envelopes. Our results show that it is possible to decode the locus of attention within 1–2 s, with a median accuracy of around 81%. These results are promising for neuro-steered noise suppression in hearing aids, in particular in scenarios where per-speaker envelopes are unavailable.
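To make the CNN-based locus decoding concrete, here is a hypothetical sketch of a compact network classifying left/right attention from a short EEG window. The layer sizes, the 64-channel / 128-sample input, and the use of PyTorch are all illustrative assumptions; the paper's architecture may differ substantially.

```python
# Sketch: compact CNN for left/right attention-locus classification
# from a 1-2 s EEG window (architecture and sizes assumed).
import torch
import torch.nn as nn

class LocusCNN(nn.Module):
    def __init__(self, n_channels=64, n_samples=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 5, kernel_size=(n_channels, 17)),  # spatio-temporal conv
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(5 * (n_samples - 16), 2),             # left vs. right logits
        )

    def forward(self, x):            # x: (batch, 1, channels, samples)
        return self.net(x)

# model = LocusCNN(); logits = model(torch.randn(8, 1, 64, 128))
```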
In clinical practice and research, speech intelligibility is generally measured by instructing the participant to recall sentences. Although this is a reliable and highly repeatable measure, it cannot be used to measure the intelligibility of connected discourse. Therefore, we developed a new method, the self-assessed Békesy procedure, an adaptive procedure that uses intelligibility ratings to converge to a person's speech reception threshold. In this study, we describe the new procedure and its validation in young, normal-hearing listeners. First, we compared the results of the self-assessed Békesy procedure to those of a recall procedure for standardized sentences. Next, we evaluated the inter- and intra-subject variability of our procedure. Furthermore, we compared the thresholds for sentences in three masker types between the self-assessed Békesy procedure and a recall procedure, to verify whether these procedures lead to similar conclusions. Finally, we compared the thresholds for two types of sentences and for commercial recordings of stories. In general, the self-assessed Békesy procedure is shown to be a valid and reliable procedure, as thresholds (difference < 1 dB) and test-retest reliability (< 1.5 dB) were similar to those of standard speech audiometry tests. In addition, its time efficiency and the similar differences between maskers compared with a recall procedure support the procedure's potential for use in research. Finally, significant differences were found between the thresholds for sentences and for connected-discourse materials, indicating the importance of controlling for differences in intelligibility when presenting these materials at the same signal-to-noise ratio or when comparing studies.
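The sketch below illustrates the kind of adaptive up/down tracking rule such a procedure builds on: the SNR is lowered while the listener reports understanding and raised otherwise, and the threshold is estimated from the reversal points. The step size, the number of reversals, and the binary rating are illustrative assumptions; the validated self-assessed Békesy procedure uses graded intelligibility ratings and its own converged settings.

```python
# Sketch: adaptive up/down SNR tracking toward a speech reception
# threshold (step size, reversal count, and binary rating assumed).
import numpy as np

def track_srt(rate_trial, snr_start=0.0, step=2.0, n_reversals=8):
    """rate_trial(snr) -> True if the listener reports understanding."""
    snr, direction = snr_start, -1          # start by making it harder
    reversals = []
    while len(reversals) < n_reversals:
        understood = rate_trial(snr)
        new_direction = -1 if understood else +1
        if new_direction != direction:      # track reversed: record SNR
            reversals.append(snr)
            direction = new_direction
        snr += direction * step
    return np.mean(reversals)               # SRT estimate in dB SNR
```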