How humans solve the cocktail party problem remains unknown. However, progress has been made recently thanks to the realization that cortical activity tracks the amplitude envelope of speech. This has led to the development of regression methods for studying the neurophysiology of continuous speech. One such method, known as stimulus-reconstruction, has been successfully utilized with cortical surface recordings and magnetoencephalography (MEG). However, the former is invasive and gives a relatively restricted view of processing along the auditory hierarchy, whereas the latter is expensive and rare. Thus it would be extremely useful for research in many populations if stimulus-reconstruction was effective using electroencephalography (EEG), a widely available and inexpensive technology. Here we show that single-trial (≈60 s) unaveraged EEG data can be decoded to determine attentional selection in a naturalistic multispeaker environment. Furthermore, we show a significant correlation between our EEG-based measure of attention and performance on a high-level attention task. In addition, by attempting to decode attention at individual latencies, we identify neural processing at ∼200 ms as being critical for solving the cocktail party problem. These findings open up new avenues for studying the ongoing dynamics of cognition using EEG and for developing effective and natural brain-computer interfaces.
Selective auditory attention is essential for human listeners to be able to communicate in multi-source environments. Selective attention is known to modulate the neural representation of the auditory scene, boosting the representation of a target sound relative to the background, but the strength of this modulation, and the mechanisms contributing to it, are not well understood. Here, listeners performed a behavioral experiment demanding sustained, focused spatial auditory attention while we measured cortical responses using electroencephalography (EEG). We presented three concurrent melodic streams; listeners were asked to attend and analyze the melodic contour of one of the streams, randomly selected from trial to trial. In a control task, listeners heard the same sound mixtures, but performed the contour judgment task on a series of visual arrows, ignoring all auditory streams. We found that the cortical responses could be fit as weighted sum of event-related potentials evoked by the stimulus onsets in the competing streams. The weighting to a given stream was roughly 10 dB higher when it was attended compared to when another auditory stream was attended; during the visual task, the auditory gains were intermediate. We then used a template-matching classification scheme to classify single-trial EEG results. We found that in all subjects, we could determine which stream the subject was attending significantly better than by chance. By directly quantifying the effect of selective attention on auditory cortical responses, these results reveal that focused auditory attention both suppresses the response to an unattended stream and enhances the response to an attended stream. The single-trial classification results add to the growing body of literature suggesting that auditory attentional modulation is sufficiently robust that it could be used as a control mechanism in brain–computer interfaces (BCIs).
In order to extract information in a rich environment, we focus on different features that allow us to direct attention to whatever source is of interest. The cortical network deployed during spatial attention, especially in vision, is well characterized. For example, visuospatial attention engages a frontoparietal network including the frontal eye fields (FEFs), which modulate activity in visual sensory areas to enhance the representation of an attended visual object. However, relatively little is known about the neural circuitry controlling attention directed to non-spatial features, or to auditory objects or features (either spatial or non-spatial). Here, using combined magnetoencephalography (MEG) and anatomical information obtained from MRI, we contrasted cortical activity when observers attended to different auditory features given the same acoustic mixture of two simultaneous spoken digits. Leveraging the fine temporal resolution of MEG, we establish that activity in left FEF is enhanced both prior to and throughout the auditory stimulus when listeners direct auditory attention to target location compared to when they focus on target pitch. In contrast, activity in the left posterior superior temporal sulcus (STS), a region previously associated with auditory pitch categorization, is greater when listeners direct attention to target pitch rather than target location. This differential enhancement is only significant after observers are instructed which cue to attend, but before the acoustic stimuli begin. We therefore argue that left FEF participates more strongly in directing auditory spatial attention, while the left STS aids auditory object selection based on the non-spatial acoustic feature of pitch.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.