Acoustic sequences such as speech and music are generally perceived as coherent auditory "streams," which can be individually attended to and followed over time. Although the psychophysical stimulus parameters governing this "auditory streaming" are well established, the brain mechanisms underlying the formation of auditory streams remain largely unknown. In particular, an essential feature of the phenomenon, which corresponds to the fact that the segregation of sounds into streams typically takes several seconds to build up, remains unexplained. Here, we show that this and other major features of auditory-stream formation measured in humans using alternating-tone sequences can be quantitatively accounted for based on single-unit responses recorded in the primary auditory cortex (A1) of awake rhesus monkeys listening to the same sound sequences.
Humans and other animals can attend to one of multiple sounds, and follow it selectively over time. The neural underpinnings of this perceptual feat remain mysterious. Some studies have concluded that sounds are heard as separate streams when they activate well-separated populations of central auditory neurons, and that this process is largely pre-attentive. Here, we argue instead that stream formation depends primarily on temporal coherence between responses that encode various features of a sound source. Furthermore, we postulate that only when attention is directed towards a particular feature (e.g., pitch) do all other temporally coherent features of that source (e.g., timbre and location) become bound together as a stream that is segregated from the incoherent features of other sources.
The auditory "scene analysis" problem
Humans and other animals routinely detect, identify, and track sounds coming from a particular source (e.g., someone's voice, a conspecific call) amid sounds emanating from other sources (e.g., other voices, heterospecific calls, ambient music, or street traffic) (Figure 1). The apparent ease with which they determine which components and attributes in a sound mixture arise from the same source belies the complexity of the underlying biological processes. By analogy with the "scene segmentation" problem in vision, this is referred to as the "auditory scene analysis" problem [1] (Glossary) or, more colloquially, the "cocktail party" problem [2][3][4].
Understanding how the brain solves this problem is a fundamental challenge facing auditory scientists, as it will shed light on the difficulties afflicting the hearing-impaired in multi-source environments [9], and give rise to more effective front-ends for auditory prostheses and automatic speech recognition [10].
Recent studies have inspired numerous hypotheses and models concerning the neural underpinnings of perceptual organization in the central auditory system, and especially the auditory cortex (see [3][7][8][11][12][13][14][15][16][17][18][19][20] for reviews). One prominent hypothesis that underlies most investigations is that sound elements segregate into separate "streams" whenever they activate well-separated populations of auditory neurons that are selective to frequency or any other sound attributes that have been shown to support stream segregation (21-30). We shall ...
© 2010 Elsevier Ltd. All rights reserved. Corresponding author: Shihab Shamma, Electrical and Computer Engineering and Institute for Systems Research, University of Maryland, College Park, MD 20742, Tel: 301-405-6842, sas@umd.edu.
Just as the visual system parses complex scenes into identifiable objects, the auditory system must organize sound elements scattered in frequency and time into coherent “streams”. Current neuro-computational theories of auditory streaming rely on tonotopic organization of the auditory system to explain the observation that sequential spectrally distant sound elements tend to form separate perceptual streams. Here, we show that spectral components that are well separated in frequency are no longer heard as separate streams if presented synchronously rather than consecutively. In contrast, responses from neurons in primary auditory cortex of ferrets show that both synchronous and asynchronous tone sequences produce comparably segregated responses along the tonotopic axis. The results argue against tonotopic separation per se as a neural correlate of stream segregation. Instead we propose a computational model of stream segregation that can account for the data by using temporal coherence as the primary criterion for predicting stream formation.
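The temporal-coherence criterion described above can be illustrated with a toy calculation. The sketch below is not the authors' model: it assumes each frequency channel is reduced to a binary on/off envelope, takes coherence to be the zero-lag Pearson correlation between two envelopes, and uses an arbitrary threshold of 0.5 to decide between one stream and two. The function names (`tone_envelope`, `coherence`, `predict_streams`) and all parameter values are hypothetical and chosen only for illustration.

```python
from math import sqrt


def tone_envelope(n_cycles, period, tone_len, offset=0):
    """Binary on/off envelope of a repeating tone (hypothetical helper).

    Time is measured in arbitrary samples; the tone is "on" for
    tone_len samples at the start of each period, shifted by offset.
    """
    env = [0.0] * (n_cycles * period)
    for k in range(n_cycles):
        start = k * period + offset
        for i in range(start, start + tone_len):
            env[i] = 1.0
    return env


def coherence(env_a, env_b):
    """Zero-lag Pearson correlation between two channel envelopes."""
    n = len(env_a)
    mean_a = sum(env_a) / n
    mean_b = sum(env_b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(env_a, env_b))
    var_a = sum((x - mean_a) ** 2 for x in env_a)
    var_b = sum((y - mean_b) ** 2 for y in env_b)
    return cov / sqrt(var_a * var_b)


def predict_streams(env_a, env_b, threshold=0.5):
    """Predict 1 stream if the channels are coherent, otherwise 2.

    The threshold is an assumed value, not taken from the source.
    """
    return 1 if coherence(env_a, env_b) > threshold else 2


# Two tones, A and B, assumed to drive well-separated channels.
period, tone_len, n_cycles = 200, 75, 20
chan_a = tone_envelope(n_cycles, period, tone_len)
chan_b_sync = tone_envelope(n_cycles, period, tone_len)  # B together with A
chan_b_alt = tone_envelope(n_cycles, period, tone_len, offset=period // 2)  # B between A tones

print("synchronous:", coherence(chan_a, chan_b_sync), predict_streams(chan_a, chan_b_sync))
print("alternating:", coherence(chan_a, chan_b_alt), predict_streams(chan_a, chan_b_alt))
```

With these assumed parameters, the synchronous pair has a correlation of essentially 1 and is predicted to fuse into one stream, while the alternating pair has a correlation of about -0.6 and is predicted to split into two. In both conditions the two tones occupy the same pair of channels, so a measure based only on channel separation would not distinguish them, which is the contrast the abstract draws.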