Envelope Processing and Sound-Source Perception

Sheft, Stanley

doi:10.1007/978-0-387-71305-2_9

Cited by 7 publications

(5 citation statements)

References 224 publications


“…In speech and natural sounds in general, the temporal envelope synchronizes various acoustic features, including pitch and formant structures. Therefore, they provide important cues for perceptual auditory grouping (30) and are critical for robust speech recognition. For example, major speech segregation cues, such as pitch, are not sufficient for speech recognition, whereas acoustic features necessary for speech recognition (e.g., the spectrotemporal envelope) are not easily distinguishable between speakers.…”

Section: Discussion (mentioning)

confidence: 99%

Emergence of neural encoding of auditory objects while listening to competing speakers

Ding,

Simon

2012

Proc. Natl. Acad. Sci. U.S.A.

736

107

848


A visual scene is perceived in terms of visual objects. Similar ideas have been proposed for the analogous case of auditory scene analysis, although their hypothesized neural underpinnings have not yet been established. Here, we address this question by recording from subjects selectively listening to one of two competing speakers, either of different or the same sex, using magnetoencephalography. Individual neural representations are seen for the speech of the two speakers, with each being selectively phase locked to the rhythm of the corresponding speech stream and from which can be exclusively reconstructed the temporal envelope of that speech stream. The neural representation of the attended speech dominates responses (with latency near 100 ms) in posterior auditory cortex. Furthermore, when the intensity of the attended and background speakers is separately varied over an 8-dB range, the neural representation of the attended speech adapts only to the intensity of that speaker but not to the intensity of the background speaker, suggesting an object-level intensity gain control. In summary, these results indicate that concurrent auditory objects, even if spectrotemporally overlapping and not resolvable at the auditory periphery, are neurally encoded individually in auditory cortex and emerge as fundamental representational units for top-down attentional modulation and bottom-up neural adaptation.

Keywords: spectrotemporal response function | reverse correlation | phase locking | selective attention

In a complex auditory scene, humans and other animal species can perceptually detect and recognize individual auditory objects (i.e., the sound arising from a single source), even if strongly overlapping acoustically with sounds from other sources. To accomplish this remarkably difficult task, it has been hypothesized that the auditory system first decomposes the complex auditory scene into separate acoustic features and then binds the features, as appropriate, into auditory objects (1-4). The neural representations of auditory objects, each the collective representation of all the features belonging to the same auditory object, have been hypothesized to emerge in auditory cortex to become fundamental units for high-level cognitive processing (5-7). The process of parsing an auditory scene into auditory objects is computationally complex and cannot as yet be emulated by computer algorithms (8), but it occurs reliably, and often effortlessly, in the human auditory system. For example, in the classic "cocktail party problem," where multiple speakers are talking at the same time (9), human listeners can selectively attend to a chosen target speaker, even if the competing speakers are acoustically more salient (e.g., louder) or perceptually very similar (such as of the same sex) (10).

To demonstrate an object-based neural representation that could subserve the robust perception of an auditory object, several key pieces of evidence are needed. The first is to demonstrate neural activity that exclusively represents a single auditory...



“…In order to achieve speech recognition or auditory perception in general, however, features belonging to the same speech stream need to be bound or re-synthesized into an auditory object (Shinn-Cunningham, 2008). In speech, multiple acoustic features are temporally coupled and the spectro-temporal fine structure is modulated by the temporal envelope (Sheft, 2007; Shamma et al, 2011). Therefore, in this synthesis stage, sound segregation cues play a guiding role: The auditory system is proposed to group features based on their temporal coherence with the sound segregation cues (Shamma et al, 2011).…”

Section: Discussion (mentioning)

confidence: 99%

Robust cortical entrainment to the speech envelope relies on the spectro-temporal fine structure

2014


Speech recognition is robust to background noise. One underlying neural mechanism is that the auditory system segregates speech from the listening background and encodes it reliably. Such robust internal representation has been demonstrated in auditory cortex by neural activity entrained to the temporal envelope of speech. A paradox, however, then arises, as the spectro-temporal fine structure rather than the temporal envelope is known to be the major cue to segregate target speech from background noise. Does the reliable cortical entrainment in fact reflect a robust internal “synthesis” of the attended speech stream rather than direct tracking of the acoustic envelope? Here, we test this hypothesis by degrading the spectro-temporal fine structure while preserving the temporal envelope using vocoders. Magnetoencephalography (MEG) recordings reveal that cortical entrainment to vocoded speech is severely degraded by background noise, in contrast to the robust entrainment to natural speech. Furthermore, cortical entrainment in the delta-band (1–4 Hz) predicts the speech recognition score at the level of individual listeners. These results demonstrate that reliable cortical entrainment to speech relies on the spectro-temporal fine structure, and suggest that cortical entrainment to the speech envelope is not merely a representation of the speech envelope but a coherent representation of multiscale spectro-temporal features that are synchronized to the syllabic and phrasal rhythms of speech.


“…This prediction is contradicted by psychophysical and neurophysiological data [44], which demonstrate that sequences of tones that are separated by an octave or more are still heard as a single stream if the tones are synchronous or, more precisely, fully coherent in time (Box 2 and Glossary). Numerous other psychoacoustical findings indicate that coherence strongly promotes perceptual grouping [45]. To account for these findings, it is necessary to consider the relative timing of the neural responses, or more specifically their temporal coherence.…”

Section: Temporal Coherence In Auditory Scene Analysis (mentioning)

confidence: 99%

Temporal coherence and attention in auditory scene analysis

Shamma

Elhilali

Micheyl

2011

Trends in Neurosciences

383

360


Humans and other animals can attend to one of multiple sounds, and follow it selectively over time. The neural underpinnings of this perceptual feat remain mysterious. Some studies have concluded that sounds are heard as separate streams when they activate well-separated populations of central auditory neurons, and that this process is largely pre-attentive. Here, we argue instead that stream formation depends primarily on temporal coherence between responses that encode various features of a sound source. Furthermore, we postulate that only when attention is directed towards a particular feature (e.g., pitch) do all other temporally coherent features of that source (e.g., timbre and location) become bound together as a stream that is segregated from the incoherent features of other sources.

The auditory "scene analysis" problem

Humans and other animals routinely detect, identify, and track sounds coming from a particular source (e.g., someone's voice, a conspecific call) amid sounds emanating from other sources (e.g., other voices, heterospecific calls, ambient music, or street traffic) (Figure 1). The apparent ease with which they determine which components and attributes in a sound mixture arise from the same source belies the complexity of the underlying biological processes. By analogy with the "scene segmentation" problem in vision, this is referred to as the "auditory scene analysis" problem [1] (Glossary) or, more colloquially, the "cocktail party" problem [2–4]. Understanding how the brain solves this problem is a fundamental challenge facing auditory scientists as it will shed light on the difficulties afflicting the hearing-impaired in multi-source environments [9], and give rise to more effective front-ends for auditory prostheses and automatic speech recognition [10].

Recent studies have inspired numerous hypotheses and models concerning the neural underpinnings of perceptual organization in the central auditory system, and especially the auditory cortex (see [3,7,8,11–20] for reviews). One prominent hypothesis that underlies most investigations is that sound elements segregate into separate "streams" whenever they activate well separated populations of auditory neurons that are selective to frequency or any other sound attributes that have been shown to support stream segregation (21-30). We shall ...

