In complex scenes, the identity of an auditory object can build up across seconds. Given that attention operates on perceptual objects, this perceptual buildup may alter the efficacy of selective auditory attention over time. Here, we measured identification of a sequence of spoken target digits presented with distracter digits from other directions to investigate the dynamics of selective attention. Performance was better when the target location was fixed rather than changing between digits, even when listeners were cued as much as 1 s in advance about the position of each subsequent digit. Spatial continuity not only avoided well-known costs associated with switching the focus of spatial attention, but also produced refinements in the spatial selectivity of attention across time. Continuity of target voice further enhanced this buildup of selective attention. Results suggest that when attention is sustained on one auditory object within a complex scene, attentional selectivity improves over time. Similar effects may come into play when attention is sustained on an object in a complex visual scene, especially in cases where visual object formation requires sustained attention.

Keywords: source segregation · auditory scene analysis · spatial hearing · streaming · auditory mixture

In everyday situations, we are confronted with multiple objects that compete for our attention. Both stimulus-driven and goal-related mechanisms mediate the between-object competition to determine what will be brought to the perceptual foreground (1, 2). In natural scenes, objects come and go and the object of interest can change from moment to moment, such as when the flow of conversation shifts from one talker to another at a party. Thus, our ability to analyze objects in everyday settings is directly affected by how switching attention between objects affects perception.
Much of what we know about the effects of switching attention comes from visual experiments in which observers monitor rapid sequences of images or search for an item in a static field of objects (3, 4). Although these situations give insight into the time it takes to disengage and reengage attention from one object to the next, they do not directly explore whether there are dynamic effects of sustaining attention on one object through time. In contrast to visual objects, the identity of an auditory object is intimately linked to how the content of a sound evolves over time. Moreover, the process of forming an auditory object is known to evolve over seconds (5–8). Given that attention is object-based (9, 10), this refinement in object formation may directly impact the selectivity of attention in a complex auditory scene. Specifically, sustaining attention on one object in a complex scene may yield more refined selectivity to the attended object over time. In turn, switching attention to a new object may reset object formation and therefore reset attentional selectivity. If so, the cost of switching attention between objects may not only be related to the time required to disengage ...
In auditory scenes containing many similar sound sources, sorting of acoustic information into streams becomes difficult, which can lead to disruptions in the identification of behaviorally relevant targets. This study investigated the benefit of providing simple visual cues for when and/or where a target would occur in a complex acoustic mixture. Importantly, the visual cues provided no information about the target content. In separate experiments, human subjects either identified learned birdsongs in the presence of a chorus of unlearned songs or recalled strings of spoken digits in the presence of speech maskers. A visual cue indicating which loudspeaker (from an array of five) would contain the target improved accuracy for both kinds of stimuli. A cue indicating which time segment (out of a possible five) would contain the target also improved accuracy, but much more for birdsong than for speech. These results suggest that in real-world situations, information about where a target of interest is located can enhance its identification, while information about when to listen can also be helpful when targets are unfamiliar or extremely similar to their competitors.
Humans and animals must often discriminate between complex natural sounds in the presence of competing sounds (maskers). Although the auditory cortex is thought to be important in this task, the impact of maskers on cortical discrimination remains poorly understood. We examined neural responses in zebra finch (Taeniopygia guttata) field L (homologous to primary auditory cortex) to target birdsongs that were embedded in three different maskers (broadband noise, modulated noise and birdsong chorus). We found two distinct forms of interference in the neural responses: the addition of spurious spikes occurring primarily during the silent gaps between song syllables and the suppression of informative spikes occurring primarily during the syllables. Both effects systematically degraded neural discrimination as the target intensity decreased relative to that of the masker. The behavioral performance of songbirds degraded in a parallel manner. Our results identify neural interference that could explain the perceptual interference at the heart of the cocktail party problem.
Spatial unmasking describes the improvement in the detection or identification of a target sound afforded by separating it spatially from simultaneous masking sounds. This effect has been studied extensively for speech intelligibility in the presence of interfering sounds. In the current study, listeners identified zebra finch song, which shares many acoustic properties with speech but lacks semantic and linguistic content. Three maskers with the same long-term spectral content but different short-term statistics were used: (1) chorus (combinations of unfamiliar zebra finch songs), (2) song-shaped noise (broadband noise with the average spectrum of chorus), and (3) chorus-modulated noise (song-shaped noise multiplied by the broadband envelope from a chorus masker). The amount of masking and spatial unmasking depended on the masker and there was evidence of release from both energetic and informational masking. Spatial unmasking was greatest for the statistically similar chorus masker. For the two noise maskers, there was less spatial unmasking and it was wholly accounted for by the relative target and masker levels at the acoustically better ear. The results share many features with analogous results using speech targets, suggesting that spatial separation aids in the segregation of complex natural sounds through mechanisms that are not specific to speech.
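The two noise maskers above are defined operationally: song-shaped noise is broadband noise imposed with the average spectrum of the chorus, and chorus-modulated noise is that same noise multiplied by a chorus's broadband envelope. A minimal NumPy sketch of both constructions follows, assuming `chorus` is a mono waveform array; the spectrum shaping here uses a single randomized-phase FFT frame, and the envelope is a rectified, moving-average-smoothed approximation rather than the exact analysis used in the study.

```python
import numpy as np

def song_shaped_noise(chorus, rng=None):
    """Broadband noise sharing the chorus's long-term magnitude spectrum.

    Randomizes the phase of each FFT bin while keeping the magnitudes,
    which preserves the spectrum but destroys the temporal structure.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    magnitude = np.abs(np.fft.rfft(chorus))
    phases = rng.uniform(0.0, 2.0 * np.pi, size=magnitude.shape)
    noise = np.fft.irfft(magnitude * np.exp(1j * phases), n=len(chorus))
    return noise / np.max(np.abs(noise))

def chorus_modulated_noise(chorus, smooth=64):
    """Song-shaped noise multiplied by the chorus's broadband envelope.

    The envelope is approximated by rectifying the chorus waveform and
    smoothing it with a short moving average (`smooth` samples).
    """
    noise = song_shaped_noise(chorus)
    envelope = np.convolve(np.abs(chorus), np.ones(smooth) / smooth, mode="same")
    modulated = noise * envelope
    return modulated / np.max(np.abs(modulated))
```

By construction, all three maskers then share long-term spectral content while differing in short-term statistics: the chorus keeps its natural syllable structure, the song-shaped noise has none, and the modulated noise has the chorus's slow amplitude fluctuations without its fine structure.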
Previous electrophysiological studies of interaural time difference (ITD) processing have demonstrated that ITDs are represented by a nontopographic population rate code. Rather than narrow tuning to ITDs, neural channels have broad tuning to ITDs in either the left or right auditory hemifield, and the relative activity between the channels determines the perceived lateralization of the sound. With advancing age, spatial perception weakens and poor temporal processing contributes to declining spatial acuity. At present, it is unclear whether age-related temporal processing deficits are due to poor inhibitory controls in the auditory system or degraded neural synchrony at the periphery. Cortical processing of spatial cues based on a hemifield code is susceptible to potential age-related physiological changes. We consider two distinct predictions of age-related changes to ITD sensitivity: declines in inhibitory mechanisms would lead to increased excitation and medial shifts to rate-azimuth functions, whereas a general reduction in neural synchrony would lead to reduced excitation and shallower slopes in the rate-azimuth function. The current study tested these possibilities by measuring an evoked response to ITD shifts in a narrow-band noise. Results were more in line with the latter outcome, both from measured latencies and amplitudes of the global field potentials and source-localized waveforms in the left and right auditory cortices. The measured responses for older listeners also tended to have reduced asymmetric distribution of activity in response to ITD shifts, which is consistent with other sensory and cognitive processing models of aging.
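The hemifield code described above can be illustrated with a toy two-channel model (a hypothetical sketch for intuition, not the analysis used in the study): each hemifield channel is a broad sigmoid over ITD, and perceived laterality is read out as the normalized difference in channel activity. In this sketch, a general loss of neural synchrony is modeled as a shallower channel slope, which compresses the lateralization function, mirroring the "shallower slopes" prediction.

```python
import numpy as np

def channel_rate(itd_us, preferred_sign, slope=1 / 200, gain=1.0):
    """Broadly tuned hemifield channel: sigmoidal rate over ITD (microseconds).

    preferred_sign = +1 for the right-hemifield channel, -1 for the left.
    `slope` controls tuning steepness; `gain` scales overall excitation.
    """
    return gain / (1.0 + np.exp(-preferred_sign * slope * itd_us))

def lateralization(itd_us, slope=1 / 200, gain=1.0):
    """Perceived laterality from the relative activity of the two channels.

    Positive values -> right, negative -> left, 0 at zero ITD.
    With equal channel gains, `gain` cancels in the ratio.
    """
    right = channel_rate(itd_us, +1, slope, gain)
    left = channel_rate(itd_us, -1, slope, gain)
    return (right - left) / (right + left)
```

Halving `slope` (e.g., `lateralization(500, slope=1/400)` versus `slope=1/200`) shrinks the readout for the same ITD, whereas a medial shift from reduced inhibition would instead appear as an offset added inside the sigmoid; the evoked-response data reported above favor the slope-reduction account.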