Abstract:The perceptual organization of two-tone sequences into auditory streams was investigated using a modeling framework consisting of an auditory pre-processing front end [Dau et al., J. Acoust. Soc. Am. 102, 2892-2905 (1997)] combined with a temporal coherence-analysis back end [Elhilali et al., Neuron 61, 317-329 (2009)]. Two experimental paradigms were considered: (i) Stream segregation as a function of tone repetition time (TRT) and frequency separation (Δf) and (ii) grouping of distant spectral components bas… Show more
“…Previous studies have shown that onset/offset asynchronies larger than 20-40 ms lead to increased stream segregation (e.g., Turgeon et al, 2002;Turgeon et al, 2005;Bregman and Pinker, 1978;Micheyl et al, 2013b;Christiansen et al, 2014), in good agreement with the data from this study.…”
Recent studies of auditory streaming have suggested that repeated synchronous onsets and offsets over time, referred to as "temporal coherence," provide a strong grouping cue between acoustic components, even when they are spectrally remote. This study uses a measure of auditory stream formation, based on comodulation masking release (CMR), to assess the conditions under which a loss of temporal coherence across frequency can lead to auditory stream segregation. The measure relies on the assumption that the CMR, produced by flanking bands remote from the masker and target frequency, only occurs if the masking and flanking bands form part of the same perceptual stream. The masking and flanking bands consisted of sequences of narrowband noise bursts, and the temporal coherence between the masking and flanking bursts was manipulated in two ways: (a) By introducing a fixed temporal offset between the flanking and masking bands that varied from zero to 60 ms and (b) by presenting the flanking and masking bursts at different temporal rates, so that the asynchronies varied from burst to burst. The results showed reduced CMR in all conditions where the flanking and masking bands were temporally incoherent, in line with expectations of the temporal coherence hypothesis.
“…Previous studies have shown that onset/offset asynchronies larger than 20-40 ms lead to increased stream segregation (e.g., Turgeon et al, 2002;Turgeon et al, 2005;Bregman and Pinker, 1978;Micheyl et al, 2013b;Christiansen et al, 2014), in good agreement with the data from this study.…”
Recent studies of auditory streaming have suggested that repeated synchronous onsets and offsets over time, referred to as "temporal coherence," provide a strong grouping cue between acoustic components, even when they are spectrally remote. This study uses a measure of auditory stream formation, based on comodulation masking release (CMR), to assess the conditions under which a loss of temporal coherence across frequency can lead to auditory stream segregation. The measure relies on the assumption that the CMR, produced by flanking bands remote from the masker and target frequency, only occurs if the masking and flanking bands form part of the same perceptual stream. The masking and flanking bands consisted of sequences of narrowband noise bursts, and the temporal coherence between the masking and flanking bursts was manipulated in two ways: (a) By introducing a fixed temporal offset between the flanking and masking bands that varied from zero to 60 ms and (b) by presenting the flanking and masking bursts at different temporal rates, so that the asynchronies varied from burst to burst. The results showed reduced CMR in all conditions where the flanking and masking bands were temporally incoherent, in line with expectations of the temporal coherence hypothesis.
“…As demonstrated by Ewert and Dau (2000), the processing of envelope fluctuations can be described effectively by a second-order bandpass filterbank with logarithmically scaled modulation filters. Such a processing has recently also been successful in speech intelligibility prediction studies (Jørgensen and Dau, 2011;Jørgensen et al, 2013), computational scene analysis (Christiansen et al, 2014), and sound textures synthesis (McDermott and Simoncelli, 2011). Similar processing based on auditory coding principles might also be advantageous in computational speech segregation, but this has not yet been examined.…”
A monaural speech segregation system is presented that estimates the ideal binary mask from noisy speech based on the supervised learning of amplitude modulation spectrogram (AMS) features. Instead of using linearly scaled modulation filters with constant absolute bandwidth, an auditory-inspired modulation filterbank with logarithmically scaled filters is employed. To reduce the dependency of the AMS features on the overall background noise level, a feature normalization stage is applied. In addition, a spectro-temporal integration stage is incorporated in order to exploit the context information about speech activity present in neighboring time-frequency units. In order to evaluate the generalization performance of the system to unseen acoustic conditions, the speech segregation system is trained with a limited set of low signal-to-noise ratio (SNR) conditions, but tested over a wide range of SNRs up to 20 dB. A systematic evaluation of the system demonstrates that auditory-inspired modulation processing can substantially improve the mask estimation accuracy in the presence of stationary and fluctuating interferers.
“…Temporal coherence also explained why a few synchronous tone sequences perceptually pop-out even in the midst of a dense background of random tones4, and why prominent electroencephalogram responses to these synchronous tones emerge even in the absence of other distinguishing features such as global changes in signal power or local tone densities56. Finally, temporal coherence has also been demonstrated to play a role in co-modulation masking release78 and its dynamics have recently been imaged in the primary auditory cortex9.…”
Perception of segregated sources is essential in navigating cluttered acoustic environments. A basic mechanism to implement this process is the temporal coherence principle. It postulates that a signal is perceived as emitted from a single source only when all of its features are temporally modulated coherently, causing them to bind perceptually. Here we report on neural correlates of this process as rapidly reshaped interactions in primary auditory cortex, measured in three different ways: as changes in response rates, as adaptations of spectrotemporal receptive fields following stimulation by temporally coherent and incoherent tone sequences, and as changes in spiking correlations during the tone sequences. Responses, sensitivity and presumed connectivity were rapidly enhanced by synchronous stimuli, and suppressed by alternating (asynchronous) sounds, but only when the animals engaged in task performance and were attentive to the stimuli. Temporal coherence and attention are therefore both important factors in auditory scene analysis.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.