Computational auditory scene analysis

Brown, Guy J.; Cooke, Martin

doi:10.1006/csla.1994.1016

Cited by 462 publications

(262 citation statements)

References 0 publications

Mentioning: 255


“…Recently, neural oscillator models have been successful at providing accounts of the interaction of cue combinations, such as common onset and proximity (Brown and Cooke, 1994; Wang and Brown, 1999), in which the summary correlogram model was also employed as a front end.…”

Section: Correlogram-based CASA Models (mentioning)

confidence: 99%

“…The instantaneous Hilbert envelope is computed at the output of each Gammatone filter. This is smoothed by a first-order lowpass filter with an 8 ms time constant, sampled at 10 ms intervals, and finally log-compressed to give an approximation to the auditory nerve firing rate: a 'ratemap' (Brown and Cooke, 1994). Fig.…”

Section: Summary of the Paper (mentioning)

confidence: 99%
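The ratemap pipeline quoted above (Gammatone filtering, Hilbert envelope, 8 ms first-order smoothing, 10 ms sampling, log compression) can be sketched roughly as follows. This is an illustrative sketch, not the code of Brown and Cooke or of the citing paper. Only the 8 ms time constant and the 10 ms frame interval come from the quoted text. The fourth-order filter, the ERB bandwidth formula, the FIR length, the natural-log compression with a small floor, and all function names are assumptions made for the example.

```python
import numpy as np

def erb(fc):
    # Equivalent rectangular bandwidth in Hz (Glasberg-Moore formula; an assumption here).
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def gammatone_ir(fc, fs, order=4, duration=0.05):
    # Truncated gammatone impulse response, normalised to unit energy.
    t = np.arange(int(duration * fs)) / fs
    b = 1.019 * erb(fc)
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return g / np.sqrt(np.sum(g ** 2))

def hilbert_envelope(x):
    # Magnitude of the analytic signal, built by zeroing negative frequencies.
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * h))

def one_pole_lowpass(x, fs, tau=0.008):
    # First-order recursive smoother with time constant tau (seconds).
    a = np.exp(-1.0 / (tau * fs))
    y = np.empty_like(x)
    state = 0.0
    for i, v in enumerate(x):
        state = (1.0 - a) * v + a * state
        y[i] = state
    return y

def ratemap(signal, fs, centre_freqs, frame_shift=0.010, tau=0.008, floor=1e-10):
    # One row per channel, one column per 10 ms frame.
    hop = int(round(frame_shift * fs))
    rows = []
    for fc in centre_freqs:
        band = np.convolve(signal, gammatone_ir(fc, fs))[:len(signal)]
        smooth = one_pole_lowpass(hilbert_envelope(band), fs, tau)
        rows.append(np.log(smooth[::hop] + floor))  # floor avoids log(0)
    return np.array(rows)

# Usage: a 0.5 s, 1 kHz tone analysed by three hypothetical channels.
fs = 16000
t = np.arange(int(0.5 * fs)) / fs
tone = np.sin(2 * np.pi * 1000.0 * t)
rm = ratemap(tone, fs, centre_freqs=[250.0, 1000.0, 4000.0])
print(rm.shape)
```

With these settings the result has one row per channel and one column per 10 ms frame, and the channel centred on the tone carries the most energy. A practical front end would use many more channels spaced on an ERB-rate scale.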

“…Many researchers have proposed automatic sound separation systems based on the known principles of human hearing and have achieved some success (Brown and Cooke, 1994; Wang and Brown, 1999; Ellis, 1999). A good review of CASA development is reported in Brown and Wang (2005).…”

Section: Introduction (mentioning)

confidence: 99%


Exploiting correlogram structure for robust speech recognition with multiple speech sources. Green, Barker, et al. 2007. Speech Communication.


“…The essence of this approach is that we can produce simple definitions of what we are looking for - sinusoids of unknown frequency, or narrowband noise energy - and we can then go through a given signal identifying and extracting just the parts that interest us, and ignoring the rest - in analogy to the way a human listener is able to 'screen out' interfering sounds that are not of interest. I consider the early models of Cooke and Brown [Cooke 1991/3, Brown 1992, Brown & Cooke 1994] to fall into this category. We are now at the third evolutionary stage, and in this paper I will describe one view of its defining characteristics. Based on efforts to overcome the limitations of the 'optimistic' view, we might call this 'realistic' or "structured, obstructive background" approach; the key insight is that it will not always be possible to extract a signal from interference in a unique or optimal way, but rather it is necessary to bring to bear a wide range of contextual constraints and prior biases in a heuristic search for an account of the signal that is at least reasonably satisfactory.…”

mentioning

confidence: 99%
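The 'optimistic' strategy described in the statement above, defining the target as a sinusoid of unknown frequency and extracting it from interference, can be illustrated with a small sketch. It assumes a single real tone in white Gaussian noise, in which case picking the peak of the periodogram is (approximately, for a tone away from 0 Hz and the Nyquist frequency) the maximum-likelihood frequency estimate. The function name, the zero-padding factor, and the test signal are hypothetical choices for illustration and do not come from Ellis (1999) or Brown and Cooke (1994).

```python
import numpy as np

def estimate_sinusoid(x, fs, pad_factor=8):
    # Zero-pad so the spectral peak is sampled on a finer frequency grid.
    n = len(x)
    nfft = pad_factor * n
    spectrum = np.fft.rfft(x, nfft)
    # Skip the DC bin, then locate the strongest spectral component.
    k = int(np.argmax(np.abs(spectrum[1:]))) + 1
    freq = k * fs / nfft
    # For a real tone of amplitude A, the peak magnitude is close to A * n / 2.
    amp = 2.0 * np.abs(spectrum[k]) / n
    return freq, amp

# Usage: a 440 Hz tone of amplitude 0.5 buried in noise of comparable level.
rng = np.random.default_rng(0)
fs = 8000
t = np.arange(2048) / fs
x = 0.5 * np.cos(2 * np.pi * 440.0 * t + 0.3) + 0.5 * rng.standard_normal(t.size)
freq, amp = estimate_sinusoid(x, fs)
print(freq, amp)
```

The estimate is only "best possible" while the assumed signal model holds. Once the interference is another structured source rather than white noise, the peak may belong to the wrong sound, which is the limitation the quoted passage goes on to raise.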

Using knowledge to organize sound: The prediction-driven approach to computational auditory scene analysis and its application to speech/nonspeech mixtures. Ellis. 1999. Speech Communication.


Computational auditory scene analysis - modeling the human ability to organize sound mixtures according to their sources - has experienced a rapid evolution as the simple principles suggested by psychological experiments have turned out to be less than the whole story. Phenomena such as the continuity illusion and phonemic restoration show that the brain is able to use a wide range of knowledge-based contextual constraints when interpreting obscured or complex mixtures: To model such processing, we need architectures that operate by confirming hypotheses about the observations rather than relying on directly-extracted descriptions. One such architecture, the 'prediction-driven' approach, is presented along with results from its initial implementation. This architecture can be extended to take advantage of the high-level knowledge implicit in today's speech recognizers by modifying a recognizer to act as one of the 'component models' which provide the explanations of the signal mixture. Although this adaptation raises a number of issues, a preliminary investigation supports the argument that successful scene analysis must exploit such abstract knowledge at every level.

Introduction

The work described in this paper fits into a kind of evolutionary tale of approaches to sound organization: In the beginning, there was the 'simplistic' or "blank background" view that sound objects somehow defined themselves, and that identifying a single perceptual object was as simple as picking out a figure in a child's coloring book. The experimental stimuli on which so much of our understanding of auditory organization is based - the sinusoids and bandlimited noise bursts of Bregman [1990] and others - echo this approach, since, as presented in soundproof listening booths, they would actually be amenable to such an approach.

The second stage of evolution, which we might call the 'optimistic' or "uniform background" view, emerged from the initial efforts to apply the insights of experimental results in auditory organization (especially those in [Bregman 1990]) and apply them to real sounds. Unlike sinusoids against a silent background, real sounds contain all kinds of noise and distractions to defeat simple extraction routines, and therefore demand a more sophisticated approach. However, the signal processing community is long used to dealing with noise and offers various approaches for making the best possible decisions under some simple, but useful, assumptions. These amount to a kind of template matching, such that if the form of the target and the interference can be exactly specified, the parameters of the target can be recovered in the mathematically best-possible fashion. The essence of this approach is that we can produce simple definitions of what we are looking for - sinusoids of unknown frequency, or narrowband noise energy - and we can then go through a given signal identifying and extracting just the parts that interest us, and ignoring the rest - in analogy to the way a human listener is able to 'screen out' interferin...


Automatic Speech and Speaker Recognition. Keshet, Bengio. 2009.

