Sound Texture Perception via Statistics of the Auditory Periphery: Evidence from Sound Synthesis

McDermott, Josh H.; Simoncelli, Eero P.

doi:10.1016/j.neuron.2011.06.032

Cited by 308 publications

(566 citation statements)

References 58 publications

Supporting

Mentioning

548

Contrasting

Unclassified


“…The statistics were measured from a model of the auditory periphery (Fig. 1a) and synthetic textures were generated by adjusting a 5-s sample of random noise until it attained the same values of these statistics 9.…”

Section: Results

mentioning

confidence: 99%

“…Sound texture stimuli were synthesized using a previously published method 9. Statistics were first measured in 7-s recordings of real-world sound textures processed in an auditory model (Fig.…”

Section: Synthetic Textures

mentioning

confidence: 99%

“…Previously, we found that statistical measurements could be used to synthesize realistic textures: sounds generated to match the statistics of real-world texture recordings (rain, fire, wind, insect swarms, etc.) often sounded like new examples of the original recording 8,9 (Fig. 1b).…”

mentioning

confidence: 96%

“…measurements: time averages of short-term acoustic characteristics, which summarize the qualities of a sound 8,9. Such time averages might be measured by the auditory system following peripheral filtering operations (Fig.…”

mentioning

confidence: 99%


Summary statistics in auditory perception

2013


Sensory signals are transduced at high resolution, but their structure must be stored in a more compact format. Here we provide evidence that the auditory system summarizes the temporal details of sounds using time-averaged statistics. We measured discrimination of 'sound textures' that were characterized by particular statistical properties, as normally result from the superposition of many acoustic features in auditory scenes. When listeners discriminated examples of different textures, performance improved with excerpt duration. In contrast, when listeners discriminated different examples of the same texture, performance declined with duration, a paradoxical result given that the information available for discrimination grows with duration. These results indicate that once these sounds are of moderate length, the brain's representation is limited to time-averaged statistics, which, for different examples of the same texture, converge to the same values with increasing duration. Such statistical representations produce good categorical discrimination, but limit the ability to discern temporal detail.
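The convergence argument in this abstract can be illustrated numerically. The following is a minimal sketch, not the authors' auditory model: Gaussian noise stands in for a texture, the rectified waveform stands in for an envelope, and the names `envelope_stats` and `mean_stat_gap` are hypothetical. It shows only that time-averaged statistics of independent excerpts from the same source drift closer together as the excerpts get longer.

```python
import numpy as np

def envelope_stats(x):
    """Time-averaged mean and variance of a crude envelope (|x|).

    Assumption: |x| is only a stand-in for the cochlear envelopes
    used in the cited work.
    """
    env = np.abs(x)
    return np.array([env.mean(), env.var()])

def mean_stat_gap(n_samples, n_pairs=200, seed=0):
    """Average distance between the statistics of two independent
    excerpts of the same stationary noise source."""
    rng = np.random.default_rng(seed)
    gaps = []
    for _ in range(n_pairs):
        a = rng.standard_normal(n_samples)
        b = rng.standard_normal(n_samples)
        gaps.append(np.linalg.norm(envelope_stats(a) - envelope_stats(b)))
    return float(np.mean(gaps))

short_gap = mean_stat_gap(100)      # short excerpts
long_gap = mean_stat_gap(10_000)    # excerpts 100 times longer
print(f"short excerpts: {short_gap:.4f}, long excerpts: {long_gap:.4f}")
```

Under these assumptions the gap shrinks as excerpts lengthen, which is the sense in which a purely statistical representation leaves less to tell two examples of the same texture apart at long durations.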



“…We evaluated the enhancement algorithm described above on mixtures of 10 speech files by different speakers (5 male and 5 female) from the TIMIT test set with 15 environmental texture sounds from [17] at 3 different input signal-to-noise ratios (SNR), for a total of 450 mixtures. The training and test sets had disjoint sets of speakers.…”

Section: Enhancement Results

mentioning

confidence: 99%
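The mixture count in the statement quoted above follows from 10 speech files × 15 textures × 3 SNRs = 450. A rough sketch of how such a grid could be enumerated, and how a noise signal can be scaled to a target input SNR, is given below. The SNR values, the random stand-in signals, and the helper name `mix_at_snr` are assumptions for illustration; the quoted passage does not state them.

```python
import itertools
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that 10*log10(P_speech / P_noise) equals `snr_db`,
    then add it to `speech`. Returns the mixture and the scaled noise."""
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    scaled = gain * noise
    return speech + scaled, scaled

n_speech, n_textures = 10, 15
snrs_db = [-5.0, 0.0, 5.0]  # hypothetical values; the source only says "3 different" SNRs
conditions = list(itertools.product(range(n_speech), range(n_textures), snrs_db))
print(len(conditions))

# Random signals standing in for a speech file and a texture recording.
rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)
noise = 0.3 * rng.standard_normal(16000)
mixture, scaled_noise = mix_at_snr(speech, noise, snr_db=5.0)
achieved_db = 10 * np.log10(np.mean(speech ** 2) / np.mean(scaled_noise ** 2))
```

Scaling the noise rather than the speech keeps the speech level fixed across conditions; either convention yields the same SNR.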

Non-negative dynamical system with application to speech and audio

Févotte, Le Roux, Hershey

2013

2013 IEEE International Conference on Acoustics, Speech and Signal Processing


Non-negative data arise in a variety of important signal processing domains, such as power spectra of signals, pixels in images, and count data. This paper introduces a novel non-negative dynamical system (NDS) for sequences of such data, and describes its application to modeling speech and audio power spectra. The NDS model can be interpreted both as an adaptation of linear dynamical systems (LDS) to non-negative data, and as an extension of non-negative matrix factorization (NMF) to support Markovian dynamics. Learning and inference algorithms were derived and experiments on speech enhancement were conducted by training sparse non-negative dynamical systems on speech data and adapting a noise model to the unknown noise condition. Results show that the model can capture the dynamics of speech in a useful way.

Index Terms: non-negative dynamical system (NDS), linear dynamical system (LDS), multiplicative innovations, non-negative matrix factorization (NMF), source separation.
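For readers unfamiliar with the NMF baseline that the abstract says NDS extends, a minimal sketch of plain NMF with the standard multiplicative updates for squared error may help. This is not the NDS model: it has no Markovian dynamics, and the function name `nmf`, the rank, and the synthetic matrix `V` are illustrative assumptions.

```python
import numpy as np

def nmf(V, rank, n_iter=200, seed=0, eps=1e-12):
    """Factor a non-negative matrix V (freq x frames) as W @ H using
    multiplicative updates that minimise squared error.
    Plain NMF only; frames are treated as independent."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + eps
    H = rng.random((rank, n_frames)) + eps
    errors = []
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so W and H
        # stay non-negative throughout.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        errors.append(float(np.linalg.norm(V - W @ H)))
    return W, H, errors

# Synthetic non-negative "power spectrogram" of exact rank 3.
rng = np.random.default_rng(1)
V = rng.random((20, 3)) @ rng.random((3, 50))
W, H, errors = nmf(V, rank=3)
print(f"error after first iteration: {errors[0]:.4f}, after last: {errors[-1]:.4f}")
```

In this sketch the columns of `H` carry no temporal structure; per the abstract, the NDS contribution is to add Markovian dynamics to such activations.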


Audition

McDermott

2018

Stevens' Handbook of Experimental Psychology and Cognitive Neuroscience


Audition is the process by which organisms use sound to derive information about the world. This chapter aims to provide a bird's‐eye view of contemporary audition research, spanning systems and cognitive neuroscience as well as cognitive science. I provide brief overviews of classic areas of research as well as some central themes and advances from the past 10 years. The chapter covers the sensory transduction of the cochlea, subcortical and cortical functional organization, amplitude modulation and its measurement in the auditory system, the perception of sound sources (with a focus on the classic research areas of location, loudness, and pitch), and auditory scene analysis (including segregation, streaming, texture, and reverberation perception).
