Mammals localize sounds using information from their two ears.
Localization in real-world conditions is challenging, as echoes provide
erroneous information, and noises mask parts of target sounds. To better
understand real-world localization we equipped a deep neural network with human
ears and trained it to localize sounds in a virtual environment. The resulting
model localized accurately in realistic conditions with noise and reverberation.
In simulated experiments, the model exhibited many features of human spatial
hearing: sensitivity to monaural spectral cues and interaural time and level
differences, integration across frequency, biases for sound onsets, and limits
on localization of concurrent sources. But when trained in unnatural
environments without either reverberation, noise, or natural sounds, these
performance characteristics deviated from those of humans. The results show how
biological hearing is adapted to the challenges of real-world environments and
illustrate how artificial neural networks can reveal the real-world constraints
that shape perception.
The insideness problem is an aspect of image segmentation that consists of determining which pixels are inside and outside a region. Deep neural networks (DNNs) excel in segmentation benchmarks, but it is unclear if they have the ability to solve the insideness problem as it requires evaluating long-range spatial dependencies. In this letter, we analyze the insideness problem in isolation, without texture or semantic cues, such that other aspects of segmentation do not interfere in the analysis. We demonstrate that DNNs for segmentation with few units have sufficient complexity to solve the insideness for any curve. Yet such DNNs have severe problems with learning general solutions. Only recurrent networks trained with small images learn solutions that generalize well to almost any curve. Recurrent networks can decompose the evaluation of long-range dependencies into a sequence of local operations, and learning with small images alleviates the common difficulties of training recurrent networks with a large number of unrolling steps.
Mammals localize sounds using information from their two ears. Localization in real-world conditions is challenging, as echoes provide erroneous information, and noises mask parts of target sounds. To better understand real-world localization we equipped a deep neural network with human ears and trained it to localize sounds in a virtual environment. The resulting model localized accurately in realistic conditions with noise and reverberation, outperforming alternative systems that lacked human ears. In simulated experiments, the network exhibited many features of human spatial hearing: sensitivity to monaural spectral cues and interaural time and level differences, integration across frequency, and biases for sound onsets. But when trained in unnatural environments without either reverberation, noise, or natural sounds, these performance characteristics deviated from those of humans. The results show how biological hearing is adapted to the challenges of real-world environments and illustrate how artificial neural networks can extend traditional ideal observer models to real-world domains.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.