2022
DOI: 10.1038/s41562-021-01244-z

Deep neural network models of sound localization reveal how perception is adapted to real-world environments

Abstract: Mammals localize sounds using information from their two ears. Localization in real-world conditions is challenging, as echoes provide erroneous information and noises mask parts of target sounds. To better understand real-world localization, we equipped a deep neural network with human ears and trained it to localize sounds in a virtual environment. The resulting model localized accurately in realistic conditions with noise and reverberation. In simulated experiments, the model exhibited many features of human…

References: 141 publications
Cited by 42 publications (58 citation statements)
“…These neurograms have been used for training ASRs to replicate human behavioral results. ASRs trained on neurograms from acoustic-hearing models have accurately predicted normal-hearing behavioral results for closed-set word recognition in noise [17], [18], pitch perception [19], and sound localization [20]. ASRs trained on CI outputs or neurograms from electric-stimulation models have predicted closed-set word recognition rates in noise for CI listeners and have provided some insights into factors that may underlie variability in CI outcomes, such as current spread, neural survival, and cognitive noise [21]- [23].…”
Section: Introduction (mentioning)
confidence: 95%
“…The discrepancies shown here for model metamers contrast with a growing number of examples of compelling human-model similarities for behavioral judgments of natural stimuli. Models optimized for object recognition (90), speech recognition (5), sound localization (91), and pitch recognition (92) all exhibit qualitative and often quantitative similarities to human judgments when run in traditional psychophysical experiments with natural or relatively naturalistic stimuli. These results suggest that neural network models trained in naturalistic conditions often match human behavior for signals that fall within their training distribution, but not for some signals derived from the model that fall outside the distribution of natural sounds and images.…”
Section: Future Directions (mentioning)
confidence: 99%
“…To reduce possible biases by a specific architecture, unless otherwise stated, the reported recognition accuracy, TMTFs, and neurophysiological similarities are averages of the results of the four models with the highest recognition accuracies. This could be considered as a modeled version of reporting average quantities in multiple participants in human studies (Francl and McDermott, 2022). Four architectures with 13 layers were selected by performing an architecture search (see the methods for the detailed procedure).…”
Section: Optimizing a Neural Network For Natural Sound Recognition (mentioning)
confidence: 99%
“…Note that we did not directly model or try to reproduce human AM sensitivity in the optimization process. This kind of two-step optimization-and-analysis procedure has explained a number of properties of the auditory system, such as cochlear frequency tuning (Lewicki, 2002), AM tuning (Khatami and Escabí, 2020), the receptive field in the auditory cortex (Terashima and Okada, 2012), pitch perception (Saddler et al, 2021), sound localization (Francl and McDermott, 2022), and speech processing (Ashihara et al, 2021), as well as in other sensory modalities (Kriegeskorte and Douglas, 2018). To reduce the number of hard-coded assumptions and clarify the relationship between the optimization procedure and the emergent properties, we applied an NN directly to a “raw” sound waveform without any preprocessing (Hoshen et al, 2015; Tokozume and Harada, 2017).…”
Section: Introduction (mentioning)
confidence: 99%