Preprint, 2022
DOI: 10.1101/2022.09.06.506680

Many but not all deep neural network audio models capture brain responses and exhibit correspondence between model stages and brain regions

Abstract: Deep neural networks are commonly used as models of the visual system, but are less explored in audition. Prior work provided examples of audio-trained neural networks that produced good predictions of auditory cortical fMRI responses and exhibited correspondence between model stages and brain regions, but left it unclear whether these results generalize to other neural network models. We evaluated brain-model correspondence for publicly available audio neural network models along with in-house models trained …


Cited by 8 publications (10 citation statements)
References 186 publications (476 reference statements)
“…This dataset consists of ∼6 million 2-second speech excerpts superimposed on real-world background noise. Training on this dataset has previously been shown to produce models that yield the best current predictions of auditory cortical responses [79]. Models were jointly optimized to classify stimuli according to the word that appeared in the middle of the excerpt (794-way word recognition task) and the talker that produced the utterance (433-way voice recognition task).…”
Section: Results
confidence: 99%
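A minimal sketch of the joint optimization described in this statement: one shared audio backbone feeds two classification heads, and the two cross-entropy losses are summed. The backbone, feature dimension, and equal loss weighting are assumptions for illustration, not details taken from the cited models.

```python
import torch
import torch.nn as nn

class WordVoiceModel(nn.Module):
    """Hypothetical dual-head readout for joint word + talker recognition."""
    def __init__(self, backbone: nn.Module, feat_dim: int,
                 n_words: int = 794, n_talkers: int = 433):
        super().__init__()
        self.backbone = backbone            # any audio feature extractor
        self.word_head = nn.Linear(feat_dim, n_words)     # 794-way word task
        self.voice_head = nn.Linear(feat_dim, n_talkers)  # 433-way voice task

    def forward(self, x):
        feats = self.backbone(x)            # (batch, feat_dim)
        return self.word_head(feats), self.voice_head(feats)

def joint_loss(word_logits, voice_logits, word_labels, voice_labels):
    # "Jointly optimized" rendered here as a simple sum of the two
    # cross-entropy terms; the weighting is an assumption.
    ce = nn.functional.cross_entropy
    return ce(word_logits, word_labels) + ce(voice_logits, voice_labels)
```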
“…This may aid interpretation of encoding analyses [24, 23]. In recent years, encoding models have been used increasingly to model neural responses with features extracted from task-optimized neural networks [7, 72, 73, 74]. This typically involves very high-dimensional feature sets and hence encoding models where the number of predictors is much larger than the number of observations, i.e., D >> M.…”
Section: Discussion
confidence: 99%
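In the D >> M regime mentioned above, such encoding models are typically fit with regularized regression. A minimal sketch using ridge regression in scikit-learn follows; the feature matrix, voxel responses, and regularization grid are placeholders, not data or settings from the cited work.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Placeholder data: M stimuli (observations) x D network features, D >> M,
# plus one response vector for a single voxel.
M, D = 165, 8192
X = rng.standard_normal((M, D))   # features from a task-optimized network (stand-in)
y = rng.standard_normal(M)        # fMRI response of one voxel (stand-in)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Ridge regularization keeps the D >> M regression well-posed;
# the alpha grid is an arbitrary illustrative choice.
model = RidgeCV(alphas=np.logspace(-2, 5, 20)).fit(X_train, y_train)
print("held-out R^2:", model.score(X_test, y_test))
```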
“…However, the significance of these visualizations for evaluating neural network models of biological sensory systems has received relatively little attention. One contributing factor may be that model visualizations have often been constrained by added natural image priors or other forms of regularization (77) that help make model visualizations look more natural, but mask the extent to which they otherwise diverge from a perceptually meaningful stimulus. For this reason, we intentionally avoided priors or other regularization when generating model metamers, as they defeat the purpose of the metamer test.…”
Section: Discussion
confidence: 99%
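The metamer procedure this statement refers to amounts to optimizing an input so that its activations at a chosen model stage match those of a reference stimulus, with no added prior or regularization term. A minimal, hypothetical PyTorch sketch under those assumptions (function and variable names are illustrative):

```python
import torch

def make_metamer(model_stage, reference, n_steps=2000, lr=0.01):
    """model_stage: callable mapping an input tensor to activations at the
    stage being matched; reference: the natural stimulus tensor.
    No prior or regularization is added, per the quoted rationale."""
    with torch.no_grad():
        target = model_stage(reference)
    # Start from noise and optimize the input directly.
    metamer = torch.randn_like(reference, requires_grad=True)
    opt = torch.optim.Adam([metamer], lr=lr)
    for _ in range(n_steps):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(model_stage(metamer), target)
        loss.backward()
        opt.step()
    return metamer.detach()
```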
“…Thus, most of the variation in metamer recognizability was not captured by standard model-brain comparison benchmarks. We performed an analogous analysis for the auditory models, using a large data set of human auditory cortical responses (74) (fMRI responses to a large set of natural sounds) that had previously been used to evaluate neural network models of the auditory system (5, 75). We analyzed voxel responses within four regions of interest in addition to all of auditory cortex, in each case again choosing the best-predicting model stage, measuring the variance it explained in held-out data, and comparing that to the recognizability of the metamers from that stage (Figure 7b).…”
Section: Current Model-brain Comparisons Do Not Capture Metamer Diffe...
confidence: 99%
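A rough sketch of the stage-selection step described above: fit an encoding model per candidate model stage and keep the stage with the highest cross-validated variance explained on held-out data. The function name, data layout, and cross-validation scheme are assumptions for illustration, not the cited paper's exact pipeline.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score

def best_predicting_stage(stage_features, voxel_responses):
    """stage_features: dict mapping stage name -> (M stimuli x D features) array.
    voxel_responses: (M,) response vector for one voxel or ROI average.
    Returns the stage with the highest cross-validated R^2 and that score."""
    scores = {}
    for name, X in stage_features.items():
        model = RidgeCV(alphas=np.logspace(-2, 5, 10))
        # Mean cross-validated R^2 on held-out folds stands in for
        # "variance explained in held-out data".
        scores[name] = cross_val_score(model, X, voxel_responses,
                                       cv=5, scoring="r2").mean()
    best = max(scores, key=scores.get)
    return best, scores[best]
```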