2020
DOI: 10.1073/pnas.1912334117
Controversial stimuli: Pitting neural networks against each other as models of human cognition

Abstract: Distinct scientific theories can make similar predictions. To adjudicate between theories, we must design experiments for which the theories make distinct predictions. Here we consider the problem of comparing deep neural networks as models of human visual recognition. To efficiently compare models’ ability to predict human responses, we synthesize controversial stimuli: images for which different models produce distinct responses. We applied this approach to two visual recognition tasks, handwritten digits (M…
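The synthesis idea the abstract describes — optimizing an image until two models give conflicting answers — can be sketched as follows. This is a minimal illustrative toy, not the paper's actual implementation: two random linear classifiers stand in for the deep networks, the input is an 8-dimensional vector rather than an image, and the greedy finite-difference ascent on a disagreement score is an assumed stand-in for the authors' gradient-based optimization.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Two toy linear "models" over 8-dimensional inputs with 3 classes each,
# standing in for the deep networks compared in the paper.
W_a = rng.normal(size=(3, 8))
W_b = rng.normal(size=(3, 8))

def model_a(x):
    return softmax(W_a @ x)

def model_b(x):
    return softmax(W_b @ x)

def disagreement(x, class_a, class_b):
    # High when model A confidently reports class_a while model B
    # confidently reports a different class, class_b.
    return model_a(x)[class_a] * model_b(x)[class_b]

def synthesize(class_a=0, class_b=1, steps=300, lr=0.3, eps=1e-5):
    """Greedy ascent on the disagreement score via finite differences,
    accepting only steps that improve the score."""
    x0 = rng.normal(scale=0.1, size=8)
    x, score = x0.copy(), disagreement(x0, class_a, class_b)
    for _ in range(steps):
        grad = np.zeros_like(x)
        for i in range(x.size):
            xp = x.copy()
            xp[i] += eps
            grad[i] = (disagreement(xp, class_a, class_b) - score) / eps
        cand = x + lr * grad
        cand_score = disagreement(cand, class_a, class_b)
        if cand_score > score:  # keep the step only if it helps
            x, score = cand, cand_score
    return x0, x

x0, x = synthesize()
print("disagreement:", round(disagreement(x0, 0, 1), 3),
      "->", round(disagreement(x, 0, 1), 3))
```

The resulting input is "controversial" in the paper's sense: whichever model is right, the other must be wrong, so a human judgment on such a stimulus discriminates between the two models.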

Cited by 77 publications (78 citation statements)
References 24 publications
“…It is important to note that finding such improved models remains necessary: Both for neural response pattern and behavioral consistency metrics, our results show that there remains a substantial gap between all models (supervised and unsupervised) and the noise ceiling of the data: there is reliable neural and behavioral variance that no model correctly predicts. These quantitative gaps may be related to other qualitative inconsistencies between neural network models and human visual behaviors, including the latter’s susceptibility to adversarial and “controversial” examples (70) and their different texture-vs.-shape biases (71).…”
Section: Discussion
confidence: 99%
“…One way to carry out this task is by examining the different experimental settings that make the model fail, known as adversarial examples [24], an approach with a long tradition in cognitive psychology, from the use of visual illusions to study perception to the characterisation of biases in decision making [43]. Another method is to train many models with different goals, and to examine which models best describe human behaviour [23]. Here we used cognitive models that provide explicit predictions, reward-oriented and reward-oblivious models, to characterise the performance of our general DNN.…”
Section: Discussion
confidence: 99%
“…Several researchers used different approaches to overcome this problem. One approach was to train many different models with different goals and examine how they perform in predicting human behaviour, thus controlling for the model’s goal [23]; another approach was to use adversarial examples designed to mislead a model, thus gaining insight into its operations [24]. We suggest another direction, which is to use tools from cognitive neuroscience, the same explicit cognitive models described above, to characterise and explain the operations of such a data-driven black-box model.…”
Section: Introduction
confidence: 99%
“…For stimuli in Experiments 3a and 3b (Patterns of gloss constancy), we selected two sets of 10 sequences for which (a) both the unsupervised and supervised models predicted deviations from constant gloss, and (b) the models made different predictions about the particular pattern of deviations (see Methods). The rationale behind this is that cases where models disagree provide the strongest power to test which model is superior (88, 89).…”
Section: Results
confidence: 99%
“…To create a strong test of the different models, we wanted to probe human gloss constancy using stimuli (a) for which there were clear differences between constancy patterns predicted by unsupervised vs. supervised models (88, 89), and (b) which were likely to produce diverse patterns of failures of constancy in human observers. We therefore first generated candidate stimuli, then selected those that best satisfied these desiderata.…”
Section: Methods
confidence: 99%
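The selection strategy described in these excerpts — generate candidate stimuli, then keep the ones on which the competing models disagree most — can be sketched as below. The arrays and the L2 disagreement measure are illustrative assumptions, not the cited authors' actual pipeline.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical predicted "deviation from constant gloss" patterns for 50
# candidate sequences (one row per candidate) from each of two models.
preds_unsup = rng.normal(size=(50, 6))
preds_sup = rng.normal(size=(50, 6))

# Score each candidate by how strongly the two models' predictions differ,
# then keep the 10 most controversial candidates for the experiment.
disagreement = np.linalg.norm(preds_unsup - preds_sup, axis=1)
selected = np.argsort(disagreement)[::-1][:10]
print(sorted(selected.tolist()))
```

Testing humans only on these maximally disagreeing candidates concentrates the experiment's statistical power on trials that can actually distinguish the models.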