2021
DOI: 10.1523/jneurosci.1993-20.2021
Examining the Coding Strength of Object Identity and Nonidentity Features in Human Occipito-Temporal Cortex and Convolutional Neural Networks

Abstract: A visual object is characterized by multiple visual features, including its identity, position and size. Despite the usefulness of identity and nonidentity features in vision and their joint coding throughout the primate ventral visual processing pathway, they have so far been studied relatively independently. Here in both female and male human participants, the coding of identity and nonidentity features was examined together across the human ventral visual pathway. The nonidentity features tested included tw…

Cited by 52 publications (33 citation statements). References 69 publications (113 reference statements).
“…In addition, DNNs often act in surprising non-human-like ways, such as being fooled by adversarial images (Szegedy et al., 2013; Dujmović et al., 2020) and making bizarre classification errors for familiar objects in unusual poses (Kauderer-Abrams, 2017; Gong et al., 2014; Chen et al., 2017). Furthermore, recent work by Xu and Vaziri-Pashkam (2021) failed to find strong neural correlates in high-level visual areas when DNNs' internal representations were compared to fMRI data from human participants.…”
Section: Neural Network as a Model of the Human Visual System (mentioning)
confidence: 98%
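For readers unfamiliar with such brain-CNN comparisons, a minimal sketch of a representational similarity analysis (RSA) of the kind used in this line of work is shown below; the array sizes, the random placeholder data, and the choice of Spearman correlation are illustrative assumptions, not the authors' exact pipeline.

```python
# Minimal RSA-style sketch: correlate a CNN layer's representational dissimilarity
# matrix (RDM) with an fMRI RDM for one brain region (random placeholder data).
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(patterns):
    """Pairwise 1 - Pearson r between condition patterns, as a condensed vector."""
    return pdist(patterns, metric="correlation")

n_conditions = 40                                     # hypothetical stimulus conditions
cnn_layer_acts = np.random.randn(n_conditions, 4096)  # units of one CNN layer
fmri_patterns = np.random.randn(n_conditions, 500)    # voxels of one visual region

rho, p = spearmanr(rdm(cnn_layer_acts), rdm(fmri_patterns))
print(f"CNN-brain RDM correlation: rho = {rho:.2f}, p = {p:.3f}")
```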
“…We believe that this is due to their reliance on Euclidean distance, which underestimated the degree of scale invariance and subsequently (since the transformations were aggregated) the strength of all invariances. Xu and Vaziri-Pashkam (2021) also found inconsistent embeddings across classes of translated objects, which we believe to be due to the different training setup they employed. We expand on this in the Supplementary Material.…”
Section: Discussion (mentioning)
confidence: 91%
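A toy illustration of the methodological point about Euclidean distance (assumed example, not the cited analysis): when a transformation such as a size change mainly rescales activation magnitudes, the Euclidean distance between the two patterns is large even though the tuning profile is preserved, whereas a correlation-based distance stays near zero.

```python
# Toy example: a size change that mainly rescales the response pattern.
import numpy as np

rng = np.random.default_rng(0)
small = rng.random(1000)                                # pattern for object at one size
large = 2.0 * small + 0.01 * rng.standard_normal(1000)  # same tuning, doubled gain

euclidean = np.linalg.norm(small - large)
corr_dist = 1.0 - np.corrcoef(small, large)[0, 1]

print(f"Euclidean distance:   {euclidean:.2f}")   # large, suggesting little invariance
print(f"Correlation distance: {corr_dist:.4f}")   # near zero: tuning profile preserved
```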
“…The layers were also chosen in such a manner as to sample the network as evenly as possible, and to at least roughly equate the number of layers extracted from each network. In a control analysis, we found that our sampled layers capture the overall processing trajectory of the network and that the trajectory does not change with the types of layers sampled, as long as they are adjacent to each other in the processing pipeline (S1 Fig). Although fully connected layers (including the classification layer) differ from early layers in the network in that they do not follow a weight-sharing constraint over space, past work has found that they encode not just information about object category membership, but also information about features such as shape, position, spatial frequency, and size [19, 36], making it appropriate to examine how they jointly encode the features of shape and color at the end of CNN visual processing.…”
Section: Results (mentioning)
confidence: 99%
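A minimal sketch of what evenly sampling layers and extracting their activations can look like in practice; the use of torchvision's AlexNet, the particular layer picks, and the random input images are assumptions for illustration only.

```python
# Sketch: pick a handful of layers spread across the network and capture their
# activations with forward hooks (random weights and random images for illustration).
import torch
from torchvision import models

model = models.alexnet().eval()          # untrained AlexNet; layer choices are illustrative
sampled_layers = {
    "conv1": model.features[0],
    "conv3": model.features[6],
    "conv5": model.features[10],
    "fc6":   model.classifier[1],
    "fc8":   model.classifier[6],
}

activations = {}
for name, layer in sampled_layers.items():
    layer.register_forward_hook(
        lambda module, inputs, output, name=name:
            activations.__setitem__(name, output.detach().flatten(start_dim=1))
    )

with torch.no_grad():
    model(torch.randn(8, 3, 224, 224))   # 8 placeholder stimulus images

for name, act in activations.items():
    print(name, tuple(act.shape))        # (images, units) per sampled layer
```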
“…A priori, one would assume that the final fully connected layer encodes object category orthogonally to color, since it is trained to output category labels. However, prior work has shown that fully connected layers encode not just information about object category membership, but also information about features such as shape, position, spatial frequency, and size [19, 36]. The present results further show that there is both a significant amount of color representation and a greater amount of color and form interaction in the final compared to the first sampled layer, with the amount of interaction steadily increasing during the course of visual processing.…”
Section: Discussion (mentioning)
confidence: 99%
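One common way to ask whether a layer carries such feature information is linear decoding of each feature from its activations; the factorial shape-by-color stimulus design, the logistic-regression classifier, and the random placeholder activations below are illustrative assumptions rather than the study's actual analysis.

```python
# Sketch: linear decoding of shape and color from one layer's activations
# (random placeholder activations, so accuracy will be near chance here).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

n_shapes, n_colors, n_reps, n_units = 8, 4, 20, 512
rng = np.random.default_rng(1)

acts = rng.standard_normal((n_shapes * n_colors * n_reps, n_units))
shape_labels = np.repeat(np.arange(n_shapes), n_colors * n_reps)
color_labels = np.tile(np.repeat(np.arange(n_colors), n_reps), n_shapes)

clf = LogisticRegression(max_iter=1000)
shape_acc = cross_val_score(clf, acts, shape_labels, cv=5).mean()
color_acc = cross_val_score(clf, acts, color_labels, cv=5).mean()
print(f"shape decoding accuracy: {shape_acc:.2f}")
print(f"color decoding accuracy: {color_acc:.2f}")

# A shape-color interaction could be probed by training the shape decoder on stimuli of
# one color and testing on another: a drop in generalization indicates non-orthogonal coding.
```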