The primate visual system achieves remarkable visual object recognition performance even in brief presentations, and under changes to object exemplar, geometric transformations, and background variation (a.k.a. core visual object recognition). This performance is mediated by the representation formed in inferior temporal (IT) cortex. In parallel, recent advances in machine learning have led to ever higher performing models of object recognition using artificial deep neural networks (DNNs). It remains unclear, however, whether the representational performance of DNNs rivals that of the brain. A major difficulty in producing such a comparison accurately has been the lack of a unifying metric that accounts for experimental limitations, such as the amount of noise, the number of neural recording sites, and the number of trials, and for computational limitations, such as the complexity of the decoding classifier and the number of classifier training examples. In this work, we perform a direct comparison that corrects for these experimental limitations and computational considerations. As part of our methodology, we propose an extension of “kernel analysis” that measures the generalization accuracy as a function of representational complexity. Our evaluations show that, unlike previous bio-inspired models, the latest DNNs rival the representational performance of IT cortex on this visual object recognition task. Furthermore, we show that models that perform well on measures of representational performance also perform well on measures of representational similarity to IT, and on measures of predicting individual IT multi-unit responses. Whether these DNNs rely on computational mechanisms similar to the primate visual system is yet to be determined, but, unlike all previous bio-inspired models, that possibility cannot be ruled out merely on representational performance grounds.
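The idea of measuring generalization accuracy as a function of representational complexity can be illustrated with a small sketch. This is not the authors' procedure: it is a simplified stand-in that assumes kernel ridge regression with a Gaussian kernel, uses the inverse of the regularization strength as a proxy for complexity, and runs on synthetic two-class data. The function names, the `gamma` value, and the list of `lambdas` are all hypothetical choices made for illustration.

```python
import numpy as np

def rbf_kernel(a, b, gamma):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def accuracy_vs_complexity(x_train, y_train, x_test, y_test, lambdas, gamma=0.5):
    """Held-out accuracy of kernel ridge regression for each regularization value.

    Assumption: 1 / lambda stands in for the complexity of the decoder
    (weaker regularization allows a more complex decision function).
    """
    k_train = rbf_kernel(x_train, x_train, gamma)
    k_test = rbf_kernel(x_test, x_train, gamma)
    targets = np.where(y_train == 1, 1.0, -1.0)  # labels mapped to +/-1
    n = len(x_train)
    results = []
    for lam in lambdas:
        # Closed-form kernel ridge solution: (K + lambda * I) alpha = targets
        alpha = np.linalg.solve(k_train + lam * np.eye(n), targets)
        pred = np.where(k_test @ alpha >= 0, 1, 0)
        results.append((1.0 / lam, float((pred == y_test).mean())))
    return results

# Synthetic stand-in for a feature representation: two Gaussian blobs.
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(-1, 1, (100, 5)), rng.normal(1, 1, (100, 5))])
y = np.repeat([0, 1], 100)
order = rng.permutation(200)
x, y = x[order], y[order]

curve = accuracy_vs_complexity(
    x[:150], y[:150], x[150:], y[150:], lambdas=[100.0, 10.0, 1.0, 0.1, 0.01]
)
for complexity, acc in curve:
    print(f"complexity={complexity:g}  held-out accuracy={acc:.3f}")
```

In a comparison of this kind, the same curve would be computed for each representation (for example, neural recordings and model features) under matched numbers of features and training examples, so that the curves can be compared at equal complexity.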
Human visual object recognition is subserved by a multitude of cortical areas. To make sense of this system, one line of research focused on response properties of primary visual cortex neurons and developed theoretical models of a set of canonical computations such as convolution, thresholding, exponentiating and normalization that could be hierarchically repeated to give rise to more complex representations. Another line of research focused on response properties of high-level visual cortex and linked these to semantic categories useful for object recognition.
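The canonical computations named above (convolution, thresholding, exponentiating, normalization) and their hierarchical repetition can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a model taken from the text: the filter, the threshold, the exponent, and the choice to normalize by the mean activity of the whole map are all hypothetical.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Plain 'valid' 2-D cross-correlation (no padding, stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def stage(image, kernel, threshold=0.0, exponent=2.0, sigma=1.0):
    """One stage: convolve, threshold, exponentiate, then divisively normalize."""
    filtered = conv2d_valid(image, kernel)
    rectified = np.maximum(filtered - threshold, 0.0)  # thresholding
    powered = rectified ** exponent                    # exponentiating
    # Assumption: normalize by the mean activity pooled over the whole map.
    return powered / (sigma ** 2 + powered.mean())

rng = np.random.default_rng(0)
image = rng.random((16, 16))                # stand-in for an input image
edge = np.array([[1.0, 0.0, -1.0]] * 3)     # hypothetical vertical-edge filter

stage1 = stage(image, edge)                 # first stage operates on the image
stage2 = stage(stage1, edge)                # second stage repeats the same operations
print(stage1.shape, stage2.shape)
```

Stacking `stage` on its own output is the "hierarchically repeated" part: each stage applies the same small set of operations to the previous stage's response map.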
In the version of this article originally published, there was an error in the phrase "This dataset contained 1,739 cases (27 cancer-positives)" in the main text. The number 1,739 should have been 1,139. There was also an error in the Fig. 4c legend. In the phrase "comprising n = 1,739 cases", the number 1,739 again should have been 1,139. Additionally, in the Extended Data Fig. 5 legend, the phrase "AUC curve for the independent data test set with n = 1,739 cases" contained the same error. The number should have been 1,139 instead of 1,739. The errors have been corrected in the HTML and PDF versions of this article.