The ventral visual stream underlies key human visual object recognition abilities. However, neural encoding in the higher areas of the ventral stream remains poorly understood. Here, we describe a modeling approach that yields a quantitatively accurate model of inferior temporal (IT) cortex, the highest ventral cortical area. Using high-throughput computational techniques, we discovered that, within a class of biologically plausible hierarchical neural network models, there is a strong correlation between a model's categorization performance and its ability to predict individual IT neural unit response data. Pursuing this idea, we then identified a high-performing neural network that matches human performance on a range of recognition tasks. Critically, even though we did not constrain this model to match neural data, its top output layer turns out to be highly predictive of IT spiking responses to complex naturalistic images at both the single-site and population levels. Moreover, the model's intermediate layers are highly predictive of neural responses in the V4 cortex, a midlevel visual area that provides the dominant cortical input to IT. These results show that performance optimization, applied in a biologically appropriate model class, can be used to build quantitative predictive models of neural processing.

computational neuroscience | computer vision | array electrophysiology

Retinal images of real-world objects vary drastically due to changes in object pose, size, position, lighting, nonrigid deformation, occlusion, and many other sources of noise and variation. Humans effortlessly recognize objects rapidly and accurately despite this enormous variation, an impressive computational feat (1). This ability is supported by a set of interconnected brain areas collectively called the ventral visual stream (2, 3), with homologous areas in nonhuman primates (4).
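The performance-to-predictivity analysis described above can be illustrated with a minimal numpy sketch. The sketch below is not the paper's pipeline: it uses purely synthetic "models," "accuracies," and "IT responses," and a simple cross-validated least-squares mapping as a stand-in for neural predictivity, to show the shape of the analysis, correlating each candidate model's task performance with how well its features predict held-out neural responses.

```python
import numpy as np

rng = np.random.default_rng(0)

def neural_predictivity(features, it_responses, n_train=70):
    """Cross-validated linear predictivity: fit a least-squares mapping
    from model features to IT responses on training images, then
    correlate predicted and measured responses on held-out images."""
    W, *_ = np.linalg.lstsq(features[:n_train], it_responses[:n_train], rcond=None)
    pred = features[n_train:] @ W
    true = it_responses[n_train:]
    r = [np.corrcoef(pred[:, i], true[:, i])[0, 1] for i in range(true.shape[1])]
    return float(np.nanmean(r))

# Synthetic stand-ins: 100 images, 20 candidate models, 10 IT sites.
it = rng.standard_normal((100, 10))
accuracies, predictivities = [], []
for _ in range(20):
    # Each hypothetical model yields a feature matrix and a task accuracy;
    # both are simulated so that better features track IT more closely.
    quality = rng.uniform(0, 1)
    feats = quality * it @ rng.standard_normal((10, 50)) \
        + rng.standard_normal((100, 50))
    accuracies.append(quality + 0.1 * rng.standard_normal())
    predictivities.append(neural_predictivity(feats, it))

r = np.corrcoef(accuracies, predictivities)[0, 1]
print(f"performance-predictivity correlation across models: r = {r:.2f}")
```

In the real analysis the features would come from trained hierarchical networks and the responses from array electrophysiology; the regression-then-correlate structure is the common core.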
The ventral stream is thought to function as a series of hierarchical processing stages (5-7) that encode image content (e.g., object identity and category) increasingly explicitly in successive cortical areas (1, 8, 9). For example, neurons in the lowest area, V1, are well described by Gabor-like edge detectors that extract rough object outlines (10), although the V1 population does not show robust tolerance to complex image transformations (9). Conversely, rapidly evoked population activity in top-level inferior temporal (IT) cortex can directly support real-time, invariant object categorization over a wide range of tasks (11, 12). Midlevel ventral areas, such as V4, the dominant cortical input to IT, exhibit intermediate levels of object selectivity and variation tolerance (12-14).

Significant progress has been made in understanding lower ventral areas such as V1, where conceptually compelling models have been discovered (10). These models are also quantitatively accurate and can predict response magnitudes of individual neuronal units to novel image stimuli. Higher ventral cortical areas, especially V4 and IT, have been much more difficult to understand. Al...
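The Gabor description of V1 mentioned above is concrete enough to write down. The following sketch builds the standard 2-D Gabor filter (a sinusoidal carrier under a Gaussian envelope) and checks that, in a small oriented filter bank, the filter matched to an edge's orientation responds most strongly; all sizes and parameters here are illustrative choices, not values from the cited work.

```python
import numpy as np

def gabor(size=15, theta=0.0, sigma=3.0, wavelength=6.0, phase=0.0):
    """A 2-D Gabor filter: a sinusoidal grating under a Gaussian envelope,
    the classic descriptive model of a V1 simple-cell receptive field."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)    # rotate into the filter's axis
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * xr / wavelength + phase)
    return envelope * carrier

# A small bank of oriented filters, applied to a vertical step edge.
bank = [gabor(theta=t) for t in np.linspace(0, np.pi, 4, endpoint=False)]
vertical_edge = np.tile(np.r_[np.zeros(8), np.ones(7)], (15, 1))
responses = [float(abs((f * vertical_edge).sum())) for f in bank]
# The vertically tuned filter (theta = 0) gives the largest response magnitude.
print(responses)
```

This kind of quantitative, image-computable model is exactly what the text notes has been achieved for V1 but long remained missing for V4 and IT.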
Mounting evidence suggests that “core object recognition,” the ability to rapidly recognize objects despite substantial appearance variation, is solved in the brain via a cascade of reflexive, largely feedforward computations that culminate in a powerful neuronal representation in the inferior temporal cortex. However, the algorithm that produces this solution remains poorly understood. Here we review evidence ranging from individual neurons, to neuronal populations, to behavior, to computational models. We propose that understanding this algorithm will require using neuronal and psychophysical data to sift through many computational models, each based on building blocks of small, canonical sub-networks with a common functional goal.
Fueled by innovation in the computer vision and artificial intelligence communities, recent developments in computational neuroscience have used goal-driven hierarchical convolutional neural networks (HCNNs) to make strides in modeling neural single-unit and population responses in higher visual cortical areas. In this Perspective, we review the recent progress in a broader modeling context and describe some of the key technical innovations that have supported it. We then outline how the goal-driven HCNN approach can be used to delve even more deeply into understanding the development and organization of sensory cortical processing.
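The HCNN family discussed in this Perspective can be caricatured in a few lines: repeated stages of filtering, a threshold nonlinearity, and local pooling, with intermediate stages compared to midlevel areas like V4 and later stages to IT. The sketch below is a deliberately tiny, untrained toy (random filters, single-channel maps collapsed between stages) meant only to show the canonical stage structure, not any particular published architecture.

```python
import numpy as np

rng = np.random.default_rng(1)

def conv2d(image, kernels):
    """Valid-mode 2-D convolution of one image with a bank of kernels."""
    kh, kw = kernels.shape[1:]
    h, w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((len(kernels), h, w))
    for k, kern in enumerate(kernels):
        for i in range(h):
            for j in range(w):
                out[k, i, j] = np.sum(image[i:i + kh, j:j + kw] * kern)
    return out

def hcnn_stage(x, kernels, pool=2):
    """One canonical HCNN stage: filter -> threshold (ReLU) -> local max pooling."""
    fmap = np.maximum(conv2d(x, kernels), 0.0)           # filtering + nonlinearity
    c, h, w = fmap.shape
    h2, w2 = h // pool, w // pool
    fmap = fmap[:, :h2 * pool, :w2 * pool]
    fmap = fmap.reshape(c, h2, pool, w2, pool).max(axis=(2, 4))
    return fmap.max(axis=0)  # collapse channels so the next stage sees a 2-D map

image = rng.standard_normal((32, 32))
layer1 = hcnn_stage(image, rng.standard_normal((4, 3, 3)))   # early, "V1-like"
layer2 = hcnn_stage(layer1, rng.standard_normal((4, 3, 3)))  # intermediate, "V4-like"
layer3 = hcnn_stage(layer2, rng.standard_normal((4, 3, 3)))  # late, "IT-like"
print(layer1.shape, layer2.shape, layer3.shape)
```

In the goal-driven approach, filters like these are not random but optimized for a behavioral goal (e.g., categorization), and each stage's features are then regressed against recordings from the corresponding cortical area.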