The receptive fields of simple cells in mammalian primary visual cortex can be characterized as being spatially localized, oriented and bandpass (selective to structure at different spatial scales), comparable to the basis functions of wavelet transforms. One approach to understanding such response properties of visual neurons has been to consider their relationship to the statistical structure of natural images in terms of efficient coding. Along these lines, a number of studies have attempted to train unsupervised learning algorithms on natural images in the hope of developing receptive fields with similar properties, but none has succeeded in producing a full set that spans the image space and contains all three of the above properties. Here we investigate the proposal that a coding strategy that maximizes sparseness is sufficient to account for these properties. We show that a learning algorithm that attempts to find sparse linear codes for natural scenes will develop a complete family of localized, oriented, bandpass receptive fields, similar to those found in the primary visual cortex. The resulting sparse image code provides a more efficient representation for later stages of processing because it possesses a higher degree of statistical independence among its outputs.
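The learning scheme summarized above can be sketched in a few lines of code. The following is a minimal illustration, not the authors' exact procedure: it assumes whitened image patches arranged as columns of a matrix, uses an L1 sparseness penalty with plain (sub)gradient descent for inference, and a simple gradient step on the basis functions; the function name sparse_code and all parameter values are illustrative. Trained on patches of natural scenes, the columns of phi play the role of the receptive fields described above.

import numpy as np

def sparse_code(patches, n_basis=64, n_iters=200, lr_a=0.01, lr_phi=0.5, lam=0.1):
    # Hypothetical sketch: learn basis functions phi by minimizing
    # ||x - phi @ a||^2 + lam * sum(|a|) over natural-image patches.
    # `patches` is (n_pixels, n_samples); all parameter values are illustrative.
    n_pixels, n_samples = patches.shape
    rng = np.random.default_rng(0)
    phi = rng.standard_normal((n_pixels, n_basis))
    phi /= np.linalg.norm(phi, axis=0)                # unit-norm basis functions

    for _ in range(n_iters):
        batch = patches[:, rng.integers(0, n_samples, size=100)]

        # Inference: (sub)gradient descent on the coefficients with an L1 penalty.
        a = np.zeros((n_basis, batch.shape[1]))
        for _ in range(50):
            a += lr_a * (phi.T @ (batch - phi @ a) - lam * np.sign(a))

        # Learning: gradient step on the basis functions, driven by the
        # residual that the current sparse code fails to explain.
        residual = batch - phi @ a
        phi += lr_phi * residual @ a.T / batch.shape[1]
        phi /= np.linalg.norm(phi, axis=0)            # keep basis functions unit norm

    return phi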
The spatial receptive fields of simple cells in mammalian striate cortex have been reasonably well described physiologically and can be characterized as being localized, oriented, and bandpass, comparable with the basis functions of wavelet transforms. Previously, we have shown that these receptive field properties may be accounted for in terms of a strategy for producing a sparse distribution of output activity in response to natural images. Here, in addition to describing this work in a more expansive fashion, we examine the neurobiological implications of sparse coding. Of particular interest is the case when the code is overcomplete--i.e., when the number of code elements is greater than the effective dimensionality of the input space. Because the basis functions are non-orthogonal and not linearly independent of each other, sparsifying the code will recruit only those basis functions necessary for representing a given input, and so the input-output function will deviate from being purely linear. These deviations from linearity provide a potential explanation for the weak forms of non-linearity observed in the response properties of cortical simple cells, and they further make predictions about the expected interactions among units in response to naturalistic stimuli.
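To make the deviation from linearity concrete, here is a small numerical illustration under assumed settings, not the paper's model: sparse coefficients for an overcomplete, non-orthogonal basis are computed by iterative soft thresholding, and the code for the sum of two inputs is compared with the sum of the two codes. Because of the thresholding, superposition fails, so the input-output mapping is not purely linear. All names and parameters are illustrative.

import numpy as np

def ista(x, phi, lam=0.2, n_iters=200):
    # Sparse inference by iterative soft thresholding (an illustrative choice).
    L = np.linalg.norm(phi, 2) ** 2                   # step-size bound (squared spectral norm)
    a = np.zeros(phi.shape[1])
    for _ in range(n_iters):
        a = a + phi.T @ (x - phi @ a) / L             # gradient step on the residual
        a = np.sign(a) * np.maximum(np.abs(a) - lam / L, 0.0)   # soft threshold
    return a

rng = np.random.default_rng(1)
phi = rng.standard_normal((16, 64))                   # 64 basis functions for a 16-d input: overcomplete
phi /= np.linalg.norm(phi, axis=0)

x1, x2 = rng.standard_normal(16), rng.standard_normal(16)
difference = ista(x1 + x2, phi) - (ista(x1, phi) + ista(x2, phi))
print(np.linalg.norm(difference))                     # nonzero: superposition fails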
It has long been assumed that sensory neurons are adapted, through both evolutionary and developmental processes, to the statistical properties of the signals to which they are exposed. Attneave (1954) and Barlow (1961) proposed that information theory could provide a link between environmental statistics and neural responses through the concept of coding efficiency. Recent developments in statistical modeling, along with powerful computational tools, have enabled researchers to study more sophisticated statistical models for visual images, to validate these models empirically against large sets of data, and to begin experimentally testing the efficient coding hypothesis for both individual neurons and populations of neurons.
We can claim that we know what the visual system does once we can predict neural responses to arbitrary stimuli, including those seen in nature. In the early visual system, models based on one or more linear receptive fields hold promise to achieve this goal as long as the models include nonlinear mechanisms that control responsiveness, based on stimulus context and history, and take into account the nonlinearity of spike generation. These linear and nonlinear mechanisms might be the only essential determinants of the response, or alternatively, there may be additional fundamental determinants yet to be identified. Research is progressing with the goals of defining a single "standard model" for each stage of the visual pathway and testing the predictive power of these models on the responses to movies of natural scenes. These predictive models represent, at a given stage of the visual pathway, a compact description of visual computation. They would be an invaluable guide for understanding the underlying biophysical and anatomical mechanisms and relating neural responses to visual perception.

Key words: contrast; lateral geniculate nucleus; luminance; primary visual cortex; receptive field; retina; visual system; natural images

The ultimate test of our knowledge of the visual system is prediction: we can say that we know what the visual system does when we can predict its response to arbitrary stimuli. How far are we from this end result? Do we have a "standard model" that can predict the responses of at least some early part of the visual system, such as the retina, the lateral geniculate nucleus (LGN), or primary visual cortex (V1)? Does such a model predict responses to stimuli encountered in the real world?

A standard model existed in the early decades of visual neuroscience, until the 1990s: it was given by the linear receptive field. The linear receptive field specifies a set of weights to apply to images to yield a predicted response. A weighted sum is a linear operation, so it is simple and intuitive. Moreover, linearity made the receptive field mathematically tractable, allowing the fruitful marriage of visual neuroscience with image processing (Robson, 1975) and with linear systems analysis (De Valois and De Valois, 1988). It also provided a promising parallel with research in visual perception (Graham, 1989). Because it served as a standard model, the receptive field could be used to decide which findings were surprising and which were not: if a phenomenon was not predictable from the linear receptive field, it was particularly worthy of publication.

Research aimed at testing the linear receptive field led to the discovery of important nonlinear phenomena, which cannot be explained by a linear receptive field alone. These phenomena have been discovered at all stages of the early visual system, including the retina (for review, see Shapley and Enroth-Cugell, 1984; Demb, 2002), the LGN (for review, see Carandini, 2004), and area V1 (for review, see Carandini et al., 1999; Fitzpatrick, 2000; Albright and Stoner, ...)
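As a concrete illustration of the weighted-sum idea (a sketch under assumed parameters, not any particular study's model): a Gabor patch stands in for a linear receptive field, the predicted response is the weighted sum of the receptive field with an image patch, and a simple rectification stands in for the nonlinearity of spike generation.

import numpy as np

def gabor_rf(size=21, sigma=3.0, freq=0.15, theta=0.0):
    # A Gabor patch as a stand-in linear receptive field (parameters are illustrative).
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    return np.exp(-(x**2 + y**2) / (2 * sigma**2)) * np.cos(2 * np.pi * freq * xr)

def predicted_response(image_patch, rf, threshold=0.0):
    # Linear receptive field prediction: a weighted sum over the image,
    # followed by simple rectification standing in for spike generation.
    return max(np.sum(rf * image_patch) - threshold, 0.0)

rf = gabor_rf()
patch = np.random.default_rng(0).standard_normal(rf.shape)
print(predicted_response(patch, rf))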
While evidence indicates that neural systems may be employing sparse approximations to represent sensed stimuli, the mechanisms underlying this ability are not understood. We describe a locally competitive algorithm (LCA) that solves a family of sparse coding problems by minimizing a weighted combination of mean-squared error and a coefficient cost function. LCAs are designed to be implemented in a dynamical system composed of many neuron-like elements operating in parallel. These algorithms use thresholding functions to induce local (usually one-way) inhibitory competitions between nodes to produce sparse representations. LCAs produce coefficients with sparsity levels comparable to the most popular centralized sparse coding algorithms while being readily suited for neural implementation. Additionally, LCA coefficients for video sequences demonstrate inertial properties that are both qualitatively and quantitatively more regular (i.e., smoother and more predictable) than the coefficients produced by greedy algorithms.
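A minimal numerical sketch of LCA-style dynamics follows (assumed parameter values; it follows the general description above rather than any particular published implementation): each node integrates its feedforward drive, leaks, and is inhibited by other above-threshold nodes in proportion to the overlap of their basis functions, so only a few coefficients remain active.

import numpy as np

def lca(x, phi, lam=0.1, tau=10.0, dt=1.0, n_steps=500):
    # Sketch of LCA-style dynamics (illustrative parameters): nodes integrate
    # their driving input, leak, and inhibit one another through the Gram
    # matrix of the basis functions; outputs are hard-thresholded states.
    b = phi.T @ x                                     # feedforward drive
    inhibition = phi.T @ phi - np.eye(phi.shape[1])   # lateral competition weights
    u = np.zeros(phi.shape[1])                        # internal (membrane-like) states
    for _ in range(n_steps):
        a = np.where(np.abs(u) > lam, u, 0.0)         # thresholded outputs
        u = u + dt * (b - u - inhibition @ a) / tau
    return np.where(np.abs(u) > lam, u, 0.0)

rng = np.random.default_rng(0)
phi = rng.standard_normal((64, 256))                  # overcomplete dictionary
phi /= np.linalg.norm(phi, axis=0)
x = phi[:, 3] + 0.1 * rng.standard_normal(64)         # input close to one basis function
a = lca(x, phi)
print(np.count_nonzero(a), "of", a.size, "coefficients active")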
Visual perception involves the grouping of individual elements into coherent patterns that reduce the descriptive complexity of a visual scene. The physiological basis of this perceptual simplification remains poorly understood. We used functional MRI to measure activity in a higher object processing area, the lateral occipital complex, and in primary visual cortex in response to visual elements that were either grouped into objects or randomly arranged. We observed significant activity increases in the lateral occipital complex and concurrent reductions of activity in primary visual cortex when elements formed coherent shapes, suggesting that activity in early visual areas is reduced as a result of grouping processes performed in higher areas. These findings are consistent with predictive coding models of vision that postulate that inferences of high-level areas are subtracted from incoming sensory information in lower areas through cortical feedback.

One of the extraordinary capabilities of the human visual system is its ability to rapidly group elements in a complex visual scene, a process that can greatly simplify the description of an image. For example, a collection of parallel lines can be described as a single texture pattern without specifying the location, length, and orientation of each element within the pattern. Such grouping processes are reflected in the activities of neurons at various stages of the visual system. For example, the response of a neuron in primary visual cortex (V1) to a single visual element can be suppressed if the element in its receptive field shares the same orientation as surrounding elements, or enhanced if orientations differ (1). These pattern context effects in V1 are thought to be mediated by both local connections (2) and interactions with higher areas (3).

In natural scenes, elements are often grouped when they are perceived as belonging to the same object. This case is particularly interesting from a physiological perspective because object shape is a feature that is represented only in higher stages of the visual system, so any influence of perceived shape on lower areas would require feedback processes. Although feedback is generally thought of as a process where activity in lower areas is enhanced by activity occurring in higher areas, recent work on probabilistic models has pointed to the importance of a phenomenon termed "explaining away": a competition that occurs between alternative hypotheses when attempting to infer the probable cause of an event (4). When applied to models of visual perception, perceptual hypotheses are thought to compete via feedback connections from higher visual areas projecting their predictions about the stimulus to lower stages, where they are then subtracted from incoming data. According to such predictive coding models, the activity of neurons in lower stages will decrease when neurons in higher stages can "explain" a visual stimulus (5, 6). These models can be contrasted with traditional feature-detection models, which posit that...
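The subtraction logic of predictive coding can be illustrated with a toy linear model (purely illustrative; this is not the analysis used in the fMRI study): a higher area infers causes whose feedback prediction is subtracted from the input, and the residual, standing in for lower-area activity, is smaller for a stimulus the higher area can explain than for a random one.

import numpy as np

def explain(x, W, n_iters=200, lr=0.1):
    # Toy predictive-coding loop (illustrative): a higher area holds causes r,
    # its prediction W @ r is fed back and subtracted from the input, and the
    # residual error drives the update of r.
    r = np.zeros(W.shape[1])
    for _ in range(n_iters):
        error = x - W @ r                             # lower-area activity: input minus prediction
        r = r + lr * (W.T @ error)                    # higher-area update driven by the error
    return x - W @ r                                  # remaining lower-area (error) activity

rng = np.random.default_rng(0)
W = rng.standard_normal((100, 10))                    # feedback (generative) weights
W /= np.linalg.norm(W, axis=0)
coherent = W @ rng.standard_normal(10)                # a stimulus the higher area can explain
random_stim = rng.standard_normal(100)                # a stimulus it cannot

for name, stim in [("coherent", coherent), ("random", random_stim)]:
    print(name, "residual norm:", round(float(np.linalg.norm(explain(stim, W))), 3))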
A wide variety of papers have reviewed what is known about the function of primary visual cortex. In this review, rather than stating what is known, we attempt to estimate how much is still unknown about V1 function. In particular, we identify five problems with the current view of V1 that stem largely from experimental and theoretical biases, in addition to the contributions of nonlinearities in the cortex that are not well understood. Our purpose is to open the door to new theories, a number of which we describe, along with some proposals for testing them.