A challenging goal in neuroscience is to be able to read out, or decode, mental content from brain activity. Recent functional magnetic resonance imaging (fMRI) studies have decoded orientation [1,2], position [3], and object category [4,5] from activity in visual cortex. However, these studies typically used relatively simple stimuli (e.g. gratings) or images drawn from fixed categories (e.g. faces, houses), and decoding was based on prior measurements of brain activity evoked by those same stimuli or categories. To overcome these limitations, we develop a decoding method based on quantitative receptive field models that characterize the relationship between visual stimuli and fMRI activity in early visual areas. These models describe the tuning of individual voxels for space, orientation, and spatial frequency, and are estimated directly from responses evoked by natural images. We show that these receptive field models make it possible to identify, from a large set of completely novel natural images, which specific image was seen by an observer. Identification is not a mere consequence of the retinotopic organization of visual areas; simpler receptive field models that describe only spatial tuning yield much poorer identification performance. Our results suggest that it may soon be possible to reconstruct a picture of a person's visual experience from brain activity measurements alone.
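The identification step described above can be illustrated with a minimal sketch. It assumes that each voxel's receptive field model has already been reduced to a linear weighting of image features, and that the match score is the Pearson correlation between measured and predicted voxel patterns. Both are assumptions made for illustration; the abstract does not specify either. The names `identify`, `features`, and `weights` are hypothetical, and the data are synthetic.

```python
import numpy as np

def identify(measured, predicted):
    """Return the index of the candidate image whose predicted voxel
    pattern correlates best with the measured pattern.

    measured:  (n_voxels,) measured activity for one viewed image
    predicted: (n_images, n_voxels) model predictions for each candidate
    """
    m = measured - measured.mean()
    p = predicted - predicted.mean(axis=1, keepdims=True)
    corr = (p @ m) / (np.linalg.norm(p, axis=1) * np.linalg.norm(m))
    return int(np.argmax(corr))

# Synthetic stand-ins (hypothetical): random image features and voxel weights.
rng = np.random.default_rng(0)
n_images, n_features, n_voxels = 120, 40, 200
features = rng.normal(size=(n_images, n_features))
weights = rng.normal(size=(n_features, n_voxels))
predicted = features @ weights          # predicted activity for every candidate

true_idx = 17
measured = predicted[true_idx] + rng.normal(scale=2.0, size=n_voxels)
identified = identify(measured, predicted)
```

Because the candidate set only needs model predictions, not prior brain measurements, the same procedure applies to images the observer has never seen.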
The meaning of language is represented in regions of the cerebral cortex collectively known as the “semantic system”. However, little of the semantic system has been mapped comprehensively, and the semantic selectivity of most regions is unknown. Here we systematically map semantic selectivity across the cortex using voxel-wise modeling of fMRI data collected while subjects listened to hours of narrative stories. We show that the semantic system is organized into intricate patterns that appear consistent across individuals. We then use a novel generative model to create a detailed semantic atlas. Our results suggest that most areas within the semantic system represent information about specific semantic domains, or groups of related concepts, and our atlas shows which domains are represented in each area. This study demonstrates that data-driven methods—commonplace in studies of human neuroanatomy and functional connectivity—provide a powerful and efficient means for mapping functional representations in the brain.
Theoretical studies suggest that primary visual cortex (area V1) uses a sparse code to efficiently represent natural scenes. This issue was investigated by recording from V1 neurons in awake behaving macaques during both free viewing of natural scenes and conditions simulating natural vision. Stimulation of the nonclassical receptive field increases the selectivity and sparseness of individual V1 neurons, increases the sparseness of the population response distribution, and strongly decorrelates the responses of neuron pairs. These effects are due to both excitatory and suppressive modulation of the classical receptive field by the nonclassical receptive field and do not depend critically on the spatiotemporal structure of the stimuli. During natural vision, the classical and nonclassical receptive fields function together to form a sparse representation of the visual world. This sparse code may be computationally efficient for both early vision and higher visual processing.
Summary: Humans can see and name thousands of distinct object and action categories, so it is unlikely that each category is represented in a distinct brain area. A more efficient scheme would be to represent categories as locations in a continuous semantic space mapped smoothly across the cortical surface. To search for such a space, we used functional magnetic resonance imaging (fMRI) to measure human brain activity evoked by natural movies. We then used voxel-wise models to examine the cortical representation of 1705 object and action categories. The first few dimensions of the underlying semantic space were recovered from the fit models by principal components analysis. Projection of the recovered semantic space onto cortical flat maps shows that semantic selectivity is organized into smooth gradients that cover much of visual and non-visual cortex. Furthermore, both the recovered semantic space and the cortical organization of the space are shared across different individuals.
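A rough sketch of the two-stage analysis described above: fit one linear model per voxel on category indicators, then apply principal components analysis to the fitted weights. Ridge regression is an assumption here (the abstract says only "voxel-wise models"), and the function names, sizes, and data are hypothetical and synthetic.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Fit all voxels at once. X: (n_time, n_categories), Y: (n_time, n_voxels).
    Returns weights of shape (n_categories, n_voxels)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)

def semantic_components(W, k):
    """PCA across voxels of the fitted weights.
    Returns (components of shape (k, n_categories), top-k singular values)."""
    Wc = W.T - W.T.mean(axis=0)          # voxels x categories, centered
    _, s, Vt = np.linalg.svd(Wc, full_matrices=False)
    return Vt[:k], s[:k]

# Synthetic data (hypothetical sizes, far smaller than the 1705 categories above).
rng = np.random.default_rng(0)
n_time, n_categories, n_voxels = 400, 30, 100
X = (rng.random((n_time, n_categories)) < 0.1).astype(float)  # category present?
W_true = rng.normal(size=(n_categories, 3)) @ rng.normal(size=(3, n_voxels))
Y = X @ W_true + rng.normal(scale=0.5, size=(n_time, n_voxels))

W_hat = fit_ridge(X, Y, alpha=1.0)
components, sing_vals = semantic_components(W_hat, k=3)
# Each voxel's coordinates in the recovered space, ready to be painted on a map.
voxel_scores = (W_hat.T - W_hat.T.mean(axis=0)) @ components.T
```

The rows of `components` play the role of semantic dimensions, and `voxel_scores` is what would be projected onto a cortical flat map.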
Over the past decade fMRI researchers have developed increasingly sensitive techniques for analyzing the information represented in BOLD activity. The most popular of these techniques is linear classification, a simple technique for decoding information about experimental stimuli or tasks from patterns of activity across an array of voxels. A more recent development is the voxel-based encoding model, which describes the information about the stimulus or task that is represented in the activity of single voxels. Encoding and decoding are complementary operations: encoding uses stimuli to predict activity while decoding uses activity to predict information about stimuli. However, in practice these two operations are often confused, and their respective strengths and weaknesses have not been made clear. Here we use the concept of a linearizing feature space to clarify the relationship between encoding and decoding. We show that encoding and decoding operations can both be used to investigate some of the most common questions about how information is represented in the brain. However, focusing on encoding models offers two important advantages over decoding. First, an encoding model can in principle provide a complete functional description of a region of interest, while a decoding model can provide only a partial description. Second, while it is straightforward to derive an optimal decoding model from an encoding model it is much more difficult to derive an encoding model from a decoding model. We propose a systematic modeling approach that begins by estimating an encoding model for every voxel in a scan and ends by using the estimated encoding models to perform decoding.
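The relationship between encoding and decoding can be sketched for a toy case. The example assumes a stimulus defined by a single orientation angle and a hypothetical linearizing feature space `[cos 2θ, sin 2θ, 1]`. The encoding model is a linear map from features to voxel activity, and the decoder is derived from it by choosing the candidate stimulus whose predicted activity is closest to the measurement (maximum likelihood under equal-variance Gaussian noise). None of these specifics come from the abstract.

```python
import numpy as np

def feature_map(theta):
    """Hypothetical linearizing feature space for orientation (radians)."""
    theta = np.atleast_1d(theta).astype(float)
    return np.column_stack([np.cos(2 * theta), np.sin(2 * theta),
                            np.ones_like(theta)])

def fit_encoding(theta_train, R_train):
    """Least-squares encoding model: features -> activity of every voxel."""
    W, *_ = np.linalg.lstsq(feature_map(theta_train), R_train, rcond=None)
    return W                                  # (n_features, n_voxels)

def decode(r, W, candidates):
    """Derive a decoder from the encoding model: pick the candidate whose
    predicted activity pattern has the smallest squared error."""
    pred = feature_map(candidates) @ W
    err = ((pred - r) ** 2).sum(axis=1)
    return candidates[np.argmin(err)]

# Synthetic experiment (all values hypothetical).
rng = np.random.default_rng(0)
n_voxels = 50
W_true = rng.normal(size=(3, n_voxels))
theta_train = rng.uniform(0, np.pi, size=200)
R_train = feature_map(theta_train) @ W_true + rng.normal(scale=0.1, size=(200, n_voxels))

W_hat = fit_encoding(theta_train, R_train)
candidates = np.linspace(0, np.pi, 36, endpoint=False)   # 5 degree steps
true_theta = candidates[10]
r_test = (feature_map(true_theta) @ W_true)[0] + rng.normal(scale=0.1, size=n_voxels)
decoded_theta = decode(r_test, W_hat, candidates)
```

The decoder here has no parameters of its own; everything it knows comes from the fitted encoding model, which is the direction of derivation the abstract describes as straightforward.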
Summary: Quantitative modeling of human brain activity can provide crucial insights about cortical representations [1, 2], and can form the basis for brain decoding devices [3–5]. Recent functional magnetic resonance imaging (fMRI) studies have modeled brain activity elicited by static visual patterns, and have shown that it is possible to reconstruct these images from brain activity measurements [6–8]. However, blood oxygen level dependent (BOLD) signals measured using fMRI are very slow [9], so it has been difficult to model brain activity elicited by dynamic stimuli such as natural movies. Here we present a new motion-energy [10, 11] encoding model that largely overcomes this limitation. Our motion-energy model describes fast visual information and slow hemodynamics by separate components. We recorded BOLD signals in occipito-temporal visual cortex of human subjects who passively watched natural movies, and fit the encoding model separately to individual voxels. Visualization of the fit models reveals how early visual areas represent moving stimuli. To demonstrate the power of our approach we also constructed a Bayesian decoder [8], by combining estimated encoding models with a sampled natural movie prior. The decoder provides remarkable reconstructions of natural movies, capturing the spatio-temporal structure of the viewed movie. These results demonstrate that dynamic brain activity measured under naturalistic conditions can be decoded using current fMRI technology.
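One common way to separate fast stimulus features from slow hemodynamics is sketched below. This is an assumption for illustration, not a confirmed detail of the model above: frame-rate features are averaged down to the scan rate, then time-shifted copies are stacked so that a linear regression can learn a separate delay profile for each voxel. The feature values are placeholders standing in for motion-energy filter outputs, and `downsample` and `make_delayed` are hypothetical helper names.

```python
import numpy as np

def downsample(frame_features, frames_per_tr):
    """Average frame-rate features (n_frames, n_features) within each scan (TR)."""
    n_tr = frame_features.shape[0] // frames_per_tr
    trimmed = frame_features[:n_tr * frames_per_tr]
    return trimmed.reshape(n_tr, frames_per_tr, -1).mean(axis=1)

def make_delayed(features, delays):
    """Stack time-shifted copies of features (n_trs, n_features).
    Each delay is a non-negative number of TRs; rows before the delay are zero."""
    n_t, n_f = features.shape
    out = np.zeros((n_t, n_f * len(delays)))
    for i, d in enumerate(delays):
        out[d:, i * n_f:(i + 1) * n_f] = features[:n_t - d]
    return out

# Tiny worked example: 6 frames, 2 placeholder features, 2 frames per TR.
frame_features = np.arange(12.0).reshape(6, 2)
tr_features = downsample(frame_features, frames_per_tr=2)
design = make_delayed(tr_features, delays=(0, 1))
```

Fitting a regression from `design` to a voxel's BOLD time course then yields one weight per feature per delay, so the temporal weights absorb the hemodynamic lag while the feature weights describe visual tuning.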
We can claim that we know what the visual system does once we can predict neural responses to arbitrary stimuli, including those seen in nature. In the early visual system, models based on one or more linear receptive fields hold promise to achieve this goal as long as the models include nonlinear mechanisms that control responsiveness, based on stimulus context and history, and take into account the nonlinearity of spike generation. These linear and nonlinear mechanisms might be the only essential determinants of the response, or alternatively, there may be additional fundamental determinants yet to be identified. Research is progressing with the goals of defining a single "standard model" for each stage of the visual pathway and testing the predictive power of these models on the responses to movies of natural scenes. These predictive models represent, at a given stage of the visual pathway, a compact description of visual computation. They would be an invaluable guide for understanding the underlying biophysical and anatomical mechanisms and relating neural responses to visual perception.
Key words: contrast; lateral geniculate nucleus; luminance; primary visual cortex; receptive field; retina; visual system; natural images
The ultimate test of our knowledge of the visual system is prediction: we can say that we know what the visual system does when we can predict its response to arbitrary stimuli. How far are we from this end result? Do we have a "standard model" that can predict the responses of at least some early part of the visual system, such as the retina, the lateral geniculate nucleus (LGN), or primary visual cortex (V1)? Does such a model predict responses to stimuli encountered in the real world?
A standard model existed in the early decades of visual neuroscience, until the 1990s: it was given by the linear receptive field. The linear receptive field specifies a set of weights to apply to images to yield a predicted response. A weighted sum is a linear operation, so it is simple and intuitive. Moreover, linearity made the receptive field mathematically tractable, allowing the fruitful marriage of visual neuroscience with image processing (Robson, 1975) and with linear systems analysis (De Valois and De Valois, 1988). It also provided a promising parallel with research in visual perception (Graham, 1989). Because it served as a standard model, the receptive field could be used to decide which findings were surprising and which were not: if a phenomenon was not predictable from the linear receptive field, it was particularly worthy of publication.
Research aimed at testing the linear receptive field led to the discovery of important nonlinear phenomena, which cannot be explained by a linear receptive field alone. These phenomena have been discovered at all stages of the early visual system, including the retina (for review, see Shapley and Enroth-Cugell, 1984; Demb, 2002), the LGN (for review, see Carandini, 2004), and area V1 (for review, see Carandini et al., 1999; Fitzpatrick, 2000; Albright and Stoner,...
Summary: Recent studies have used fMRI signals from early visual areas to reconstruct simple geometric patterns. Here, we demonstrate a new Bayesian decoder that uses fMRI signals from early and anterior visual areas to reconstruct complex natural images. Our decoder combines three elements: a structural encoding model that characterizes responses in early visual areas; a semantic encoding model that characterizes responses in anterior visual areas; and prior information about the structure and semantic content of natural images. By combining all these elements, the decoder produces reconstructions that accurately reflect both the spatial structure and semantic category of the objects contained in the observed natural image. Our results show that prior information has a substantial effect on the quality of natural image reconstructions. We also demonstrate that much of the variance in the responses of anterior visual areas to complex natural images is explained by the semantic category of the image alone.
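How the three elements could be combined can be sketched as a maximum a posteriori choice over a set of prior samples. The sketch assumes independent Gaussian noise with a shared standard deviation, a prior represented as a finite pool of candidate images, and encoding-model predictions that are already computed for each candidate. These are illustrative assumptions, and all names and data are hypothetical.

```python
import numpy as np

def log_likelihood(r, pred, sigma):
    """Gaussian log-likelihood (up to a constant) of measured activity r
    under each candidate's predicted activity. pred: (n_candidates, n_voxels)."""
    return -0.5 * np.sum(((r - pred) / sigma) ** 2, axis=-1)

def map_reconstruct(r_early, r_anterior, pred_early, pred_anterior,
                    log_prior, sigma=1.0):
    """Index of the candidate with the highest log posterior:
    structural likelihood + semantic likelihood + prior."""
    log_post = (log_likelihood(r_early, pred_early, sigma)
                + log_likelihood(r_anterior, pred_anterior, sigma)
                + log_prior)
    return int(np.argmax(log_post))

# Synthetic stand-ins for the predictions of the two encoding models.
rng = np.random.default_rng(0)
n_candidates = 50
pred_early = rng.normal(size=(n_candidates, 30))      # structural model output
pred_anterior = rng.normal(size=(n_candidates, 20))   # semantic model output
log_prior = np.full(n_candidates, -np.log(n_candidates))  # flat prior over the pool

true_idx = 7
r_early = pred_early[true_idx] + rng.normal(scale=0.3, size=30)
r_anterior = pred_anterior[true_idx] + rng.normal(scale=0.3, size=20)
best = map_reconstruct(r_early, r_anterior, pred_early, pred_anterior, log_prior)
```

With a flat prior the pool itself carries the prior information: restricting candidates to natural images is what keeps the selected reconstruction natural-looking.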