Despite great efforts over several decades, our best models of primary visual cortex (V1) still predict neural responses quite poorly when probed with natural stimuli, highlighting our limited understanding of the nonlinear computations in V1. At the same time, recent advances in machine learning have shown that deep neural networks can learn highly nonlinear functions for visual information processing. Two approaches based on deep learning have recently been successfully applied to neural data: transfer learning for predicting neural activity in higher areas of the primate ventral stream, and data-driven models to predict retina and V1 neural activity of mice. However, so far there exists no comparison between the two approaches, and neither of them has been used to model the early primate visual system. Here, we test the ability of both approaches to predict neural responses to natural images in V1 of awake monkeys. We found that both deep learning approaches outperformed classical linear-nonlinear and wavelet-based feature representations building on existing V1 encoding theories. On our dataset, transfer learning and data-driven models performed similarly, while the data-driven model employed a much simpler architecture. Thus, multi-layer CNNs set the new state of the art for predicting neural responses to natural images in primate V1. Having such good predictive in-silico models opens the door for quantitative studies of yet-unknown nonlinear computations in V1 without being limited by the available experimental time.
Despite great efforts over several decades, our best models of primary visual cortex (V1) still predict spiking activity quite poorly when probed with natural stimuli, highlighting our limited understanding of the nonlinear computations in V1. Recently, two approaches based on deep learning have emerged for modeling these nonlinear computations: transfer learning from artificial neural networks trained on object recognition, and data-driven convolutional neural network models trained end-to-end on large populations of neurons. Here, we test the ability of both approaches to predict spiking activity in response to natural images in V1 of awake monkeys. We found that the transfer learning approach performed similarly well to the data-driven approach, and both outperformed classical linear-nonlinear and wavelet-based feature representations that build on existing theories of V1. Notably, transfer learning using a pre-trained feature space required substantially less experimental time to achieve the same performance. In conclusion, multi-layer convolutional neural networks (CNNs) set the new state of the art for predicting neural responses to natural images in primate V1, and deep features learned for object recognition are better explanations for V1 computation than all previous filter bank theories. This finding strengthens the case for V1 models that are multiple nonlinearities away from the image domain and supports the idea of explaining early visual cortex in terms of high-level functional goals.
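At its core, the transfer-learning approach described above amounts to fitting a regularized linear readout on top of frozen, pre-trained CNN features. The sketch below is an illustration of that idea only, not the authors' pipeline: the random "features" stand in for activations of a pre-trained network, and the dimensions, noise level, and ridge penalty are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: "features" plays the role of frozen pre-trained CNN
# activations per image; "spikes" plays the role of recorded spike counts.
n_images, n_features, n_neurons = 200, 50, 10
features = rng.normal(size=(n_images, n_features))
true_weights = rng.normal(size=(n_features, n_neurons)) * 0.3
spikes = features @ true_weights + rng.normal(scale=0.1, size=(n_images, n_neurons))

def ridge_readout(X, Y, lam=1.0):
    """Fit per-neuron linear readout weights on frozen features (ridge regression)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

W = ridge_readout(features, spikes)
pred = features @ W

# Evaluate as in encoding-model work: correlation between prediction and
# response, computed separately for each neuron.
corr = [np.corrcoef(pred[:, i], spikes[:, i])[0, 1] for i in range(n_neurons)]
```

Because only the readout is fit while the feature space is fixed, far fewer parameters are estimated from neural data, which is one intuition for why this approach can need less experimental time.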
Deep neural networks (DNNs) have set new standards at predicting responses of neural populations to visual input. Most such DNNs consist of a convolutional network (core), shared across all neurons, which learns a representation of neural computation in visual cortex, and a neuron-specific readout that linearly combines the relevant features in this representation. The goal of this paper is to test whether such a representation is indeed generally characteristic for visual cortex, i.e. generalizes between animals of a species, and what factors contribute to obtaining such a generalizing core. To push all non-linear computations into the core, where the generalizing cortical features should be learned, we devise a novel readout that reduces the number of parameters per neuron by up to two orders of magnitude compared to the previous state of the art. It does so by taking advantage of retinotopy and learning a Gaussian distribution over each neuron's receptive field position. With this new readout we train our network on neural responses from mouse primary visual cortex (V1) and obtain a gain in performance of 7% compared to the previous state-of-the-art network. We then investigate whether the convolutional core indeed captures general cortical features by using the core in transfer learning to a different animal. When transferring a core trained on thousands of neurons from various animals and scans, we exceed the performance of training directly on that animal by 12%, and outperform a commonly used VGG16 core pre-trained on ImageNet by 33%. In addition, transfer learning with our data-driven core is more data-efficient than direct training, achieving the same performance with only 40% of the data.
Our model with its novel readout thus sets a new state of the art for neural response prediction in mouse visual cortex from natural images, generalizes between animals, and better captures characteristic cortical features than current task-driven pre-training approaches such as VGG16.
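The Gaussian readout described above can be sketched in a simplified form. The function below is a hypothetical illustration, not the paper's implementation: it evaluates a deterministic, isotropic Gaussian spatial mask at a learned receptive-field center and combines channels linearly, so each neuron needs only C + 3 readout parameters (C channel weights, a 2-D center, a spread) instead of C × H × W.

```python
import numpy as np

def gaussian_readout(feature_map, mu, sigma, weights):
    """Read out one neuron's response from a conv core's feature map.

    feature_map: (C, H, W) core output for one image
    mu: (y, x) learned receptive-field center in pixel coordinates
    sigma: spatial spread of the (isotropic) Gaussian over positions
    weights: (C,) per-channel readout weights for this neuron
    """
    C, H, W = feature_map.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # Gaussian spatial mask centered on the neuron's receptive field
    g = np.exp(-((ys - mu[0]) ** 2 + (xs - mu[1]) ** 2) / (2 * sigma ** 2))
    g /= g.sum()
    pooled = (feature_map * g).sum(axis=(1, 2))  # (C,) spatially pooled features
    return float(weights @ pooled)
```

In the paper the position is sampled from a learned Gaussian during training; the deterministic mask here is a simplification chosen to keep the parameter-count intuition visible.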
Visualizing features in deep neural networks (DNNs) can help us understand their computations. Many previous studies aimed to visualize the selectivity of individual units by finding meaningful images that maximize their activation. However, comparably little attention has been paid to visualizing which image transformations units in DNNs are invariant to. Here we propose a method to discover invariances in the responses of hidden-layer units of deep neural networks. Our approach is based on simultaneously searching for a batch of images that strongly activate a unit while at the same time being as distinct from each other as possible. We find that even early convolutional layers in VGG-19 exhibit various forms of response invariance: near-perfect phase invariance in some units and invariance to local diffeomorphic transformations in others. At the same time, we uncover representational differences with ResNet-50 in its corresponding layers. We conclude that invariance to image transformations is a major computational component learned by DNNs, and we provide a systematic method to study it.
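The proposed search, maximizing a unit's activation over a batch of images while keeping the images mutually distinct, can be sketched as a single scoring function to be optimized over the batch. The cosine-similarity penalty and the trade-off parameter `lam` below are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def diversity_objective(images, unit_activation, lam=1.0):
    """Objective for invariance discovery: a batch of images should all
    drive the unit strongly while being mutually dissimilar.

    images: (B, D) batch of flattened images being optimized
    unit_activation: callable mapping a (D,) image to a scalar activation
    lam: activation/diversity trade-off (assumed form, hypothetical value)
    """
    acts = np.array([unit_activation(x) for x in images])
    # Penalize near-duplicate images via mean pairwise cosine similarity.
    normed = images / np.linalg.norm(images, axis=1, keepdims=True)
    sim = normed @ normed.T
    B = len(images)
    off_diag = (sim.sum() - np.trace(sim)) / (B * (B - 1))
    return acts.mean() - lam * off_diag
```

For a unit with a genuine invariance, e.g. a sign-invariant energy unit, a diverse batch can score as high on activation as a batch of duplicates while paying no similarity penalty, which is exactly the signature the method looks for.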
Divisive normalization (DN) is a prominent computational building block in the brain that has been proposed as a canonical cortical operation. Numerous experimental studies have verified its importance for capturing nonlinear neural response properties to simple, artificial stimuli, and computational studies suggest that DN is also an important component for processing natural stimuli. However, we lack quantitative models of DN that are directly informed by measurements of spiking responses in the brain and applicable to arbitrary stimuli. Here, we propose a DN model that is applicable to arbitrary input images. We test its ability to predict how neurons in macaque primary visual cortex (V1) respond to natural images, with a focus on nonlinear response properties within the classical receptive field. Our model consists of one layer of subunits followed by learned orientation-specific DN. It outperforms linear-nonlinear and wavelet-based feature representations and makes a significant step towards the performance of state-of-the-art convolutional neural network (CNN) models. Unlike deep CNNs, our compact DN model offers a direct interpretation of the nature of normalization. By inspecting the learned normalization pool of our model, we gained insights into a long-standing question about the tuning properties of DN that update the current textbook description: we found that within the receptive field oriented features were normalized preferentially by features with similar orientation rather than non-specifically as currently assumed.
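The divisive normalization stage at the heart of such models takes a standard textbook form: each unit's driven response is divided by a semi-saturation constant plus a weighted sum of the responses of a normalization pool. The sketch below is generic; the exponent, semi-saturation constant, and weight matrix are placeholders, whereas in the model described above the normalization weights are learned from data and turn out to be orientation-specific.

```python
import numpy as np

def divisive_normalization(drives, W, sigma=0.1, n=2.0):
    """Generic divisive normalization stage.

    drives: (K,) non-negative filter outputs (e.g. oriented subunit responses)
    W: (K, K) normalization weights; W[i, j] sets how strongly unit j
       suppresses unit i (orientation-specific if learned to be so)
    sigma: semi-saturation constant; n: exponent
    """
    num = drives ** n
    denom = sigma ** n + W @ (drives ** n)
    return num / denom
```

A characteristic consequence is gain control: scaling all inputs up scales the denominator as well, so responses grow strongly sublinearly with overall stimulus drive.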
Responses to natural stimuli in area V4, a mid-level area of the visual ventral stream, are well predicted by features from convolutional neural networks (CNNs) trained on image classification. This result has been taken as evidence for the functional role of V4 in object classification. However, we currently do not know if and to what extent V4 plays a role in solving other computational objectives. Here, we investigated normative accounts of V4 by predicting macaque single-neuron responses to natural images from the representations extracted by 23 CNNs trained on different computer vision tasks, including semantic, geometric, 2D, and 3D visual tasks. We found that semantic classification tasks do indeed provide the best predictive features for V4. Other tasks (3D in particular) followed very closely in performance, but a similar pattern of task performance emerged when predicting the activations of a network exclusively trained on object recognition. Thus, our results support V4's main functional role in semantic processing. At the same time, they suggest that V4's affinity to various 3D and 2D stimulus features found by electrophysiologists could be a corollary of a semantic functional goal.
Deep convolutional neural networks (CNNs) have emerged as the state of the art for predicting neural activity in visual cortex. While such models outperform classical linear-nonlinear and wavelet-based representations, we currently do not know what computations they approximate. Here, we tested divisive normalization (DN) for its ability to predict spiking responses to natural images. We developed a model that learns the pool of normalizing neurons and the magnitude of their contribution end-to-end from data. In macaque primary visual cortex (V1), we found that our interpretable model outperformed linear-nonlinear and wavelet-based feature representations and almost closed the gap to high-performing black-box models. Surprisingly, within the classical receptive field, oriented features were normalized preferentially by features with similar orientations rather than non-specifically as currently assumed. Our work provides a new, quantitatively interpretable and high-performing model of V1 applicable to arbitrary images, refining our view on gain control within the classical receptive field.

[…] some original experimental studies suggest that this assumption may not be correct for some neurons (Bonds, 1989; DeAngelis et al., 1992), and normative models of normalization predict that the magnitude with which a given neuron contributes to another neuron's normalization depends on the relationship of their response properties (Schwartz and Simoncelli, 2001). In this paper, we address two main questions raised above: (1) can an interpretable model based on divisive normalization match the superior performance of black-box CNNs over simpler, interpretable subunit or energy models when predicting spiking responses to natural images, and (2) how are V1 neurons normalized? We focus on responses to stimuli mostly restricted to the classical receptive field and on models that account only for normalization by neurons with overlapping receptive field locations. We developed an end-to-end trainable divisive normalization model to predict V1 spike counts from natural stimuli. Our model learns the filter coefficients of all neurons as well as their normalization weights directly from the data. We applied our model to natural image responses in monkey V1 and found that it outperforms linear-nonlinear and subunit models, and is competitive with state-of-the-art CNNs while requiring far fewer parameters and being directly interpretable. This result implies that divisive normalization is an important computation under stimulation with natural images. Importantly, we found that oriented features were normalized preferentially by features with similar orientation, in contrast to the current standard model of nonspecific normalization (Heeger, 1992; Busse et al., 2009). Our work thus advances our understanding of V1 function by establishing a new state-of-the-art interpretable model and predicting an orientation-specific divisive normalization…