Previous studies have shown that spike-timing-dependent plasticity (STDP) can be used in spiking neural networks (SNN) to extract visual features of low or intermediate complexity in an unsupervised manner. These studies, however, used relatively shallow architectures, and only one layer was trainable. Another line of research has demonstrated - using rate-based neural networks trained with back-propagation - that having many layers increases the recognition robustness, an approach known as deep learning. We thus designed a deep SNN, comprising several convolutional (trainable with STDP) and pooling layers. We used a temporal coding scheme where the most strongly activated neurons fire first, and less activated neurons fire later or not at all. The network was exposed to natural images. Thanks to STDP, neurons progressively learned features corresponding to prototypical patterns that were both salient and frequent. Only a few tens of examples per category were required and no label was needed. After learning, the complexity of the extracted features increased along the hierarchy, from edge detectors in the first layer to object prototypes in the last layer. Coding was very sparse, with only a few thousands spikes per image, and in some cases the object category could be reasonably well inferred from the activity of a single higher-order neuron. More generally, the activity of a few hundreds of such neurons contained robust category information, as demonstrated using a classifier on Caltech 101, ETH-80, and MNIST databases. We also demonstrate the superiority of STDP over other unsupervised techniques such as random crops (HMAX) or auto-encoders. Taken together, our results suggest that the combination of STDP with latency coding may be a key to understanding the way that the primate visual system learns, its remarkable processing speed and its low energy consumption. These mechanisms are also interesting for artificial vision systems, particularly for hardware solutions.
1 In recent years, deep learning has revolutionized the field of machine learning, for computer vision in particular. In this approach, a deep (multilayer) artificial neural network (ANN) is trained in a supervised manner using backpropagation. Vast amounts of labeled training examples are required, but the resulting classification accuracy is truly impressive, sometimes outperforming humans. Neurons in an ANN are characterized by a single, static, continuous-valued activation. Yet biological neurons use discrete spikes to compute and transmit information, and the spike times, in addition to the spike rates, matter. Spiking neural networks (SNNs) are thus more biologically realistic than ANNs, and arguably the only viable option if one wants to understand how the brain computes. SNNs are also more hardware friendly and energy-efficient than ANNs, and are thus appealing for technology, especially for portable devices. However, training deep SNNs remains a challenge. Spiking neurons' transfer function is usually non-differentiable, which prevents using backpropagation.Here we review recent supervised and unsupervised methods to train deep SNNs, and compare them in terms of accuracy, but also computational cost and hardware friendliness. The emerging picture is that SNNs still lag behind ANNs in terms of accuracy, but the gap is decreasing, and can even vanish on some tasks, while SNNs typically require many fewer operations.
Reinforcement learning (RL) has recently regained popularity with major achievements such as beating the European game of Go champion. Here, for the first time, we show that RL can be used efficiently to train a spiking neural network (SNN) to perform object recognition in natural images without using an external classifier. We used a feedforward convolutional SNN and a temporal coding scheme where the most strongly activated neurons fire first, while less activated ones fire later, or not at all. In the highest layers, each neuron was assigned to an object category, and it was assumed that the stimulus category was the category of the first neuron to fire. If this assumption was correct, the neuron was rewarded, i.e., spike-timing-dependent plasticity (STDP) was applied, which reinforced the neuron's selectivity. Otherwise, anti-STDP was applied, which encouraged the neuron to learn something else. As demonstrated on various image data sets (Caltech, ETH-80, and NORB), this reward-modulated STDP (R-STDP) approach has extracted particularly discriminative visual features, whereas classic unsupervised STDP extracts any feature that consistently repeats. As a result, R-STDP has outperformed STDP on these data sets. Furthermore, R-STDP is suitable for online learning and can adapt to drastic changes such as label permutations. Finally, it is worth mentioning that both feature extraction and classification were done with spikes, using at most one spike per neuron. Thus, the network is hardware friendly and energy efficient.
We propose a new supervised learning rule for multilayer spiking neural networks (SNN) that use a form of temporal coding known as rank-ordercoding. With this coding scheme, all neurons fire exactly one spike per stimulus, but the firing order carries information. In particular, in the readout layer the first neuron to fire determines the class of the stimulus. We derive a new learning rule for this sort of network, termed S4NN, akin to traditional error backpropagation, yet based on latencies. We show how approximate error gradients can be computed backward in a feedforward network with an arbitrary number of layers. This approach reaches state-of-the-art performance with SNNs: test accuracy of 97.4% for the MNIST dataset, and of 99.2% for the Caltech Face/Motorbike dataset. Yet the neuron model we use, non-leaky integrate-and-fire, are simpler and more hardware friendly than the one used in all previous similar proposals.
Deep convolutional neural networks (DCNNs) have attracted much attention recently, and have shown to be able to recognize thousands of object categories in natural image databases. Their architecture is somewhat similar to that of the human visual system: both use restricted receptive fields, and a hierarchy of layers which progressively extract more and more abstracted features. Yet it is unknown whether DCNNs match human performance at the task of view-invariant object recognition, whether they make similar errors and use similar representations for this task, and whether the answers depend on the magnitude of the viewpoint variations. To investigate these issues, we benchmarked eight state-of-the-art DCNNs, the HMAX model, and a baseline shallow model and compared their results to those of humans with backward masking. Unlike in all previous DCNN studies, we carefully controlled the magnitude of the viewpoint variations to demonstrate that shallow nets can outperform deep nets and humans when variations are weak. When facing larger variations, however, more layers were needed to match human performance and error distributions, and to have representations that are consistent with human behavior. A very deep net with 18 layers even outperformed humans at the highest variation level, using the most human-like representations.
-inspired unsupervised learning of visual features leads to robust invariant object recognition. " Neurocomputing 205 (2016): 382-392 AbstractRetinal image of surrounding objects varies tremendously due to the changes in position, size, pose, illumination condition, background context, occlusion, noise, and nonrigid deformations. But despite these huge variations, our visual system is able to invariantly recognize any object in just a fraction of a second. To date, various computational models have been proposed to mimic the hierarchical processing of the ventral visual pathway, with limited success. Here, we show that the association of both biologically inspired network architecture and learning rule significantly improves the models' performance when facing challenging invariant object recognition problems. Our model is an asynchronous feedforward spiking neural network. When the network is presented with natural images, the neurons in the entry layers detect edges, and the most activated ones fire first, while neurons in higher layers are equipped with spike timingdependent plasticity. These neurons progressively become selective to intermediate complexity visual features appropriate for object categorization. The model is evaluated on 3D-Object and ETH- * Corresponding author. Email addresses: kheradpisheh@ut.ac.ir (SRK), mgtabesh@ut.ac.ir (MG), timothee.masquelier@alum.mit.edu (TM).80 datasets which are two benchmarks for invariant object recognition, and is shown to outperform state-of-the-art models, including DeepConvNet and HMAX. This demonstrates its ability to accurately recognize different instances of multiple object classes even under various appearance conditions (different views, scales, tilts, and backgrounds). Several statistical analysis techniques are used to show that our model extracts class specific and highly informative features.
Behavioral studies in humans indicate that peripheral vision can do object recognition to some extent. Moreover, recent studies have shown that some information from brain regions retinotopic to visual periphery is somehow fed back to regions retinotopic to the fovea and disrupting this feedback impairs object recognition in human. However, it is unclear to what extent the information in visual periphery contributes to human object categorization. Here, we designed two series of rapid object categorization tasks to first investigate the performance of human peripheral vision in categorizing natural object images at different eccentricities and abstraction levels (superordinate, basic, and subordinate). Then, using a delayed foveal noise mask, we studied how modulating the foveal representation impacts peripheral object categorization at any of the abstraction levels. We found that peripheral vision can quickly and accurately accomplish superordinate categorization, while its performance in finer categorization levels dramatically drops as the object presents further in the periphery. Also, we found that a 300-ms delayed foveal noise mask can significantly disturb categorization performance in basic and subordinate levels, while it has no effect on the superordinate level. Our results suggest that human peripheral vision can easily process objects at high abstraction levels, and the information is fed back to foveal vision to prime foveal cortex for finer categorizations when a saccade is made toward the target object.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.