Environmental sounds form part of our daily life. With the advancement of deep learning models and the abundance of training data, the performance of automatic sound classification (ASC) systems has improved significantly in recent years. However, the high computational cost, hence high power consumption, remains a major hurdle for large-scale implementation of ASC systems on mobile and wearable devices. Motivated by the observations that humans are highly effective and consume little power whilst analyzing complex audio scenes, we propose a biologically plausible ASC framework, namely SOM-SNN. This framework uses the unsupervised self-organizing map (SOM) for representing frequency contents embedded within the acoustic signals, followed by an event-based spiking neural network (SNN) for spatiotemporal spiking pattern classification. We report experimental results on the RWCP environmental sound and TIDIGITS spoken digits datasets, which demonstrate competitive classification accuracies over other deep learning and SNN-based models. The SOM-SNN framework is also shown to be highly robust to corrupting noise after multi-condition training, whereby the model is trained with noise-corrupted sound samples. Moreover, we discover the early decision making capability of the proposed framework: an accurate classification can be made with an only partial presentation of the input.
Spiking Neural Networks (SNNs) use spatiotemporal spike patterns to represent and transmit information, which are not only biologically realistic but also suitable for ultralow-power event-driven neuromorphic implementation. Just like other deep learning techniques, Deep Spiking Neural Networks (DeepSNNs) benefit from the deep architecture. However, the training of DeepSNNs is not straightforward because the wellstudied error back-propagation (BP) algorithm is not directly applicable. In this paper, we first establish an understanding as to why error back-propagation does not work well in DeepSNNs. We then propose a simple yet efficient Rectified Linear Postsynaptic Potential function (ReL-PSP) for spiking neurons and a Spike-Timing-Dependent Back-Propagation (STDBP) learning algorithm for DeepSNNs where the timing of individual spikes is used to convey information (temporal coding), and learning (back-propagation) is performed based on spike timing in an event-driven manner. We show that DeepSNNs trained with the proposed single spike time-based learning algorithm can achieve state-of-the-art classification accuracy. Furthermore, by utilizing the trained model parameters obtained from the proposed STDBP learning algorithm, we demonstrate ultra-lowpower inference operations on a recently proposed neuromorphic inference accelerator. The experimental results also show that the neuromorphic hardware consumes 0.751 mW of the total power consumption and achieves a low latency of 47.71 ms to classify an image from the MNIST dataset. Overall, this work investigates the contribution of spike timing dynamics for information encoding, synaptic plasticity and decision making, providing a new perspective to the design of future DeepSNNs and neuromorphic hardware.
Artificial neural networks (ANN) have become the mainstream acoustic modeling technique for large vocabulary automatic speech recognition (ASR). A conventional ANN features a multi-layer architecture that requires massive amounts of computation. The brain-inspired spiking neural networks (SNN) closely mimic the biological neural networks and can operate on low-power neuromorphic hardware with spike-based computation. Motivated by their unprecedented energyefficiency and rapid information processing capability, we explore the use of SNNs for speech recognition. In this work, we use SNNs for acoustic modeling and evaluate their performance on several large vocabulary recognition scenarios. The experimental results demonstrate competitive ASR accuracies to their ANN counterparts, while require significantly reduced computational cost and inference time. Integrating the algorithmic power of deep SNNs with energy-efficient neuromorphic hardware, therefore, offer an attractive solution for ASR applications running locally on mobile and embedded devices.
One of the long-standing questions in biology and machine learning is how neural networks may learn important features from the input activities with a delayed feedback, commonly known as the temporal credit-assignment problem. The aggregate-label learning is proposed to resolve this problem by matching the spike count of a neuron with the magnitude of a feedback signal. However, the existing threshold-driven aggregate-label learning algorithms are computationally intensive, resulting in relatively low learning efficiency hence limiting their usability in practical applications. In order to address these limitations, we propose a novel membrane-potential driven aggregate-label learning algorithm, namely MPD-AL. With this algorithm, the easiest modifiable time instant is identified from membrane potential traces of the neuron, and guild the synaptic adaptation based on the presynaptic neurons’ contribution at this time instant. The experimental results demonstrate that the proposed algorithm enables the neurons to generate the desired number of spikes, and to detect useful clues embedded within unrelated spiking activities and background noise with a better learning efficiency over the state-of-the-art TDP1 and Multi-Spike Tempotron algorithms. Furthermore, we propose a data-driven dynamic decoding scheme for practical classification tasks, of which the aggregate labels are hard to define. This scheme effectively improves the classification accuracy of the aggregate-label learning algorithms as demonstrated on a speech recognition task.
Deep spiking neural networks (SNNs) support asynchronous event-driven computation, massive parallelism and demonstrate great potential to improve the energy efficiency of its synchronous analog counterpart. However, insufficient attention has been paid to neural encoding when designing SNN learning rules. Remarkably, the temporal credit assignment has been performed on rate-coded spiking inputs, leading to poor learning efficiency. In this paper, we introduce a novel spike-based learning rule for rate-coded deep SNNs, whereby the spike count of each neuron is used as a surrogate for gradient backpropagation. We evaluate the proposed learning rule by training deep spiking multi-layer perceptron (MLP) and spiking convolutional neural network (CNN) on the UCI machine learning and MNIST handwritten digit datasets. We show that the proposed learning rule achieves state-of-the-art accuracies on all benchmark datasets. The proposed learning rule allows introducing latency, spike rate and hardware constraints into the SNN learning, which is superior to the indirect approach in which conventional artificial neural networks are first trained and then converted to SNNs. Hence, it allows direct deployment to the neuromorphic hardware and supports efficient inference. Notably, a test accuracy of 98.40% was achieved on the MNIST dataset in our experiments with only 10 simulation time steps, when the same latency constraint is imposed during training.
Spiking neurons are becoming increasingly popular owing to their biological plausibility and promising computational properties. Unlike traditional rate-based neural models, spiking neurons encode information in the temporal patterns of the transmitted spike trains, which makes them more suitable for processing spatiotemporal information. One of the fundamental computations of spiking neurons is to transform streams of input spike trains into precisely timed firing activity. However, the existing learning methods, used to realize such computation, often result in relatively low accuracy performance and poor robustness to noise. In order to address these limitations, we propose a novel highly effective and robust membrane potential-driven supervised learning (MemPo-Learn) method, which enables the trained neurons to generate desired spike trains with higher precision, higher efficiency, and better noise robustness than the current state-of-the-art spiking neuron learning methods. While the traditional spike-driven learning methods use an error function based on the difference between the actual and desired output spike trains, the proposed MemPo-Learn method employs an error function based on the difference between the output neuron membrane potential and its firing threshold. The efficiency of the proposed learning method is further improved through the introduction of an adaptive strategy, called skip scan training strategy, that selectively identifies the time steps when to apply weight adjustment. The proposed strategy enables the MemPo-Learn method to effectively and efficiently learn the desired output spike train even when much smaller time steps are used. In addition, the learning rule of MemPo-Learn is improved further to help mitigate the impact of the input noise on the timing accuracy and reliability of the neuron firing dynamics. The proposed learning method is thoroughly evaluated on synthetic data and is further demonstrated on real-world classification tasks. Experimental results show that the proposed method can achieve high learning accuracy with a significant improvement in learning time and better robustness to different types of noise.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.