2020
DOI: 10.48550/arxiv.2006.07232
Preprint

A Practical Sparse Approximation for Real Time Recurrent Learning

Abstract: Current methods for training recurrent neural networks are based on backpropagation through time, which requires storing a complete history of network states, and prohibits updating the weights 'online' (after every timestep). Real Time Recurrent Learning (RTRL) eliminates the need for history storage and allows for online weight updates, but does so at the expense of computational costs that are quartic in the state size. This renders RTRL training intractable for all but the smallest networks, even ones that…
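To make the cost trade-off concrete, here is a minimal NumPy sketch of exact (unapproximated) RTRL for a tiny vanilla tanh RNN. It is an illustrative reconstruction assuming the recurrence h_t = tanh(W_h h_{t-1} + W_x x_t), not the paper's method or notation; the influence matrix P (shape n x n^2, i.e. O(n^3) memory) and the W_h @ P recursion (O(n^4) compute per step) are exactly the terms a sparse approximation would target, and the loss, sizes, and learning rate are arbitrary choices for the example.

```python
import numpy as np

# Minimal exact-RTRL sketch for a vanilla tanh RNN: h_t = tanh(W_h h_{t-1} + W_x x_t).
# All names and sizes are illustrative, not the paper's notation.

rng = np.random.default_rng(0)
n, m = 8, 4                       # hidden size, input size (kept tiny: cost is O(n^4))
W_h = rng.normal(0, 0.5 / np.sqrt(n), (n, n))
W_x = rng.normal(0, 0.5 / np.sqrt(m), (n, m))

h = np.zeros(n)
P = np.zeros((n, n * n))          # influence matrix dh/dvec(W_h): O(n^3) memory

for t in range(20):
    x = rng.normal(size=m)
    target = rng.normal(size=n)   # dummy regression target, squared-error loss

    pre = W_h @ h + W_x @ x
    h_new = np.tanh(pre)
    D = 1.0 - h_new ** 2          # diagonal of tanh'(pre)

    # Immediate Jacobian of the pre-activations w.r.t. vec(W_h):
    # d(pre_i)/dW_h[i, j] = h_prev[j], zero elsewhere.
    imm = np.kron(np.eye(n), h)   # shape (n, n*n)

    # RTRL recursion: P_t = D_t * (W_h @ P_{t-1} + imm).
    # The (n, n) @ (n, n^2) product here is the O(n^4) step.
    P = D[:, None] * (W_h @ P + imm)
    h = h_new

    # Online gradient: dL_t/dvec(W_h) = (dL_t/dh_t) @ P_t, available every timestep.
    dL_dh = 2.0 * (h - target)
    grad_W_h = (dL_dh @ P).reshape(n, n)
    W_h -= 1e-3 * grad_W_h        # weights updated immediately, no stored history
```

Because the per-step gradient (dL_dh @ P) is available at every timestep, the weights can be updated online without storing past states; the price is the quartic-cost matrix product inside the recursion, which is what makes exact RTRL intractable beyond very small networks.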

Cited by 5 publications (6 citation statements)
References 8 publications (10 reference statements)
“…These existing bio-plausible update rules for RNNs estimate the gradient using known biological learning ingredients: eligibility traces, which maintain preceding activity at the molecular level [76][77][78][79][80][81], combined with top-down instructive signaling [76,77,[82][83][84][85][86][87][88] as well as local cell-to-cell modulatory signaling within the network [13,89,90]. For efficient online learning in RNNs, other approximations (not necessarily bio-plausible) to RTRL [91][92][93][94][95][96] have also been demonstrated to produce good performance. Given the impressive accuracy achieved by these approximate rules, several studies began to investigate their convergence properties [97], e.g.…”
Section: Bio-plausible Gradient Approximations (mentioning)
confidence: 99%
“…Recent work has shown that making Recurrent Neural Networks sparser can be advantageous not only for expediting the training process but also for improving performance [25]. Distinctively, in [26], [27], the authors show that keeping a constant number of non-zero parameters while increasing the size and sparsity of a network leads to increased accuracy. They do so by creating larger networks at the beginning of the training process and populating a binary mask throughout the training process, always setting the smallest weights to zero.…”
Section: Sparsity (mentioning)
confidence: 99%
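As a rough illustration of the constant-budget masking idea in the statement above, the following NumPy sketch keeps a fixed number of non-zero weights by zeroing the smallest-magnitude entries after each dense update. The budget n_nonzero, the random stand-in gradient, and the per-step mask refresh are assumptions made for the example, not the exact procedure of [26], [27].

```python
import numpy as np

# Generic magnitude-based binary mask: grow the weight matrix, but keep the
# number of non-zero parameters constant by pruning the smallest entries.

rng = np.random.default_rng(0)
n_nonzero = 64                       # fixed parameter budget (assumed)
hidden = 32                          # can be grown while n_nonzero stays fixed
W = rng.normal(0, 1.0 / np.sqrt(hidden), (hidden, hidden))

def update_mask(W, n_nonzero):
    """Binary mask keeping only the n_nonzero largest-magnitude weights."""
    flat = np.abs(W).ravel()
    threshold = np.partition(flat, flat.size - n_nonzero)[flat.size - n_nonzero]
    return (np.abs(W) >= threshold).astype(W.dtype)

for step in range(100):
    grad = rng.normal(size=W.shape)  # stand-in for a real gradient
    W -= 1e-2 * grad                 # dense update; pruned weights may regrow
    mask = update_mask(W, n_nonzero) # keep only the largest-magnitude weights
    W *= mask                        # the rest are set (back) to zero
```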
“…The common assumption is that the computation of activations and gradients and their propagation are instantaneous [6,42]; this is not physically possible. Real Time Recurrent Learning (RTRL) [43,44,68] and Sideways [42] attempt to mitigate this issue. RTRL computes correct gradients in the forward mode.…”
Section: Related Work (mentioning)
confidence: 99%