2020
DOI: 10.48550/arxiv.2006.07232
Preprint

A Practical Sparse Approximation for Real Time Recurrent Learning

Abstract: Current methods for training recurrent neural networks are based on backpropagation through time, which requires storing a complete history of network states, and prohibits updating the weights 'online' (after every timestep). Real Time Recurrent Learning (RTRL) eliminates the need for history storage and allows for online weight updates, but does so at the expense of computational costs that are quartic in the state size. This renders RTRL training intractable for all but the smallest networks, even ones that…
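To make the cost trade-off concrete, here is a minimal NumPy sketch of exact (unapproximated) RTRL for a tiny vanilla tanh RNN. It is an illustrative reconstruction assuming the recurrence h_t = tanh(W_h h_{t-1} + W_x x_t), not the paper's method or notation; the influence matrix P (shape n x n^2, i.e. O(n^3) memory) and the W_h @ P recursion (O(n^4) compute per step) are exactly the terms a sparse approximation would target, and the loss, sizes, and learning rate are arbitrary choices for the example.

```python
import numpy as np

# Minimal exact-RTRL sketch for a vanilla tanh RNN: h_t = tanh(W_h h_{t-1} + W_x x_t).
# All names and sizes are illustrative, not the paper's notation.

rng = np.random.default_rng(0)
n, m = 8, 4                       # hidden size, input size (kept tiny: cost is O(n^4))
W_h = rng.normal(0, 0.5 / np.sqrt(n), (n, n))
W_x = rng.normal(0, 0.5 / np.sqrt(m), (n, m))

h = np.zeros(n)
P = np.zeros((n, n * n))          # influence matrix dh/dvec(W_h): O(n^3) memory

for t in range(20):
    x = rng.normal(size=m)
    target = rng.normal(size=n)   # dummy regression target, squared-error loss

    pre = W_h @ h + W_x @ x
    h_new = np.tanh(pre)
    D = 1.0 - h_new ** 2          # diagonal of tanh'(pre)

    # Immediate Jacobian of the pre-activations w.r.t. vec(W_h):
    # d(pre_i)/dW_h[i, j] = h_prev[j], zero elsewhere.
    imm = np.kron(np.eye(n), h)   # shape (n, n*n)

    # RTRL recursion: P_t = D_t * (W_h @ P_{t-1} + imm).
    # The (n, n) @ (n, n^2) product here is the O(n^4) step.
    P = D[:, None] * (W_h @ P + imm)
    h = h_new

    # Online gradient: dL_t/dvec(W_h) = (dL_t/dh_t) @ P_t, available every timestep.
    dL_dh = 2.0 * (h - target)
    grad_W_h = (dL_dh @ P).reshape(n, n)
    W_h -= 1e-3 * grad_W_h        # weights updated immediately, no stored history
```

Because the per-step gradient (dL_dh @ P) is available at every timestep, the weights can be updated online without storing past states; the price is the quartic-cost matrix product inside the recursion, which is what makes exact RTRL intractable beyond very small networks.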

Cited by 5 publications (6 citation statements)
References 8 publications (10 reference statements)
“…These existing bio-plausible update rules for RNNs estimate the gradient using known biological learning ingredients: eligibility traces, which maintain preceding activity at the molecular level [76][77][78][79][80][81], combined with top-down instructive signaling [76,77,[82][83][84][85][86][87][88] as well as local cell-to-cell modulatory signaling within the network [13,89,90]. For efficient online learning in RNNs, other approximations (not necessarily bio-plausible) to RTRL [91][92][93][94][95][96] have also been demonstrated to produce good performance. Given the impressive accuracy achieved by these approximate rules, several studies began to investigate their convergence properties [97], e.g.…”
Section: Bio-plausible Gradient Approximations (mentioning)
confidence: 99%
“…Recent work has shown that making Recurrent Neural Networks sparser can be advantageous not only for expediting the training process but also for improving performance [25]. Distinctively, in [26], [27], the authors show that keeping a constant number of non-zero parameters while increasing the size and sparsity of a network leads to increased accuracy. They do so by creating larger networks at the beginning of the training process and populating a binary mask throughout the training process, always setting the smallest weights to zero.…”
Section: Sparsity (mentioning)
confidence: 99%
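As a rough illustration of the constant-budget masking idea in the statement above, the following NumPy sketch keeps a fixed number of non-zero weights by zeroing the smallest-magnitude entries after each dense update. The budget n_nonzero, the random stand-in gradient, and the per-step mask refresh are assumptions made for the example, not the exact procedure of [26], [27].

```python
import numpy as np

# Generic magnitude-based binary mask: grow the weight matrix, but keep the
# number of non-zero parameters constant by pruning the smallest entries.

rng = np.random.default_rng(0)
n_nonzero = 64                       # fixed parameter budget (assumed)
hidden = 32                          # can be grown while n_nonzero stays fixed
W = rng.normal(0, 1.0 / np.sqrt(hidden), (hidden, hidden))

def update_mask(W, n_nonzero):
    """Binary mask keeping only the n_nonzero largest-magnitude weights."""
    flat = np.abs(W).ravel()
    threshold = np.partition(flat, flat.size - n_nonzero)[flat.size - n_nonzero]
    return (np.abs(W) >= threshold).astype(W.dtype)

for step in range(100):
    grad = rng.normal(size=W.shape)  # stand-in for a real gradient
    W -= 1e-2 * grad                 # dense update; pruned weights may regrow
    mask = update_mask(W, n_nonzero) # keep only the largest-magnitude weights
    W *= mask                        # the rest are set (back) to zero
```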
“…The common assumption is that the computation of activations and gradients and their propagation are instantaneous [6,42]; this is not physically possible. Real Time Recurrent Learning (RTRL) [43,44,68] and Sideways [42] attempt to mitigate this issue. RTRL computes correct gradients in the forward mode.…”
Section: Related Work (mentioning)
confidence: 99%