2022
DOI: 10.1088/2634-4386/ac7c8a

Two sparsities are better than one: unlocking the performance benefits of sparse–sparse networks

Abstract: In principle, sparse neural networks should be significantly more efficient than traditional dense networks. Neurons in the brain exhibit two types of sparsity; they are sparsely interconnected and sparsely active. These two types of sparsity, called weight sparsity and activation sparsity, when combined, offer the potential to reduce the computational cost of neural networks by two orders of magnitude. Despite this potential, today’s neural networks deliver only modest performance benefits using just weight s…
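
To make the multiplicative effect described in the abstract concrete, here is a back-of-the-envelope sketch in Python; the layer size and the 90% weight and activation sparsity levels are hypothetical values chosen only for illustration, not figures from the paper.

# Hypothetical sparsity levels, chosen only to show how the two savings multiply.

def dense_macs(n_in, n_out):
    # Multiply-accumulate operations for a dense fully-connected layer.
    return n_in * n_out

def expected_sparse_macs(n_in, n_out, weight_density, activation_density):
    # Only pairs where a non-zero weight meets a non-zero activation cost a MAC.
    return int(n_in * n_out * weight_density * activation_density)

n_in, n_out = 1024, 1024
print("dense:        ", dense_macs(n_in, n_out))                      # ~1.05M MACs
print("weight-sparse:", expected_sparse_macs(n_in, n_out, 0.1, 1.0))  # ~10x fewer
print("sparse-sparse:", expected_sparse_macs(n_in, n_out, 0.1, 0.1))  # ~100x fewer

With 90% sparsity on both axes the expected multiply-accumulate count drops by roughly a factor of 100, which is the two-orders-of-magnitude potential the abstract refers to.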

Cited by 4 publications (4 citation statements)
References 70 publications (96 reference statements)

“…By contrast, using DN on RNNs is beneficial because the fully-connected RNNs are weight-memory bounded, and the energy saving brought by temporal sparsity is much larger for RNNs. [7] and [10] exploit both DN activation sparsity and weight sparsity to achieve impressive inference performance on hardware. Another method that can create sparsity in neural networks is conditional computation, or skipping operations.…”
Section: Related Work
mentioning
confidence: 99%
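
To make the "skipping operations" idea in the statement above concrete, here is a minimal delta-network-style sketch, assuming a simple threshold on input changes; the function name, sizes, and threshold are illustrative and not taken from the cited works [7] and [10].

import numpy as np

# An input element is only propagated through the fully-connected weight matrix
# when its change since the last propagated value exceeds a threshold.

def delta_step(W, x_new, x_ref, y_prev, threshold=0.05):
    delta = x_new - x_ref
    active = np.abs(delta) > threshold             # temporal sparsity mask
    y_new = y_prev + W[:, active] @ delta[active]  # skip columns of unchanged inputs
    x_ref = np.where(active, x_new, x_ref)         # update only the propagated entries
    return y_new, x_ref, active.mean()

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 256))
x_ref = rng.standard_normal(256)
y = W @ x_ref                                      # one full pass at the start
x_new = x_ref + 0.02 * rng.standard_normal(256)    # slowly varying input
y, x_ref, frac = delta_step(W, x_new, x_ref, y)
print(f"fraction of weight columns touched this step: {frac:.2%}")

Because only the weight columns belonging to changed inputs are read, the weight-memory traffic shrinks with the temporal sparsity of the input, which is why the quoted statement notes that the savings are largest for weight-memory-bound RNNs.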
“…Activity sparsity in RNNs has been proposed previously in various forms [29,47,48], but only focusing on achieving it during inference. Conditional computation is a form of activity sparsity used in [17] to scale to 1 trillion parameters.…”
Section: Related Work
mentioning
confidence: 99%
“…Further, Moraitis et al (2022) introduce an unsupervised local training algorithm based on a combination of Hebbian plasticity with a soft winner-take-all mechanism. On the hardware side, two studies introduce new methods to exploit sparsity on current hardware (graphics processing unit (GPU) and field programmable gate array (FPGA)) to improve inference efficiency through Procedural connectivity (Turner et al 2022) and Complementary Sparsity (Hunter et al 2022). Finally, on the application side, DeWolf et al (2023) introduce a welcome closed-loop benchmark control task based on a robotic arm simulated in the popular MuJoCo platform to showcase the inherent power efficiency and low latency of event-based computation.…”
mentioning
confidence: 99%
“…Although the sparse activity and connectivity of SNNs should result in a proportional reduction in computing requirements, the irregular patterns of neuron interconnectivity and activity reduce the expected gains on current hardware. Hunter et al (2022) address this problem by structuring the sparsity to match the requirements of the target hardware for implementing sparse-activation, sparse-connectivity networks on FPGAs. This restructuring is achieved by overlaying multiple sparse matrices to form a single dense structure if no two sparse matrices contain non-zero elements at the same location.…”
mentioning
confidence: 99%
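
The overlay described in that last statement can be sketched in a few lines. The following is a software-only illustration, assuming complementary (non-overlapping) sparse matrices; the helper names overlay and unpack are hypothetical and not taken from Hunter et al (2022).

import numpy as np

# Pack several sparse matrices with non-overlapping non-zero positions into one
# dense value matrix plus an ownership index recording which source matrix owns
# each slot.

def overlay(sparse_mats):
    shape = sparse_mats[0].shape
    packed = np.zeros(shape)
    owner = np.full(shape, -1, dtype=int)   # which source matrix owns each slot
    for k, m in enumerate(sparse_mats):
        nz = m != 0
        if np.any(owner[nz] != -1):
            raise ValueError("not complementary: overlapping non-zero elements")
        packed[nz] = m[nz]
        owner[nz] = k
    return packed, owner

def unpack(packed, owner, k):
    # Recover the k-th original sparse matrix from the overlay.
    return np.where(owner == k, packed, 0.0)

# Four 75%-sparse 4x4 matrices whose non-zeros tile the grid column by column.
mats = [np.where(np.arange(4)[None, :] == k, float(k + 1), 0.0) * np.ones((4, 4))
        for k in range(4)]
packed, owner = overlay(mats)
assert np.allclose(unpack(packed, owner, 2), mats[2])
print(packed)   # one dense matrix carrying all four sparse matrices

The sketch only checks the packing and recovery logic; the performance benefit on hardware comes from being able to process one dense structure with regular access patterns instead of several irregular sparse ones.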