2018 International Conference on ReConFigurable Computing and FPGAs (ReConFig)
DOI: 10.1109/reconfig.2018.8641739

A Highly Parallel FPGA Implementation of Sparse Neural Network Training

Abstract: We demonstrate an FPGA implementation of a parallel and reconfigurable architecture for sparse neural networks, capable of on-chip training and inference. The network connectivity uses pre-determined, structured sparsity to significantly reduce complexity by lowering memory and computational requirements. The architecture uses a notion of edge-processing, leading to efficient pipelining and parallelization. Moreover, the device can be reconfigured to trade off resource utilization with training time to fit net…
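As a rough illustration of the pre-determined, structured sparsity mentioned in the abstract, the following minimal sketch shows one way such connectivity can be expressed in software. The fixed fan-in, layer sizes, and mask construction are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of pre-defined structured sparsity (assumed construction):
# every output neuron is wired to a fixed number of inputs (fixed fan-in),
# and this connectivity is chosen before training and never changes.
import numpy as np

rng = np.random.default_rng(0)

def fixed_fanin_mask(n_in, n_out, fanin):
    """Binary connectivity mask with exactly `fanin` pre-determined inputs per output."""
    mask = np.zeros((n_out, n_in), dtype=np.float32)
    for j in range(n_out):
        mask[j, rng.choice(n_in, size=fanin, replace=False)] = 1.0
    return mask

n_in, n_out, fanin = 64, 32, 8                     # illustrative sizes
mask = fixed_fanin_mask(n_in, n_out, fanin)
weights = rng.standard_normal((n_out, n_in)).astype(np.float32) * mask

def forward(x):
    # Only the pre-defined edges contribute; a hardware implementation would
    # store and process just the fanin * n_out retained weights.
    return np.maximum(0.0, weights @ x)            # ReLU activation

y = forward(rng.standard_normal(n_in).astype(np.float32))
print(y.shape)                                     # (32,)
```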

Cited by 11 publications (9 citation statements). References 13 publications.

“…Pre-defined sparsity is a simple method to help address this challenge, as is acceleration with custom hardware. Interesting areas for future research include analytical approaches to justify the trends observed in this work and improving our initial hardware implementation in [40]. It is also interesting to consider extending the methods introduced herein to convolutional layers and recurrent architectures.…”
Section: Discussion
confidence: 96%
“…In results not presented here, we found no performance degradation due to this variation from the standard backpropagation algorithm. There is considerable ambiguity in the literature regarding ideal batch sizes [41], [42], and we found that our current network architecture performed well in our initial hardware implementation [40]. However, if a more conventional batch size is desired, the UP logic can be removed from the junction pipeline and the UP operation performed once every M inputs.…”
Section: Batch Size
confidence: 94%
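To make the update scheduling in the quote above concrete, here is a minimal sketch contrasting a weight update after every input with an update applied once every M inputs. The toy linear model, squared-error gradient, and all names are assumptions for illustration, not the cited hardware's actual pipeline.

```python
# Minimal sketch: per-input weight updates (M=1) versus one accumulated
# update every M inputs. Toy model and gradient are illustrative assumptions.
import numpy as np

def train(inputs, targets, w, lr=0.01, M=1):
    """With M=1 the weights are updated after every input; with M>1 the
    gradient is accumulated and the update step runs once every M inputs."""
    grad_acc = np.zeros_like(w)
    for i, (x, t) in enumerate(zip(inputs, targets), start=1):
        y = w @ x                       # forward pass of a toy linear model
        grad = np.outer(y - t, x)       # gradient of 0.5 * ||y - t||^2 w.r.t. w
        grad_acc += grad
        if i % M == 0:                  # update step, once every M inputs
            w -= lr * grad_acc / M
            grad_acc[:] = 0.0
    return w
```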
“…For example, looking at Figure 15a, for a weight sparsity of 4 non-zeros out of 64 elements (i.e. 60/64 = 93.75% sparse), as the activation sparsity is increased from 16/64 to 8/64 and 4/64, the number of LUTs required for the implementation of the 1×1 convolution is reduced by 2.7X and 4.1X, respectively.…”
Section: Resource Utilization of Sparse-Sparse Convolution Kernels
confidence: 99%
“…There have been a number of papers investigating sparse-dense network implementations on FPGAs. Employing either weight [31,19,16,85,34,38,10] or activation sparsity [2], they show it is possible to reduce the number of MAC operations by routing a subset of the dense values to the sparse set of operands at the processing units. This can be done either via multi-ported memories [16] or multiplexor networks [19].…”
Section: Accelerating Sparse DNNs on FPGAs
confidence: 99%
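As a rough illustration of the routing idea described above, the sketch below stores only the non-zero weights as value/index pairs and gathers the matching dense activations, so the number of MAC operations scales with the non-zero count. All names and sizes are assumptions; hardware would realize the gather with multi-ported memories or multiplexer networks rather than array indexing.

```python
# Minimal sketch of a weight-sparse MAC: non-zero weights are stored as
# (value, input index) pairs and the matching dense activations are gathered,
# so the MAC count follows the non-zeros rather than the dense vector length.
import numpy as np

def sparse_dense_mac(values, indices, dense_activations):
    """Dot product computed over the non-zero weights only."""
    return float(np.dot(values, dense_activations[indices]))

# 4 non-zeros out of 64 weights (93.75% sparse), echoing the example above.
rng = np.random.default_rng(1)
acts = rng.standard_normal(64)
idx = rng.choice(64, size=4, replace=False)
vals = rng.standard_normal(4)
print(sparse_dense_mac(vals, idx, acts))           # 4 MACs instead of 64
```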