2018 International Conference on ReConFigurable Computing and FPGAs (ReConFig)
DOI: 10.1109/reconfig.2018.8641739

A Highly Parallel FPGA Implementation of Sparse Neural Network Training

Abstract: We demonstrate an FPGA implementation of a parallel and reconfigurable architecture for sparse neural networks, capable of on-chip training and inference. The network connectivity uses pre-determined, structured sparsity to significantly reduce complexity by lowering memory and computational requirements. The architecture uses a notion of edge-processing, leading to efficient pipelining and parallelization. Moreover, the device can be reconfigured to trade off resource utilization with training time to fit net…
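As a rough illustration of the pre-determined, structured sparsity mentioned in the abstract, the following minimal sketch shows one way such connectivity can be expressed in software. The fixed fan-in, layer sizes, and mask construction are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of pre-defined structured sparsity (assumed construction):
# every output neuron is wired to a fixed number of inputs (fixed fan-in),
# and this connectivity is chosen before training and never changes.
import numpy as np

rng = np.random.default_rng(0)

def fixed_fanin_mask(n_in, n_out, fanin):
    """Binary connectivity mask with exactly `fanin` pre-determined inputs per output."""
    mask = np.zeros((n_out, n_in), dtype=np.float32)
    for j in range(n_out):
        mask[j, rng.choice(n_in, size=fanin, replace=False)] = 1.0
    return mask

n_in, n_out, fanin = 64, 32, 8                     # illustrative sizes
mask = fixed_fanin_mask(n_in, n_out, fanin)
weights = rng.standard_normal((n_out, n_in)).astype(np.float32) * mask

def forward(x):
    # Only the pre-defined edges contribute; a hardware implementation would
    # store and process just the fanin * n_out retained weights.
    return np.maximum(0.0, weights @ x)            # ReLU activation

y = forward(rng.standard_normal(n_in).astype(np.float32))
print(y.shape)                                     # (32,)
```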

Cited by 11 publications (9 citation statements). References 13 publications.

“…Pre-defined sparsity is a simple method to help address this challenge, as is acceleration with custom hardware. Interesting areas for future research include analytical approaches to justify the trends observed in this work and improving our initial hardware implementation in [40]. It is also interesting to consider extending the methods introduced herein to convolutional layers and recurrent architectures.…”
Section: Discussion
confidence: 96%
“…In results not presented here, we found no performance degradation due to this variation from the standard backpropagation algorithm. There is considerable ambiguity in the literature regarding ideal batch sizes [41], [42], and we found that our current network architecture performed well in our initial hardware implementation [40]. However, if a more conventional batch size is desired, the UP logic can be removed from the junction pipeline and the UP operation performed once every M inputs.…”
Section: Batch Size
confidence: 94%
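To make the update scheduling in the quote above concrete, here is a minimal sketch contrasting a weight update after every input with an update applied once every M inputs. The toy linear model, squared-error gradient, and all names are assumptions for illustration, not the cited hardware's actual pipeline.

```python
# Minimal sketch: per-input weight updates (M=1) versus one accumulated
# update every M inputs. Toy model and gradient are illustrative assumptions.
import numpy as np

def train(inputs, targets, w, lr=0.01, M=1):
    """With M=1 the weights are updated after every input; with M>1 the
    gradient is accumulated and the update step runs once every M inputs."""
    grad_acc = np.zeros_like(w)
    for i, (x, t) in enumerate(zip(inputs, targets), start=1):
        y = w @ x                       # forward pass of a toy linear model
        grad = np.outer(y - t, x)       # gradient of 0.5 * ||y - t||^2 w.r.t. w
        grad_acc += grad
        if i % M == 0:                  # update step, once every M inputs
            w -= lr * grad_acc / M
            grad_acc[:] = 0.0
    return w
```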
“…For example, looking at Figure 15a, for a weight sparsity of 4 non-zeros out of 64 elements (i.e. 60/64 = 93.75% sparse), as the activation sparsity is increased from 16/64 to 8/64 and 4/64, the number of LUTs required for the implementation of the 1×1 convolution is reduced by 2.7X and 4.1X, respectively.…”
Section: Resource Utilization of Sparse-Sparse Convolution Kernels
confidence: 99%
“…There have been a number of papers investigating sparse-dense network implementations on FPGAs. Employing either weight [31,19,16,85,34,38,10] or activation sparsity [2], they show it is possible to reduce the number of MAC operations by routing a subset of the dense values to the sparse set of operands at the processing units. This can be done either via multi-ported memories [16] or multiplexor networks [19].…”
Section: Accelerating Sparse DNNs on FPGAs
confidence: 99%
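As a rough illustration of the routing idea described above, the sketch below stores only the non-zero weights as value/index pairs and gathers the matching dense activations, so the number of MAC operations scales with the non-zero count. All names and sizes are assumptions; hardware would realize the gather with multi-ported memories or multiplexer networks rather than array indexing.

```python
# Minimal sketch of a weight-sparse MAC: non-zero weights are stored as
# (value, input index) pairs and the matching dense activations are gathered,
# so the MAC count follows the non-zeros rather than the dense vector length.
import numpy as np

def sparse_dense_mac(values, indices, dense_activations):
    """Dot product computed over the non-zero weights only."""
    return float(np.dot(values, dense_activations[indices]))

# 4 non-zeros out of 64 weights (93.75% sparse), echoing the example above.
rng = np.random.default_rng(1)
acts = rng.standard_normal(64)
idx = rng.choice(64, size=4, replace=False)
vals = rng.standard_normal(4)
print(sparse_dense_mac(vals, idx, acts))           # 4 MACs instead of 64
```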