Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
DOI: 10.1145/3289602.3293939

Sparse Winograd Convolutional Neural Networks on Small-scale Systolic Arrays

Abstract: The reconfigurability, energy efficiency, and massive parallelism of FPGAs make them one of the best choices for implementing efficient deep learning accelerators. However, state-of-the-art implementations seldom consider the balance between the high throughput of the compute fabric and the ability of the memory subsystem to sustain it. In this paper, we implement an accelerator on FPGA by combining sparse Winograd convolution, clusters of small-scale systolic arrays, and a tailored memory layout design. We also pr…
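The Winograd kernel the abstract refers to can be illustrated with the classic 1-D minimal-filtering case F(2,3). The sketch below is a generic illustration using the standard transform matrices (not the paper's implementation): it computes two outputs of a 3-tap filter with 4 rather than 6 multiplications and checks the result against a direct sliding dot product.

```python
# Generic 1-D Winograd F(2,3) sketch: 2 outputs of a 3-tap filter from a
# 4-sample tile using 4 elementwise multiplications instead of 6.
import numpy as np

BT = np.array([[1., 0., -1., 0.],   # input transform B^T
               [0., 1.,  1., 0.],
               [0., -1., 1., 0.],
               [0., 1.,  0., -1.]])
G = np.array([[1.0,  0.0, 0.0],     # filter transform G
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])
AT = np.array([[1., 1.,  1.,  0.],  # inverse transform A^T
               [0., 1., -1., -1.]])

def winograd_f23(d, g):
    """Compute y[i] = sum_k d[i+k] * g[k] for i in {0, 1}."""
    U = G @ g    # transformed filter (precomputable offline)
    V = BT @ d   # transformed input tile
    M = U * V    # the 4 multiplications
    return AT @ M

d = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([1.0, 0.0, -1.0])
direct = np.array([d[0:3] @ g, d[1:4] @ g])  # plain sliding dot product
print(winograd_f23(d, g), direct)            # both give [-2. -2.]
```

Tiling a longer input into overlapping 4-sample windows extends this to full-size feature maps, which is what makes the transform attractive for hardware multipliers.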

Cited by 13 publications (7 citation statements)
References 13 publications (21 reference statements)
“…On the basis of inter-layer fusion, Xiao et al. deployed an accelerator structure based on the Winograd algorithm and traditional convolution computing components on the chip [12]. Fowers et al. describe Microsoft's FPGA-based hardware deployment of neural network applications in the data center.…”
Section: Neural Network Accelerator (mentioning)
confidence: 99%
“…[33] applied pooling after the input transformation; the principle is the same as applying ReLU there. [34], [35] designed a new memory data layout for sparse Winograd convolution. [36] proposed to learn the pruning coefficients of Winograd convolution locally and reached a sparsity of more than 90%.…”
Section: Pruning (mentioning)
confidence: 99%
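The sparse-Winograd idea these citations refer to can be sketched as pruning the filter after it is transformed into the Winograd domain, so the zeroed elementwise products can be skipped at inference time. The snippet below is a hypothetical illustration for the 1-D F(2,3) case; the function names and threshold are illustrative, not taken from [34]–[36].

```python
# Hypothetical sketch: prune in the Winograd domain, then skip the zeroed
# elementwise products. Transform matrices are the standard F(2,3) ones.
import numpy as np

BT = np.array([[1., 0., -1., 0.],
               [0., 1.,  1., 0.],
               [0., -1., 1., 0.],
               [0., 1.,  0., -1.]])
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])
AT = np.array([[1., 1.,  1.,  0.],
               [0., 1., -1., -1.]])

def prune_winograd_filter(g, threshold):
    """Transform a 3-tap filter and drop near-zero Winograd-domain weights."""
    U = G @ g
    nz = np.nonzero(np.abs(U) >= threshold)[0]  # surviving index set (offline)
    return U[nz], nz

def sparse_f23(d, U_nz, nz):
    """F(2,3) tile using only the surviving elementwise products."""
    V = BT @ d
    M = np.zeros(4)
    M[nz] = U_nz * V[nz]  # multiply only where the pruned filter is nonzero
    return AT @ M

g = np.array([1.0, 0.0, -1.0])
U_nz, nz = prune_winograd_filter(g, 0.1)  # for this filter, 2 of 4 survive
print(sparse_f23(np.array([1., 2., 3., 4.]), U_nz, nz))  # [-2. -2.]
```

In hardware, the surviving index set is fixed after training, which is what makes a tailored memory layout (as in [34], [35]) worthwhile.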
“…[72] implemented hybrid convolution on FPGA and analysed which cases suit FFT and which suit Winograd convolution. [35], [73], [74], [75] unified the realization of the Winograd convolution kernel as matrix multiplication and maximized the reusability of the module. [76], [77] conducted a comprehensive design-space exploration of Winograd convolution implementations on FPGA.…”
Section: CPU (mentioning)
confidence: 99%
“…The systolic array is another common solution due to its regularity and simplicity. [4]–[11] adopted systolic designs to reduce the complexity of the data paths. An 8×3 convolutional systolic array with a double-buffering strategy was proposed in [4] to improve throughput and power efficiency.…”
Section: Introduction (mentioning)
confidence: 99%
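The systolic scheme these designs build on can be modeled in a few lines of software. The sketch below is a generic illustration (not the 8×3 design from [4]): an output-stationary N×N array computing C = A·B, with A streamed in from the left and B from the top along skewed diagonals, each PE performing one multiply-accumulate per cycle.

```python
# Generic output-stationary systolic-array model for C = A @ B.
# Each PE (i, j) holds C[i, j]; A values flow right, B values flow down.
import numpy as np

def systolic_matmul(A, B):
    n = A.shape[0]
    C = np.zeros((n, n))
    a_reg = np.zeros((n, n))  # A operand currently held by PE (i, j)
    b_reg = np.zeros((n, n))  # B operand currently held by PE (i, j)
    for t in range(3 * n - 2):               # cycles until the last wavefront drains
        a_reg = np.roll(a_reg, 1, axis=1)    # each PE passes its A value right
        b_reg = np.roll(b_reg, 1, axis=0)    # ...and its B value down
        for i in range(n):                   # skewed edge feeding: one diagonal per cycle
            k = t - i
            a_reg[i, 0] = A[i, k] if 0 <= k < n else 0.0
            b_reg[0, i] = B[k, i] if 0 <= k < n else 0.0
        C += a_reg * b_reg                   # one MAC per PE per cycle
    return C

A = np.array([[1., 2.], [3., 4.]])
B = np.array([[5., 6.], [7., 8.]])
print(systolic_matmul(A, B))  # [[19. 22.] [43. 50.]]
```

The skewed feeding guarantees that the A value and B value meeting at PE (i, j) at any cycle always share the same reduction index k, so no control logic beyond the edge schedule is needed.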
“…In particular, a unified architecture based on a systolic array was explored by W. Liu et al. [10], which can be applied to traditional convolution, transpose convolution, and dilated convolution with zero-skipping operations. F. Shi et al. [11] applied the Winograd algorithm to CNN acceleration on a small-scale systolic array, which reduces the number of multiplications relative to spatial convolution. Also, a precision-scalable CNN processor was implemented in [12] to minimize energy consumption while maintaining throughput.…”
Section: Introduction (mentioning)
confidence: 99%