Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
DOI: 10.1145/3289602.3293939

Sparse Winograd Convolutional Neural Networks on Small-scale Systolic Arrays

Abstract: The reconfigurability, energy efficiency, and massive parallelism of FPGAs make them one of the best choices for implementing efficient deep learning accelerators. However, state-of-the-art implementations seldom consider the balance between the high throughput of the compute fabric and the ability of the memory subsystem to sustain it. In this paper, we implement an accelerator on FPGA by combining sparse Winograd convolution, clusters of small-scale systolic arrays, and a tailored memory layout design. We also pr…
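The Winograd kernel the abstract refers to can be illustrated with the classic 1-D minimal-filtering case F(2,3). The sketch below is a generic illustration using the standard transform matrices (not the paper's implementation): it computes two outputs of a 3-tap filter with 4 rather than 6 multiplications and checks the result against a direct sliding dot product.

```python
# Generic 1-D Winograd F(2,3) sketch: 2 outputs of a 3-tap filter from a
# 4-sample tile using 4 elementwise multiplications instead of 6.
import numpy as np

BT = np.array([[1., 0., -1., 0.],   # input transform B^T
               [0., 1.,  1., 0.],
               [0., -1., 1., 0.],
               [0., 1.,  0., -1.]])
G = np.array([[1.0,  0.0, 0.0],     # filter transform G
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])
AT = np.array([[1., 1.,  1.,  0.],  # inverse transform A^T
               [0., 1., -1., -1.]])

def winograd_f23(d, g):
    """Compute y[i] = sum_k d[i+k] * g[k] for i in {0, 1}."""
    U = G @ g    # transformed filter (precomputable offline)
    V = BT @ d   # transformed input tile
    M = U * V    # the 4 multiplications
    return AT @ M

d = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([1.0, 0.0, -1.0])
direct = np.array([d[0:3] @ g, d[1:4] @ g])  # plain sliding dot product
print(winograd_f23(d, g), direct)            # both give [-2. -2.]
```

Tiling a longer input into overlapping 4-sample windows extends this to full-size feature maps, which is what makes the transform attractive for hardware multipliers.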

Cited by 13 publications (7 citation statements)
References 13 publications (21 reference statements)
“…On the basis of inter-layer fusion, Xiao et al. deployed an accelerator structure based on the Winograd algorithm and traditional convolution computing components on the chip [12]. Fowers et al. describe Microsoft's FPGA-based hardware deployment of neural network applications in the data center.…”
Section: Neural Network Accelerator (mentioning)
confidence: 99%
“…[33] applied pooling after the input transformation; the principle is the same as applying ReLU there. [34], [35] designed a new memory data layout for sparse Winograd convolution. [36] proposed to learn the pruning coefficients of Winograd convolution locally and reached a sparsity of more than 90%.…”
Section: Pruning (mentioning)
confidence: 99%
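The sparse-Winograd idea these citations refer to can be sketched as pruning the filter after it is transformed into the Winograd domain, so the zeroed elementwise products can be skipped at inference time. The snippet below is a hypothetical illustration for the 1-D F(2,3) case; the function names and threshold are illustrative, not taken from [34]–[36].

```python
# Hypothetical sketch: prune in the Winograd domain, then skip the zeroed
# elementwise products. Transform matrices are the standard F(2,3) ones.
import numpy as np

BT = np.array([[1., 0., -1., 0.],
               [0., 1.,  1., 0.],
               [0., -1., 1., 0.],
               [0., 1.,  0., -1.]])
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])
AT = np.array([[1., 1.,  1.,  0.],
               [0., 1., -1., -1.]])

def prune_winograd_filter(g, threshold):
    """Transform a 3-tap filter and drop near-zero Winograd-domain weights."""
    U = G @ g
    nz = np.nonzero(np.abs(U) >= threshold)[0]  # surviving index set (offline)
    return U[nz], nz

def sparse_f23(d, U_nz, nz):
    """F(2,3) tile using only the surviving elementwise products."""
    V = BT @ d
    M = np.zeros(4)
    M[nz] = U_nz * V[nz]  # multiply only where the pruned filter is nonzero
    return AT @ M

g = np.array([1.0, 0.0, -1.0])
U_nz, nz = prune_winograd_filter(g, 0.1)  # for this filter, 2 of 4 survive
print(sparse_f23(np.array([1., 2., 3., 4.]), U_nz, nz))  # [-2. -2.]
```

In hardware, the surviving index set is fixed after training, which is what makes a tailored memory layout (as in [34], [35]) worthwhile.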
“…[72] implemented hybrid convolution on FPGA and analysed which cases suit FFT and which suit Winograd convolution. [35], [73], [74], [75] unified the realization of the Winograd convolution kernel as matrix multiplication and maximized the reusability of the module. [76], [77] conducted a comprehensive design-space exploration of Winograd convolution implementations on FPGA.…”
Section: CPU (mentioning)
confidence: 99%
“…The systolic array is another common solution due to its regularity and simplicity. [4]–[11] adopted systolic designs to reduce the complexity of the data paths. An 8×3 convolutional systolic array with a double-buffering strategy was proposed in [4] to improve throughput and power efficiency.…”
Section: Introduction (mentioning)
confidence: 99%
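The systolic scheme these designs build on can be modeled in a few lines of software. The sketch below is a generic illustration (not the 8×3 design from [4]): an output-stationary N×N array computing C = A·B, with A streamed in from the left and B from the top along skewed diagonals, each PE performing one multiply-accumulate per cycle.

```python
# Generic output-stationary systolic-array model for C = A @ B.
# Each PE (i, j) holds C[i, j]; A values flow right, B values flow down.
import numpy as np

def systolic_matmul(A, B):
    n = A.shape[0]
    C = np.zeros((n, n))
    a_reg = np.zeros((n, n))  # A operand currently held by PE (i, j)
    b_reg = np.zeros((n, n))  # B operand currently held by PE (i, j)
    for t in range(3 * n - 2):               # cycles until the last wavefront drains
        a_reg = np.roll(a_reg, 1, axis=1)    # each PE passes its A value right
        b_reg = np.roll(b_reg, 1, axis=0)    # ...and its B value down
        for i in range(n):                   # skewed edge feeding: one diagonal per cycle
            k = t - i
            a_reg[i, 0] = A[i, k] if 0 <= k < n else 0.0
            b_reg[0, i] = B[k, i] if 0 <= k < n else 0.0
        C += a_reg * b_reg                   # one MAC per PE per cycle
    return C

A = np.array([[1., 2.], [3., 4.]])
B = np.array([[5., 6.], [7., 8.]])
print(systolic_matmul(A, B))  # [[19. 22.] [43. 50.]]
```

The skewed feeding guarantees that the A value and B value meeting at PE (i, j) at any cycle always share the same reduction index k, so no control logic beyond the edge schedule is needed.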
“…In particular, a unified architecture based on a systolic array was explored by W. Liu et al. [10], which can be applied to traditional convolution, transpose convolution, and dilated convolution with zero-skipping operations. F. Shi et al. [11] applied the Winograd algorithm to CNN acceleration on a small-scale systolic array, which reduces the number of multiplications relative to spatial convolution. Also, a precision-scalable CNN processor was implemented in [12] to minimize energy consumption while maintaining throughput.…”
Section: Introduction (mentioning)
confidence: 99%