Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA 2017)
DOI: 10.1145/3020078.3021741
Accelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs

Abstract: Convolutional neural networks (CNN) are the current state-of-the-art for many computer vision tasks. CNNs outperform older methods in accuracy, but require vast amounts of computation and memory. As a result, existing CNN applications are typically run on clusters of CPUs or GPUs. Research on FPGA acceleration of CNN workloads has achieved reductions in power and energy consumption. However, large GPUs outperform modern FPGAs in throughput, and the existence of compatible deep learning frameworks gives GPUs a si…
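Why binarization saves so much compute and memory is visible at the dot-product level: with weights and activations constrained to {-1, +1} and bit-packed into machine words, a multiply-accumulate collapses to an XNOR (or XOR) followed by a popcount. A minimal C++ sketch of that identity (illustrative only, not code from the paper; the 64-bit packing and the function name are assumptions):

```cpp
#include <bit>      // std::popcount (C++20)
#include <cstdint>

// Dot product of two 64-element {-1,+1} vectors, each bit-packed into one
// 64-bit word (bit = 1 encodes +1, bit = 0 encodes -1).
// Equal bits contribute +1 and unequal bits contribute -1, so with
// m = popcount(a ^ b) mismatches the dot product is (64 - m) - m.
int binary_dot64(std::uint64_t a, std::uint64_t b) {
    return 64 - 2 * std::popcount(a ^ b);
}
```

On an FPGA the XOR maps to LUTs and the popcount to a small adder tree, which is where the power and area advantage over floating-point multiply-accumulates comes from.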

Cited by 329 publications (151 citation statements)
References 17 publications
“…sign in GitHub to realize it on the Xilinx PYNQ board, which has the same FPGA used in the other designs. From Table 7, compared with Zhao's implementation [34], the classification accuracy was almost the same, while the performance-per-power efficiency (FPS/Watt) was 5.18 times better. Compared with FINN, the memory efficiency was 3.98 times better, and the performance-per-power efficiency was almost the same. Thus, our design achieves a power-efficient CNN, since all the circuits are realized with on-chip primitives.…”
Section: Compared With an Edge Pruning Method (mentioning)
confidence: 90%
“…Zhao et al. [22] propose BNN, an FPGA implementation of NN-128 with binary weights on a ZedBoard. They focus on accelerating the neural network on a very resource-constrained FPGA, so the resulting throughput is very low.…”
Section: Related Work (mentioning)
confidence: 99%
“…In Section 7, we validate the performance of our filter matrix packing algorithm with an FPGA implementation. Additionally, we compare our implementation to previous state-of-the-art FPGA results [57,43,16,70]. Figure 2 compares standard CNNs to two recent CNN variants: separable convolution [12,25] and shift convolution [65].…”
Section: ASIC and FPGA Accelerators for CNNs (mentioning)
confidence: 99%
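For context on the two variants named in the last statement: separable (depthwise-separable) convolution factors one KxK convolution over all input/output channel pairs into a per-channel KxK depthwise pass plus a 1x1 pointwise pass, cutting the multiply count by roughly a factor of 1/C_out + 1/K^2. A short sketch of that cost arithmetic (function names and example sizes are illustrative, not taken from the cited papers):

```cpp
#include <cstdio>

// Multiplies per output spatial position (ignores bias, stride, padding).
long long standard_conv_muls(int k, int c_in, int c_out) {
    return 1LL * k * k * c_in * c_out;   // one KxK filter per (in, out) channel pair
}

long long separable_conv_muls(int k, int c_in, int c_out) {
    return 1LL * k * k * c_in            // depthwise: one KxK filter per input channel
         + 1LL * c_in * c_out;           // pointwise: 1x1 convolution across channels
}

int main() {
    // Example: 3x3 kernel, 256 input and 256 output channels.
    std::printf("standard:  %lld\n", standard_conv_muls(3, 256, 256));   // 589824
    std::printf("separable: %lld\n", separable_conv_muls(3, 256, 256)); // 67840 (~8.7x fewer)
}
```

Shift convolution goes further, replacing the depthwise KxK pass with zero-multiply spatial shifts so that only the 1x1 pointwise multiplies remain.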