A Hardware-Friendly Low-Bit Power-of-Two Quantization Method for CNNs and Its FPGA Implementation

Sui, Xuefu; Lv, Qunbo; Bai, Yang; Zhu, Bofeng; Zhi, Liangjie; Yang, Yuanbo; Tan, Zheng

doi:10.3390/s22176618

Cited by 7 publications

(19 citation statements)

References 55 publications

(78 reference statements)

Supporting

Mentioning

Contrasting

Order By: Relevance

“…We set two registers between the PEs to temporarily store the feature map data, and each register can output and input one feature data per clock cycle, which enables data reuse during convolution calculation and turns the 1 × 3 convolution kernel into a sliding window for sliding calculation in the feature map. The other ReLU activation functions and truncation module operations remain consistent with the design of our previously reported method [ 37 ].…”

Section: Proposed Methodssupporting

confidence: 62%

“…In our previous study, we developed a hardware-friendly, high-accuracy, power-of-two quantization method, GSNQ [ 37 ], which can produce 3-bit or 4-bit high-accuracy quantization CNN models. Additionally, the power-of-two quantization achieves 0 on-chip DSP resource occupation in FPGAs, which effectively improves the CNN computational efficiency.…”

Section: Proposed Methodsmentioning

confidence: 99%

“…A highly pipelined shift-operation-based convolutional computation module designed in our previous study [ 37 ] for the power-of-two quantization model is shown in Figure 7 .…”

Section: Proposed Methodsmentioning

confidence: 99%

“…We combined the KRP pruning method with the GSNQ quantization method proposed in our previous study [ 37 ] to develop a hardware-friendly and high-precision CNN compression framework, as shown in Figure 6 . All parameters in the CNN pruning model obtained by KRP were still 32-bit floating-point numbers, which hinders the deployment of FPGAs.…”

Section: Methodsmentioning

confidence: 99%

“…The whole process is similar to the retraining method proposed in the lottery hypothesis [ 35 , 36 ], where the unpruned weights are reinitialized to their original weights during the training process. Eventually, we performed a 4-bit quantization of the KRP pruned model using the GSNQ quantization proposed in our previous study [ 37 ] and designed a highly pipelined high-performance convolutional computation module on the FPGA platform for the obtained lightweight network to verify the hardware-friendliness of the KRP method. This module directly skips all the zero calculations without excessive indexing, significantly saves hardware resources, and improves computational efficiency.…”

Section: Introductionmentioning

confidence: 99%

See 4 more Smart Citations

A Hardware-Friendly High-Precision CNN Pruning Method and Its FPGA Implementation

Sui

Zhi

et al. 2023

Sensors

Self Cite

View full text Add to dashboard Cite

To address the problems of large storage requirements, computational pressure, untimely data supply of off-chip memory, and low computational efficiency during hardware deployment due to the large number of convolutional neural network (CNN) parameters, we developed an innovative hardware-friendly CNN pruning method called KRP, which prunes the convolutional kernel on a row scale. A new retraining method based on LR tracking was used to obtain a CNN model with both a high pruning rate and accuracy. Furthermore, we designed a high-performance convolutional computation module on the FPGA platform to help deploy KRP pruning models. The results of comparative experiments on CNNs such as VGG and ResNet showed that KRP has higher accuracy than most pruning methods. At the same time, the KRP method, together with the GSNQ quantization method developed in our previous study, forms a high-precision hardware-friendly network compression framework that can achieve “lossless” CNN compression with a 27× reduction in network model storage. The results of the comparative experiments on the FPGA showed that the KRP pruning method not only requires much less storage space, but also helps to reduce the on-chip hardware resource consumption by more than half and effectively improves the parallelism of the model in FPGAs with a strong hardware-friendly feature. This study provides more ideas for the application of CNNs in the field of edge computing.

show abstract

Section: Proposed Methodssupporting

confidence: 62%

Section: Proposed Methodsmentioning

confidence: 99%

“…A highly pipelined shift-operation-based convolutional computation module designed in our previous study [ 37 ] for the power-of-two quantization model is shown in Figure 7 .…”

Section: Proposed Methodsmentioning

confidence: 99%

Section: Methodsmentioning

confidence: 99%

Section: Introductionmentioning

confidence: 99%

See 3 more Smart Citations