2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
DOI: 10.1109/CVPR.2018.00452

SYQ: Learning Symmetric Quantization for Efficient Deep Neural Networks

Abstract: Inference for state-of-the-art deep neural networks is computationally expensive, making them difficult to deploy on constrained hardware environments. An efficient way to reduce this complexity is to quantize the weight parameters and/or activations during training by approximating their distributions with a limited entry codebook. For very low-precisions, such as binary or ternary networks with 1-8-bit activations, the information loss from quantization leads to significant accuracy degradation due to large …
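
To make the quantization idea in the abstract concrete, the following is a minimal NumPy sketch of codebook-style weight quantization: weights are mapped onto a small symmetric codebook {-1, 0, +1} and rescaled by a per-row coefficient fitted by least squares. This is an illustrative sketch of the general technique, not the paper's exact SYQ algorithm; the threshold rule, the row-wise scaling granularity, and the helper name ternarize_rowwise are assumptions made for the example.

# Illustrative sketch only: symmetric ternary quantization with per-row
# scaling factors, in the spirit of the codebook approximation described
# in the abstract. Not the paper's exact SYQ method.
import numpy as np

def ternarize_rowwise(W: np.ndarray, t: float = 0.05):
    """Quantize a 2-D weight matrix onto {-a_i, 0, +a_i} per row i.

    t is an assumed sparsity-threshold hyperparameter. Returns the
    quantized weights and the per-row scaling factors a_i.
    """
    thresh = t * np.max(np.abs(W))              # magnitude threshold
    codes = np.sign(W) * (np.abs(W) > thresh)   # codebook entries in {-1, 0, +1}
    # Least-squares scaling per row: a_i = mean of |W_ij| over non-zero codes.
    num = np.sum(np.abs(W) * (codes != 0), axis=1)
    den = np.maximum(np.sum(codes != 0, axis=1), 1)
    alpha = (num / den)[:, None]
    return alpha * codes, alpha

# Usage: quantize a random weight matrix and report the approximation error.
W = np.random.randn(4, 8).astype(np.float32)
W_q, alpha = ternarize_rowwise(W)
print("L2 quantization error:", np.linalg.norm(W - W_q))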

Cited by 110 publications (93 citation statements)
References 16 publications (23 reference statements)

Citation statements (ordered by relevance):
“…Our approach could also work to improve the results for models quantized with such custom floating point formats. Other approaches use codebooks [7], which put stringent restrictions on the hardware for an efficient implementation. We do not consider codebooks in our approach.…”
Section: Background and Related Work
Mentioning confidence: 99%
“…The work of Faraone et al. groups parameters during the training process and gradually quantizes each group with an optimized scaling factor to minimize the quantization error [77].…”
Section: Minimize the Quantization Error
Mentioning confidence: 99%
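The closed-form choice of such a scaling factor can be shown with a short, hedged sketch: for a fixed group of weights W and low-precision codes Q, the factor minimizing the L2 quantization error ||W - aQ||^2 is a* = <Q, W> / <Q, Q>. The equal-chunk grouping below is an assumption for illustration, not the exact grouping used in the cited work.

# Hedged sketch of grouped quantization with an optimized (least-squares)
# scaling factor per group; the grouping scheme is an illustrative assumption.
import numpy as np

def optimal_scale(w_group: np.ndarray, q_group: np.ndarray) -> float:
    """Least-squares scaling factor minimizing ||w_group - a * q_group||^2."""
    denom = float(np.dot(q_group, q_group))
    return float(np.dot(q_group, w_group)) / denom if denom > 0 else 0.0

def grouped_quantize(w: np.ndarray, num_groups: int = 4) -> np.ndarray:
    """Binarize each group to sign(w) and rescale it with its optimal factor."""
    out = np.empty_like(w)
    for g in np.array_split(np.arange(w.size), num_groups):
        q = np.sign(w[g])
        q[q == 0] = 1.0                       # avoid degenerate all-zero codes
        out[g] = optimal_scale(w[g], q) * q
    return out

# Usage: quantize a flattened weight vector group by group.
w = np.random.randn(64).astype(np.float32)
w_q = grouped_quantize(w)
print("quantization error after rescaling:", np.linalg.norm(w - w_q))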
“…As shown in Table 4, our method is consistently better than the baseline method, and scheme-2 is better than scheme-1. Figure 6 compares the validation errors of our two schemes based on DoReFa-Net [48] (left) and SYQ [7] (right); the decay function is the cosine decay and the decay step is set to 50 epochs.…”
Section: Scheme-1 vs. Scheme-2
Mentioning confidence: 99%
“…From Table 5, we find that the performance gap we improve becomes less and less visible as the model size increases. Specifically, our method can improve the baseline accuracy of the 0.125× network by 1.31% to 1.96%, while it raises the performance of the 1.0× network by … (Table 5: validation accuracies (%) for four networks of different sizes with the baseline method, SYQ [7], and our method on the SVHN dataset). The "W/A" values are the bits for quantizing weights/activations.…”
Section: Model Size
Mentioning confidence: 99%