2019
DOI: 10.48550/arxiv.1902.01917
Preprint

Same, Same But Different - Recovering Neural Network Quantization Error Through Weight Factorization

Abstract: Quantization of neural networks has become common practice, driven by the need for efficient implementations of deep neural networks on embedded devices. In this paper, we exploit an oft-overlooked degree of freedom in most networks: for a given layer, individual output channels can be scaled by any factor provided that the corresponding weights of the next layer are inversely scaled. Therefore, a given network has many factorizations which change the weights of the network without changing its function. We pr…
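A minimal numpy sketch of the channel-scaling invariance the abstract describes, for a two-layer ReLU network; the layer sizes and scale factors below are arbitrary illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two dense layers with a ReLU in between: y = W2 @ relu(W1 @ x + b1) + b2
W1, b1 = rng.normal(size=(16, 8)), rng.normal(size=16)
W2, b2 = rng.normal(size=(4, 16)), rng.normal(size=4)
x = rng.normal(size=8)

relu = lambda z: np.maximum(z, 0.0)
y_ref = W2 @ relu(W1 @ x + b1) + b2

# Scale each output channel of layer 1 by a positive factor s_k and
# inversely scale the corresponding input weights of layer 2.
s = rng.uniform(0.1, 10.0, size=16)        # one positive factor per channel
W1_eq, b1_eq = W1 * s[:, None], b1 * s     # scale rows (output channels) of layer 1
W2_eq = W2 / s[None, :]                    # inverse-scale columns (input channels) of layer 2

# ReLU is positively homogeneous, relu(s*z) = s*relu(z) for s > 0,
# so the factorized network computes exactly the same function.
y_eq = W2_eq @ relu(W1_eq @ x + b1_eq) + b2
assert np.allclose(y_ref, y_eq)
```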

Cited by 9 publications (18 citation statements)
References 12 publications (17 reference statements)
“…Activation Equalization. In this step, we equalize activation ranges per channel similarly to the methods presented in [23,28]. Here, we set the scale-per-channel factor according to the value of the threshold that is selected per-tensor.…”
Section: Shift Negative Correction (SNC), mentioning
confidence: 99%
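A minimal sketch of the equalization step this excerpt describes, under the assumption that each channel's scale is the ratio between the per-tensor threshold and that channel's own observed range; the shapes, the max-based range estimate, and the function name are illustrative, not the citing paper's exact procedure.

```python
import numpy as np

def channel_equalization_scales(activations, per_tensor_threshold):
    """Per-channel scale factors that stretch each channel's range toward
    the per-tensor quantization threshold.

    activations: calibration data of shape (N, C).
    Returns one positive scale per channel (assumed form: threshold / channel range).
    """
    per_channel_range = np.abs(activations).max(axis=0)       # (C,)
    per_channel_range = np.maximum(per_channel_range, 1e-12)  # avoid divide-by-zero
    return per_tensor_threshold / per_channel_range           # (C,)

# Example: channels with very different ranges get scales that let each one
# use the full quantization range up to the per-tensor threshold.
rng = np.random.default_rng(0)
acts = rng.normal(size=(1024, 4)) * np.array([0.1, 1.0, 5.0, 20.0])
threshold = np.abs(acts).max()        # a simple per-tensor threshold choice
s = channel_equalization_scales(acts, threshold)
print(np.round(s, 3))                 # small-range channels receive large scales
```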
“…The motivation for using this scaling factor to equalize the activation ranges is to use the maximum range of the quantization bins for each channel (see Figure 4). The authors in [23,28] suggest performing channel equalization by exploiting the positive scale equivariance property of activation functions. It holds for any piece-wise linear activation function in its relaxed form φ(Sx) = S·φ̂(x), where φ is a piece-wise linear function, φ̂ is its modified version that fits this requirement, and S = diag(s) is a diagonal matrix with s_k denoting the scale factor for channel k.…”
Section: Shift Negative Correction (SNC), mentioning
confidence: 99%
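A small numeric check of the equivariance this excerpt refers to: it is exact for ReLU, and holds in the relaxed form for a clipped activation once the clipping point is divided by the scale. The choice of ReLU6 and the specific scales are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 4))
s = np.array([0.5, 2.0, 4.0, 10.0])          # positive per-channel scales, S = diag(s)

relu = lambda z: np.maximum(z, 0.0)
relu_clip = lambda z, c: np.clip(z, 0.0, c)  # ReLU6-style activation with clip point c

# Exact positive scale equivariance for ReLU: relu(x S) == relu(x) S
assert np.allclose(relu(x * s), relu(x) * s)

# Relaxed form for a clipped activation: phi(S x) = S * phi_hat(x),
# where phi_hat is the modified version with its clip point divided by s.
assert np.allclose(relu_clip(x * s, 6.0), relu_clip(x, 6.0 / s) * s)
```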
“…Banner et al. (2018) derived an analytical expression approximating the optimal threshold under the assumption of a Laplacian or Gaussian distribution of the weights, which allows keeping the accuracy reduction within a single percent for 8-bit weights and 4-bit activations. Meller et al. (2019) showed that equalizing channels and removing outliers improves quantization quality. Choukroun et al. (2019) used an exact one-dimensional line search to find the optimal quantization threshold, demonstrating state-of-the-art results for 4-bit weight and activation quantization.…”
Section: Related Work, mentioning
confidence: 99%
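A toy sketch of the kind of one-dimensional threshold optimization described here: sweep candidate clipping thresholds for a symmetric uniform quantizer and keep the one minimizing quantization MSE. A grid search stands in for the exact line-search, and the bit-width, grid size, and Laplacian test data are illustrative assumptions.

```python
import numpy as np

def quantize(x, threshold, bits=4):
    """Symmetric uniform quantization of x with clipping at +/- threshold."""
    levels = 2 ** (bits - 1) - 1
    step = threshold / levels
    return np.clip(np.round(x / step), -levels, levels) * step

def search_threshold(x, bits=4, num_candidates=200):
    """1-D grid search over clipping thresholds minimizing quantization MSE."""
    max_abs = np.abs(x).max()
    candidates = np.linspace(max_abs / num_candidates, max_abs, num_candidates)
    mses = [np.mean((x - quantize(x, t, bits)) ** 2) for t in candidates]
    return candidates[int(np.argmin(mses))]

# Heavy-tailed (Laplacian) weights: the best threshold clips the outliers
# rather than covering the full range of the tensor.
rng = np.random.default_rng(0)
w = rng.laplace(scale=1.0, size=100_000)
t_opt = search_threshold(w, bits=4)
print(f"max |w| = {np.abs(w).max():.2f}, searched threshold = {t_opt:.2f}")
```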