2019
DOI: 10.48550/arxiv.1906.03193
Preprint

Fighting Quantization Bias With Bias

Abstract: Low-precision representation of deep neural networks (DNNs) is critical for efficient deployment of deep learning applications on embedded platforms; however, converting the network to low precision degrades its performance. Crucially, networks designed for embedded applications usually suffer from increased degradation, since they have less redundancy. This is most evident for the ubiquitous MobileNet architecture [10,20], which requires a costly quantization-aware training cycle to achieve acceptable p…

Cited by 9 publications (18 citation statements)
References 18 publications
“…Bias Correction. Quantization of weights induces bias shifts in activation means, which may lead to detrimental behaviour in the following layers [23,32]. This can be expressed as follows:…”
Section: Weight Quantization (mentioning)
confidence: 99%
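For reference, the bias shift described here is conventionally written for a layer y = Wx + b whose weights W are quantized to W̃. A minimal sketch of that standard expression (an assumption about the general form, not the exact equation elided from the quote above):

```latex
% Expected output shift caused by the weight quantization error
% \epsilon = \tilde{W} - W; the input term factors out in expectation.
\[
\mathbb{E}[\tilde{y}] - \mathbb{E}[y]
  = \mathbb{E}\big[(\tilde{W} - W)\,x\big]
  = (\tilde{W} - W)\,\mathbb{E}[x]
  = \epsilon\,\mathbb{E}[x].
\]
```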
“…Several works propose approaches to correct the quantization-induced bias. These include using batch-normalization statistics [23], micro training [32], and applying a scale and shift per channel [33].…”
Section: Weight Quantization (mentioning)
confidence: 99%
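One plausible reading of the per-channel scale-and-shift correction mentioned above is an affine re-alignment of each output channel of the dequantized weights to the float weights. The function name and the statistics-matching choice below are illustrative assumptions, not the cited method verbatim:

```python
import numpy as np

def per_channel_scale_shift(w_float, w_quant):
    """Hypothetical sketch: align each output channel of the dequantized
    weights with the float weights via an affine (scale, shift) correction."""
    corrected = np.empty_like(w_quant, dtype=np.float32)
    for c in range(w_float.shape[0]):          # output channels on axis 0
        wf = w_float[c].ravel()
        wq = w_quant[c].ravel().astype(np.float32)
        scale = wf.std() / (wq.std() + 1e-12)  # match channel spread
        shift = wf.mean() - scale * wq.mean()  # match channel mean
        corrected[c] = scale * w_quant[c] + shift
    return corrected
```

In practice, [23] estimates the required correction from batch-normalization statistics rather than from the weights themselves; the sketch only illustrates the per-channel affine idea.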
“…Unfortunately, because the task loss function of DNNs is non-convex (Goodfellow et al., 2016), there is no analytical solution for the form of post-training quantized weights. As a result, various approximations have been suggested, mainly quadratic approximations that imply quantization is performed in a convex-like regime (Nagel et al., 2017; Nahshan et al., 2020).…”
Section: Weight Quantization Strategy (mentioning)
confidence: 99%
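The quadratic approximation referred to here generally takes the form of a second-order Taylor expansion of the task loss around the trained full-precision weights. A sketch of that generic form (an assumption about what such analyses share, not a specific equation from the cited works):

```latex
% Second-order expansion of the task loss L around trained weights w,
% with quantization perturbation \Delta w = \tilde{w} - w and Hessian H.
% At a well-trained minimum the gradient term is approximately zero.
\[
L(w + \Delta w) \approx L(w) + \nabla L(w)^{\top} \Delta w
  + \tfrac{1}{2}\, \Delta w^{\top} H \,\Delta w
  \;\approx\; L(w) + \tfrac{1}{2}\, \Delta w^{\top} H \,\Delta w .
\]
```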
“…Bias correction is an operation to compensate for the biased error in output activations after quantization. The shift induced by quantization is diminished by adjusting the bias parameters of the neurons or channels, because output activations shifted by quantization may degrade the quantization quality of the next layer (Finkelstein et al., 2019; Nagel et al., 2019). The amount of shift can be calculated as the expected error on the output activations, which can be expressed as…”
Section: Bias Correction Of Q-rater (mentioning)
confidence: 99%
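As a rough sketch of the correction step this quote describes (the function name and the calibration-based input mean are assumptions for illustration; the quoted paper's own expression is truncated above):

```python
import numpy as np

def correct_bias(bias, w_float, w_quant, mean_input):
    """Sketch: fold the expected quantization-induced output shift back into
    the layer bias. `mean_input` is E[x], e.g. estimated from calibration
    data or batch-normalization statistics (illustrative assumption)."""
    eps = w_quant.astype(np.float32) - w_float   # per-weight quantization error
    expected_shift = eps @ mean_input            # (out,) = (out, in) @ (in,)
    return bias - expected_shift                 # cancel the shift on average
```

With the adjusted bias, each output channel's mean matches the full-precision layer again, which is the compensation described in the quote.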