“…Minimizing the KL divergence between the full-precision weight distribution and the quantized weight distribution has also been proposed (Migacz, 2017). Since input data is not utilized, the quantization process can be simple and fast (Nagel et al., 2019), even though the correlation between weight quantization and task loss is not deeply investigated.…”
Section: Weight Quantization Strategy
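As a concrete illustration of this KL-based calibration, the sketch below picks the clipping threshold whose quantized-weight histogram is closest, in KL divergence, to the full-precision histogram. The bin count, candidate grid, and bit-width are illustrative assumptions, not the exact procedure of Migacz (2017).

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two (unnormalized) histograms."""
    p = p / (p.sum() + eps)
    q = q / (q.sum() + eps)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / (q[mask] + eps))))

def search_clip_threshold(w, n_bits=4, n_candidates=50):
    """Pick the clipping threshold whose quantized-weight histogram is
    closest (in KL divergence) to the full-precision histogram."""
    w = np.abs(np.ravel(w))
    t_max = float(w.max())
    levels = 2 ** (n_bits - 1) - 1                # symmetric uniform grid
    ref_hist, edges = np.histogram(w, bins=128, range=(0.0, t_max))
    best_t, best_kl = t_max, np.inf
    for t in np.linspace(0.2 * t_max, t_max, n_candidates):
        scale = t / levels
        w_q = np.clip(np.round(w / scale), 0, levels) * scale   # values above t are clipped
        q_hist, _ = np.histogram(w_q, bins=edges)
        kl = kl_divergence(ref_hist.astype(float), q_hist.astype(float))
        if kl < best_kl:
            best_kl, best_t = kl, float(t)
    return best_t
```

Note that the whole search runs on the weights alone, which is why this family of methods needs no input data.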
“…Bias correction is an operation that compensates for the biased error in output activations after quantization. The amount of shift induced by quantization is diminished by adjusting the bias parameters of the neurons or channels, because output activations shifted by quantization may degrade the quantization quality of the next layer (Finkelstein et al., 2019; Nagel et al., 2019). The amount of shift can be calculated as the expected error on the output activations, which can be expressed as γ_n = -0.9…”
Section: Bias Correction Of Q-rater
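A minimal sketch of this bias-correction step for a single linear layer is shown below: the expected quantization error on the output activations is estimated on a calibration batch and folded into the bias. The simple symmetric quantizer and the weight/activation shapes are illustrative assumptions, not the exact formulation used by the cited works.

```python
import numpy as np

def quantize_weights(w, n_bits=4):
    """Symmetric uniform weight quantization (illustrative)."""
    levels = 2 ** (n_bits - 1) - 1
    scale = np.abs(w).max() / levels
    return np.clip(np.round(w / scale), -levels, levels) * scale

def bias_correct(w, b, x_calib, n_bits=4):
    """Fold the expected output-activation error into the bias:
    b_new = b + E[(W - W_q) x], estimated on a calibration batch x_calib.
    Shapes: w is (out, in), b is (out,), x_calib is (N, in)."""
    w_q = quantize_weights(w, n_bits)
    err = (x_calib @ (w - w_q).T).mean(axis=0)    # per-output-neuron shift
    return w_q, b + err
```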
“…Bias correction has been a supplementary and optional technique for quantization. For example, bias correction is not introduced in (Zhao et al., 2019), while it plays a key role in enhancing model accuracy in (Finkelstein et al., 2019; Nagel et al., 2019). In the context of non-convexity, Q-Rater compares the two model accuracy values evaluated with and without bias correction for each layer.…”
Post-training uniform quantization methods have usually been studied based on convex optimization. As a result, most previous methods rely on quantization error minimization and/or quadratic approximations. Such approaches are computationally efficient and reasonable when a large number of quantization bits is employed. When the number of quantization bits is relatively low, however, non-convex optimization is unavoidable to improve model accuracy. In this paper, we propose a new post-training uniform quantization technique that takes non-convexity into account. We empirically show that hyper-parameters for clipping and rounding of weights and activations can be explored by monitoring the task loss. An optimally searched set of hyper-parameters is then frozen before proceeding to the next layer, so that incremental non-convex optimization is enabled for post-training quantization. Through extensive experiments on various models, the proposed technique achieves higher model accuracy, especially for low-bit quantization.
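A rough sketch of the layer-by-layer search the abstract describes might look as follows: candidate clipping ratios are tried for one layer at a time, the task loss is monitored on a calibration set, and the best setting is frozen before moving to the next layer. The candidate grid, bit-width, and `eval_task_loss` callback are hypothetical placeholders rather than the exact Q-Rater procedure.

```python
import numpy as np

def quantize(w, clip_ratio, n_bits=4):
    """Clip to clip_ratio * max|w|, then round on a symmetric uniform grid."""
    levels = 2 ** (n_bits - 1) - 1
    scale = clip_ratio * np.abs(w).max() / levels
    return np.clip(np.round(w / scale), -levels, levels) * scale

def layerwise_search(weights, eval_task_loss, candidates=(0.6, 0.7, 0.8, 0.9, 1.0)):
    """Greedy per-layer search: for each layer in turn, try several clipping
    ratios, keep the one that minimizes the task loss, freeze it, and move on.
    `weights` is a dict {layer_name: ndarray}; `eval_task_loss(weights)`
    evaluates the model with the given (partially quantized) weights on a
    small calibration set."""
    frozen = dict(weights)
    for name in weights:
        best = min(
            candidates,
            key=lambda r: eval_task_loss({**frozen, name: quantize(weights[name], r)}),
        )
        frozen[name] = quantize(weights[name], best)   # freeze before the next layer
    return frozen
```

Because each layer's setting is fixed before the next one is searched, the overall cost grows only linearly in the number of layers.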
“…Furthermore, the optimization process is typically constrained using some prior knowledge about the input, such as the high correlation between nearby pixels within an image, to avoid sample over-fitting. A similar approach was recently adapted for several use-cases, including data-free distillation [21, 3, 17], with limited success.…”
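One common way to encode the nearby-pixel correlation prior mentioned above is a total-variation penalty on the synthesized images, as in the sketch below; this is an illustrative choice and not necessarily the regularizer used in the cited works.

```python
import torch

def tv_prior(images):
    """Total-variation penalty: encourages neighbouring pixels to be similar,
    one way to express the nearby-pixel correlation prior on synthetic inputs.
    `images` has shape (N, C, H, W)."""
    dh = (images[..., 1:, :] - images[..., :-1, :]).abs().mean()
    dw = (images[..., :, 1:] - images[..., :, :-1]).abs().mean()
    return dh + dw
```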
Background: Recently, an extensive amount of research has focused on compressing and accelerating Deep Neural Networks (DNNs). So far, high-compression-rate algorithms have required the entire training dataset, or a subset of it, for fine-tuning and for the low-precision calibration process. However, this requirement is unacceptable when sensitive data is involved, as in medical and biometric use-cases.
Contributions: We present three methods for generating synthetic samples from trained models. We then demonstrate how these samples can be used to fine-tune or to calibrate quantized models with negligible accuracy degradation compared to the original training set, without using any real data in the process. Furthermore, we suggest that our best-performing method, which leverages the intrinsic batch normalization layers' statistics of a trained model, can be used to evaluate data similarity. Our approach opens a path towards genuine data-free model compression, alleviating the need for training data during deployment.
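A minimal sketch of the batch-normalization-statistics idea from this abstract: random inputs are optimized so that the batch statistics they induce at each BatchNorm layer match the running mean and variance stored in the trained model, so no real data is needed. The optimizer settings, loss weighting, and the assumption of a 2-D convolutional model are illustrative.

```python
import torch
import torch.nn as nn

def generate_bn_matched_samples(model, n_samples=32, image_shape=(3, 224, 224),
                                steps=200, lr=0.05):
    """Synthesize inputs whose per-layer batch statistics match the running
    mean/variance stored in the model's BatchNorm layers (no real data used)."""
    model.eval()
    stats = []                                    # (bn_module, batch mean, batch var)

    def hook(module, inputs, output):
        x = inputs[0]
        stats.append((module,
                      x.mean(dim=(0, 2, 3)),
                      x.var(dim=(0, 2, 3), unbiased=False)))

    handles = [m.register_forward_hook(hook)
               for m in model.modules() if isinstance(m, nn.BatchNorm2d)]

    x = torch.randn(n_samples, *image_shape, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        stats.clear()
        opt.zero_grad()
        model(x)
        # match the stored running statistics at every BatchNorm layer
        loss = sum(torch.norm(mean - m.running_mean) + torch.norm(var - m.running_var)
                   for m, mean, var in stats)
        loss.backward()
        opt.step()

    for h in handles:
        h.remove()
    return x.detach()
```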
“…By approximating real-valued weights and activations using low-bit numbers, quantized neural networks (QNNs) trained with state-of-the-art algorithms (e.g., Courbariaux et al., 2015; Rastegari et al., 2016; Louizos et al., 2018; Li et al., 2019) can be shown to perform similarly to their full-precision counterparts (e.g., Jung et al., 2019; Li et al., 2019). This work focuses on the problem of post-training quantization, which aims to generate a QNN from a pretrained full-precision network without accessing the original training data (e.g., Sung et al., 2015; Krishnamoorthi, 2018; Zhao et al., 2019; Meller et al., 2019; Banner et al., 2019; Nagel et al., 2019; Choukroun et al., 2019). This scenario appears widely in practice.…”
We consider the post-training quantization problem, which discretizes the weights of pre-trained deep neural networks without re-training the model. We propose multipoint quantization, a quantization method that approximates a full-precision weight vector using a linear combination of multiple vectors of low-bit numbers; this is in contrast to typical quantization methods that approximate each weight using a single low-precision number. Computationally, we construct the multipoint quantization with an efficient greedy selection procedure and adaptively decide the number of low-precision points on each quantized weight vector based on the error of its output. This allows us to achieve higher precision levels for important weights that greatly influence the outputs, yielding an "effect of mixed precision" without physical mixed-precision implementations (which require specialized hardware accelerators (Wang et al., 2019)). Empirically, our method can be implemented with common operands, bringing almost no memory or computation overhead. We show that our method outperforms a range of state-of-the-art methods on ImageNet classification and that it generalizes to more challenging tasks such as PASCAL VOC object detection.
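A simplified sketch of the greedy construction behind multipoint quantization: the residual of the current approximation is repeatedly quantized and appended as another low-bit vector until an error budget or a point budget is reached. The stopping rule and per-vector scaling below are simplifications of the paper's selection procedure (which also adapts the number of points based on the output error rather than the weight error).

```python
import numpy as np

def quantize_vector(v, n_bits=2):
    """Symmetric uniform quantization of a single vector (illustrative)."""
    levels = 2 ** (n_bits - 1) - 1
    vmax = np.abs(v).max()
    scale = vmax / levels if vmax > 0 else 1.0
    return np.clip(np.round(v / scale), -levels, levels) * scale

def multipoint_quantize(w, n_bits=2, tol=1e-2, max_points=4):
    """Greedily approximate a full-precision weight vector w by a sum of
    low-bit vectors: each step quantizes the current residual and adds the
    result, stopping when the residual is small or the point budget is hit."""
    points, residual = [], w.copy()
    while len(points) < max_points and np.linalg.norm(residual) > tol * np.linalg.norm(w):
        q = quantize_vector(residual, n_bits)
        points.append(q)
        residual = residual - q
    return points                                 # approximation is sum(points)
```

Important weight vectors naturally end up with more points (higher effective precision), while easy-to-approximate ones stop early, which is the "effect of mixed precision" described in the abstract.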