2019 IEEE/CVF International Conference on Computer Vision (ICCV)
DOI: 10.1109/iccv.2019.00141
Data-Free Quantization Through Weight Equalization and Bias Correction

Abstract: We introduce a data-free quantization method for deep neural networks that does not require fine-tuning or hyperparameter selection. It achieves near-original model performance on common computer vision architectures and tasks. 8-bit fixed-point quantization is essential for efficient inference in modern deep learning hardware architectures. However, quantizing models to run in 8-bit is a non-trivial task, frequently leading to either significant performance reduction or engineering time spent on training a ne…
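The 8-bit fixed-point quantization referred to in the abstract can be illustrated with a minimal sketch: an affine (scale and zero-point) mapping of a floating-point weight tensor to uint8 and back. This is illustrative only; the tensor values, helper names, and scheme details are assumptions, not the paper's exact pipeline.

```python
# Minimal sketch of 8-bit affine (asymmetric) quantization of a weight tensor.
# Illustrative only: values and helper names are made up.
import numpy as np

def quantize_int8(x: np.ndarray):
    """Map float values to uint8 with a scale and zero-point (affine scheme)."""
    x_min, x_max = float(x.min()), float(x.max())
    scale = (x_max - x_min) / 255.0 if x_max > x_min else 1.0
    zero_point = int(round(-x_min / scale))
    q = np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point

def dequantize_int8(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Recover approximate float values from the quantized representation."""
    return (q.astype(np.float32) - zero_point) * scale

weights = np.random.randn(64, 32).astype(np.float32)   # toy weight matrix
q, s, z = quantize_int8(weights)
error = np.abs(weights - dequantize_int8(q, s, z)).max()
print(f"max reconstruction error: {error:.5f}")
```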

Cited by 360 publications (388 citation statements)
References 18 publications
“…In real-world end-device settings, different bit-widths are supported by different devices [19,20,23,27,30]. This hardware flexibility allows the model to be configured with different quantization levels to match the variety of hardware configurations of clients' mobile or IoT devices.…”
Section: Adaptive Quantized Federated Learning (mentioning)
confidence: 99%
“…Disadvantages: Reducing the bit-width of the network weights (from 16 to 8 bits) leads to accuracy loss: in some cases, the converted model might show only a small performance degradation, while for some other tasks the resulting accuracy will be close to zero. Although a number of research papers dealing with network quantization were presented by Qualcomm [49,54] and Google [34,37], all showing decent accuracy results for many image classification models, there is no general recipe for quantizing arbitrary deep learning architectures. Thus, quantization is still more of a research topic, without working solutions for many AI-related tasks (e.g., image-to-image mapping or various NLP problems).…”
Section: Quantized Inference (mentioning)
confidence: 99%
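The accuracy concern raised in this statement can be made concrete with a small, hedged sketch: symmetric uniform quantization of the same weight tensor at 16, 8, and 4 bits, where a few outlier values widen the dynamic range and inflate the rounding error at low bit-widths. All values and shapes below are assumptions for illustration.

```python
# Quantization error grows as the bit-width shrinks, especially when a few
# outliers stretch the dynamic range. Illustrative toy data, not a benchmark.
import numpy as np

def quantize_dequantize(x: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric uniform quantization to `bits` bits, then back to float."""
    levels = 2 ** (bits - 1) - 1
    scale = float(np.abs(x).max()) / levels
    return np.clip(np.round(x / scale), -levels, levels) * scale

rng = np.random.default_rng(0)
weights = rng.normal(size=4096).astype(np.float32)
weights[:4] *= 20.0                      # a few outliers widen the range
for bits in (16, 8, 4):
    err = np.abs(weights - quantize_dequantize(weights, bits)).mean()
    print(f"{bits:>2}-bit: mean abs error {err:.6f}")
```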
“…OCS [20] instead splits a channel into two channels in which the weights and outputs are halved, which reduces the dynamic range of the outliers. DFQ [21] quantizes weights and activations to 8 bits by assuming that the inputs to the activations have a Gaussian distribution, so that a model can be used to equalize the dynamic range of the data being quantized, along with a correction to the bias introduced by quantization. Clipping-based approaches are used in [10], [22], [23], in which activations or weights are clipped prior to quantization.…”
Section: Related Work (mentioning)
confidence: 99%
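The two DFQ ideas named in this statement, cross-layer weight equalization and bias correction, can be sketched for two fully connected layers separated by a ReLU. The scaling rule below follows the published formula s_i = (1/r2_i)·sqrt(r1_i·r2_i) = sqrt(r1_i/r2_i) over matching output/input channel ranges, but this is an illustrative NumPy sketch rather than the authors' implementation; the input mean E[x], which DFQ estimates from batch-norm statistics, is simply assumed here.

```python
# Hedged sketch of cross-layer weight equalization and bias correction (DFQ)
# for two fully connected layers with a ReLU in between. Illustrative only.
import numpy as np

def equalize(W1, b1, W2):
    """Rescale per channel so W1's output ranges and W2's input ranges match.
    Valid because ReLU is positively homogeneous: relu(s*x) = s*relu(x), s > 0."""
    r1 = np.abs(W1).max(axis=1)          # range of each output channel of layer 1
    r2 = np.abs(W2).max(axis=0)          # range of each input channel of layer 2
    s = np.sqrt(r1 / r2)                 # s_i = (1/r2_i) * sqrt(r1_i * r2_i)
    return W1 / s[:, None], b1 / s, W2 * s[None, :]

def quantize_dequantize(W, bits=8):
    """Symmetric per-tensor quantization, used to expose the rounding error."""
    levels = 2 ** (bits - 1) - 1
    scale = float(np.abs(W).max()) / levels
    return np.clip(np.round(W / scale), -levels, levels) * scale

rng = np.random.default_rng(0)
W1 = rng.normal(size=(32, 16)); b1 = rng.normal(size=32)
W2 = rng.normal(size=(8, 32))
W1[0] *= 50.0                            # one wide channel hurts per-tensor quantization

W1e, b1e, W2e = equalize(W1, b1, W2)
err_before = np.abs(W1 - quantize_dequantize(W1)).mean()
err_after = np.abs(W1e - quantize_dequantize(W1e)).mean()
print(f"mean weight rounding error: {err_before:.4f} -> {err_after:.4f} after equalization")

# Bias correction: absorb the expected quantization error of the weights into the bias.
W1q = quantize_dequantize(W1e)
x_mean = np.full(16, 0.5)                # assumed E[x]; DFQ derives this from BN statistics
b1_corrected = b1e - (W1q - W1e) @ x_mean
```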