2019 IEEE/CVF International Conference on Computer Vision (ICCV)
DOI: 10.1109/iccv.2019.00141
Data-Free Quantization Through Weight Equalization and Bias Correction

Abstract: We introduce a data-free quantization method for deep neural networks that does not require fine-tuning or hyperparameter selection. It achieves near-original model performance on common computer vision architectures and tasks. 8-bit fixed-point quantization is essential for efficient inference in modern deep learning hardware architectures. However, quantizing models to run in 8-bit is a non-trivial task, frequently leading to either significant performance reduction or engineering time spent on training a ne…
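The 8-bit fixed-point quantization referred to in the abstract can be illustrated with a minimal sketch: an affine (scale and zero-point) mapping of a floating-point weight tensor to uint8 and back. This is illustrative only; the tensor values, helper names, and scheme details are assumptions, not the paper's exact pipeline.

```python
# Minimal sketch of 8-bit affine (asymmetric) quantization of a weight tensor.
# Illustrative only: values and helper names are made up.
import numpy as np

def quantize_int8(x: np.ndarray):
    """Map float values to uint8 with a scale and zero-point (affine scheme)."""
    x_min, x_max = float(x.min()), float(x.max())
    scale = (x_max - x_min) / 255.0 if x_max > x_min else 1.0
    zero_point = int(round(-x_min / scale))
    q = np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point

def dequantize_int8(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Recover approximate float values from the quantized representation."""
    return (q.astype(np.float32) - zero_point) * scale

weights = np.random.randn(64, 32).astype(np.float32)   # toy weight matrix
q, s, z = quantize_int8(weights)
error = np.abs(weights - dequantize_int8(q, s, z)).max()
print(f"max reconstruction error: {error:.5f}")
```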

Cited by 360 publications (388 citation statements)
References 18 publications
“…In real-world end-device settings, different bit-widths are supported by different devices [19,20,23,27,30]. This hardware flexibility allows the model to be configured with different quantization levels to match the variety of hardware configurations of clients' mobile or IoT devices.…”
Section: Adaptive Quantized Federated Learning (mentioning)
confidence: 99%
“…Disadvantages: Reducing the bit-width of the network weights (from 16 to 8 bits) leads to accuracy loss: in some cases, the converted model might show only a small performance degradation, while for some other tasks the resulting accuracy will be close to zero. Although a number of research papers dealing with network quantization were presented by Qualcomm [49,54] and Google [34,37], all showing decent accuracy results for many image classification models, there is no general recipe for quantizing arbitrary deep learning architectures. Thus, quantization is still more of a research topic, without working solutions for many AI-related tasks (e.g., image-to-image mapping or various NLP problems).…”
Section: Quantized Inference (mentioning)
confidence: 99%
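The accuracy concern raised in this statement can be made concrete with a small, hedged sketch: symmetric uniform quantization of the same weight tensor at 16, 8, and 4 bits, where a few outlier values widen the dynamic range and inflate the rounding error at low bit-widths. All values and shapes below are assumptions for illustration.

```python
# Quantization error grows as the bit-width shrinks, especially when a few
# outliers stretch the dynamic range. Illustrative toy data, not a benchmark.
import numpy as np

def quantize_dequantize(x: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric uniform quantization to `bits` bits, then back to float."""
    levels = 2 ** (bits - 1) - 1
    scale = float(np.abs(x).max()) / levels
    return np.clip(np.round(x / scale), -levels, levels) * scale

rng = np.random.default_rng(0)
weights = rng.normal(size=4096).astype(np.float32)
weights[:4] *= 20.0                      # a few outliers widen the range
for bits in (16, 8, 4):
    err = np.abs(weights - quantize_dequantize(weights, bits)).mean()
    print(f"{bits:>2}-bit: mean abs error {err:.6f}")
```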
“…OCS [20] instead splits a channel into two channels in which the weights and outputs are halved, which reduces the dynamic range of the outliers. DFQ [21] quantizes weights and activations to 8 bits by assuming that the inputs to the activations have a Gaussian distribution, so that a model can be used to equalize the dynamic range of the data being quantized, along with a correction to the bias introduced by quantization. Clipping-based approaches are used in [10], [22], [23], in which activations or weights are clipped prior to quantization.…”
Section: Related Work (mentioning)
confidence: 99%
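The two DFQ ideas named in this statement, cross-layer weight equalization and bias correction, can be sketched for two fully connected layers separated by a ReLU. The scaling rule below follows the published formula s_i = (1/r2_i)·sqrt(r1_i·r2_i) = sqrt(r1_i/r2_i) over matching output/input channel ranges, but this is an illustrative NumPy sketch rather than the authors' implementation; the input mean E[x], which DFQ estimates from batch-norm statistics, is simply assumed here.

```python
# Hedged sketch of cross-layer weight equalization and bias correction (DFQ)
# for two fully connected layers with a ReLU in between. Illustrative only.
import numpy as np

def equalize(W1, b1, W2):
    """Rescale per channel so W1's output ranges and W2's input ranges match.
    Valid because ReLU is positively homogeneous: relu(s*x) = s*relu(x), s > 0."""
    r1 = np.abs(W1).max(axis=1)          # range of each output channel of layer 1
    r2 = np.abs(W2).max(axis=0)          # range of each input channel of layer 2
    s = np.sqrt(r1 / r2)                 # s_i = (1/r2_i) * sqrt(r1_i * r2_i)
    return W1 / s[:, None], b1 / s, W2 * s[None, :]

def quantize_dequantize(W, bits=8):
    """Symmetric per-tensor quantization, used to expose the rounding error."""
    levels = 2 ** (bits - 1) - 1
    scale = float(np.abs(W).max()) / levels
    return np.clip(np.round(W / scale), -levels, levels) * scale

rng = np.random.default_rng(0)
W1 = rng.normal(size=(32, 16)); b1 = rng.normal(size=32)
W2 = rng.normal(size=(8, 32))
W1[0] *= 50.0                            # one wide channel hurts per-tensor quantization

W1e, b1e, W2e = equalize(W1, b1, W2)
err_before = np.abs(W1 - quantize_dequantize(W1)).mean()
err_after = np.abs(W1e - quantize_dequantize(W1e)).mean()
print(f"mean weight rounding error: {err_before:.4f} -> {err_after:.4f} after equalization")

# Bias correction: absorb the expected quantization error of the weights into the bias.
W1q = quantize_dequantize(W1e)
x_mean = np.full(16, 0.5)                # assumed E[x]; DFQ derives this from BN statistics
b1_corrected = b1e - (W1q - W1e) @ x_mean
```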