2016
DOI: 10.48550/arxiv.1606.06160
Preprint
DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients

Abstract: We propose DoReFa-Net, a method to train convolutional neural networks that have low bitwidth weights and activations using low bitwidth parameter gradients. In particular, during backward pass, parameter gradients are stochastically quantized to low bitwidth numbers before being propagated to convolutional layers. As convolutions during forward/backward passes can now operate on low bitwidth weights and activations/gradients respectively, DoReFa-Net can use bit convolution kernels to accelerate both training …
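The gradient path described in the abstract can be illustrated with a minimal PyTorch sketch of stochastic k-bit quantization; the exact scaling and noise scheme below is an assumption for illustration and is not taken verbatim from the paper.

```python
import torch

def stochastic_quantize_grad(grad: torch.Tensor, k: int) -> torch.Tensor:
    """Stochastically round a gradient tensor to k-bit values.

    Sketch only: the precise scaling/noise scheme of DoReFa-Net may differ.
    """
    levels = 2 ** k - 1
    scale = 2 * grad.abs().max() + 1e-12          # map grad roughly into [0, 1]
    x = grad / scale + 0.5
    noise = (torch.rand_like(x) - 0.5) / levels   # uniform noise -> stochastic rounding
    x_q = torch.clamp(torch.round((x + noise) * levels) / levels, 0.0, 1.0)
    return (x_q - 0.5) * scale                    # map back to the original range

g = torch.randn(4, 4)
g_q = stochastic_quantize_grad(g, k=2)            # 2-bit version of the gradient
```

The stochastic noise makes the quantizer unbiased in expectation, which is the usual motivation for stochastic rather than deterministic rounding of gradients.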

Cited by 721 publications (1,044 citation statements)
References 24 publications
“…As an extreme case of quantization, binary neural networks (BNNs) reduce the precision of both weights and neuron activations to a single bit [25], [26]. BNNs work well on simple tasks like MNIST, CIFAR-10, and SVHN without impacting accuracy [27], but show worse performance on challenging datasets such as ImageNet, with a drop of around 12% [28], [29]. BNNs provide major benefits for computation.…”
Section: A. Size-Optimized and Quantized DNNs
confidence: 99%
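As a hypothetical illustration of the single-bit idea in the statement above, a sign-function binarizer might look like the following sketch; practical BNNs (e.g. [25], [26]) typically add scaling factors and a straight-through estimator for training.

```python
import torch

def binarize(x: torch.Tensor) -> torch.Tensor:
    """Map a real-valued tensor to {-1, +1} via the sign of each element."""
    return torch.where(x >= 0, torch.ones_like(x), -torch.ones_like(x))

w = torch.randn(3, 3)
a = torch.randn(3)
y = binarize(w) @ binarize(a)   # such dot products map to XNOR/popcount in hardware
```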
“…To further accelerate the training of neural networks, some work also attempts to quantize gradients. DoReFa-Net [21] uses quantized gradients in the backward propagation, but the weights and gradients are stored in full precision when updating the weights, the same as in previous works. In contrast, WAGE [16] updates the quantized weights with discrete gradients.…”
Section: Neural Network Quantization
confidence: 99%
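A minimal sketch of the contrast drawn in this statement, assuming an illustrative `quantize_k` helper (not the cited papers' exact schemes): a DoReFa-style update is applied to a full-precision shadow copy of the weights, whereas a WAGE-style update keeps the stored weights discrete.

```python
import torch

def quantize_k(x: torch.Tensor, k: int) -> torch.Tensor:
    """Illustrative uniform k-bit quantizer on [-1, 1]."""
    levels = 2 ** k - 1
    x = torch.clamp(x, -1.0, 1.0)
    return torch.round((x + 1) / 2 * levels) / levels * 2 - 1

lr, k = 0.01, 2
grad = torch.randn(4)

# DoReFa-style: keep a full-precision shadow copy; quantize only what the layer uses.
w_fp = 0.1 * torch.randn(4)
w_fp = w_fp - lr * grad            # update applied to the full-precision copy
w_used = quantize_k(w_fp, k)       # low-bitwidth weights seen by the convolution

# WAGE-style (sketch): the stored weights and the update itself stay discrete.
w_q = quantize_k(0.1 * torch.randn(4), k)
w_q = quantize_k(w_q - lr * quantize_k(grad, k), k)
```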
“…What is more, quantizing gradients is challenging and might suffer from the gradient mismatch problem. Since quantization functions are usually non-differentiable, previous works train quantized DNNs by gradient approximation, like the STE [2,14,21,16]. Therefore, a heuristic quantization method that does not require gradient information is another potential solution for quantizing DNNs.…”
Section: Introduction
confidence: 99%
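The straight-through estimator (STE) mentioned here can be sketched with PyTorch's detach trick; the k-bit quantizer below is an illustrative stand-in, not the exact function used in the cited works.

```python
import torch

def quantize_ste(x: torch.Tensor, k: int) -> torch.Tensor:
    """k-bit uniform quantization with a straight-through estimator (STE).

    Forward pass returns the rounded value; in the backward pass the rounding
    step is bypassed (the detach trick), so the gradient flows through as if
    the quantizer were the identity on [0, 1].
    """
    levels = 2 ** k - 1
    x_c = torch.clamp(x, 0.0, 1.0)
    x_q = torch.round(x_c * levels) / levels
    return x_c + (x_q - x_c).detach()

x = torch.rand(5, requires_grad=True)
quantize_ste(x, k=2).sum().backward()
print(x.grad)    # all ones: the non-differentiable rounding step is ignored
```

This identity-gradient approximation is what the statement calls gradient mismatch: the backward pass pretends the quantizer is smooth even though the forward pass is a step function.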
“…DNN model quantization has emerged as a mandatory technique for high-performance DNN inference. Thanks to advances in model quantization algorithms [6,7,18,19,21,38], activations and weights in 32-bit floating-point (fp32) can be quantized to extremely low bit-widths with negligible inference accuracy degradation, using a uniform [20] or non-uniform quantizer [23] in quantization-aware training [18,19] or post-training quantization [21]. In this work, we focus on model quantization using an N-bit uniform quantizer, and its quantization function can be expressed as:…”
Section: Model Quantization
confidence: 99%
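The quoted equation is truncated in the excerpt. As a generic illustration only, an N-bit uniform (affine) quantizer with scale s and zero-point z could be sketched as below; the names and form are assumptions, not the cited paper's notation.

```python
import torch

def uniform_quantize(x: torch.Tensor, n_bits: int, s: float, z: int = 0):
    """Generic N-bit uniform quantizer: q = clip(round(x / s) + z, 0, 2**N - 1).

    Illustrative only; the exact expression in the quoted work is truncated above.
    """
    qmax = 2 ** n_bits - 1
    q = torch.clamp(torch.round(x / s) + z, 0, qmax)
    return q, (q - z) * s              # integer codes and their dequantized values

x = torch.linspace(-1.0, 1.0, steps=5)
n = 4
q, x_hat = uniform_quantize(x, n_bits=n, s=2.0 / (2 ** n - 1), z=(2 ** n - 1) // 2)
```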