2016
DOI: 10.48550/arxiv.1606.06160
Preprint
DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients

Abstract: We propose DoReFa-Net, a method to train convolutional neural networks that have low bitwidth weights and activations using low bitwidth parameter gradients. In particular, during backward pass, parameter gradients are stochastically quantized to low bitwidth numbers before being propagated to convolutional layers. As convolutions during forward/backward passes can now operate on low bitwidth weights and activations/gradients respectively, DoReFa-Net can use bit convolution kernels to accelerate both training …
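The gradient path described in the abstract can be illustrated with a minimal PyTorch sketch of stochastic k-bit quantization; the exact scaling and noise scheme below is an assumption for illustration and is not taken verbatim from the paper.

```python
import torch

def stochastic_quantize_grad(grad: torch.Tensor, k: int) -> torch.Tensor:
    """Stochastically round a gradient tensor to k-bit values.

    Sketch only: the precise scaling/noise scheme of DoReFa-Net may differ.
    """
    levels = 2 ** k - 1
    scale = 2 * grad.abs().max() + 1e-12          # map grad roughly into [0, 1]
    x = grad / scale + 0.5
    noise = (torch.rand_like(x) - 0.5) / levels   # uniform noise -> stochastic rounding
    x_q = torch.clamp(torch.round((x + noise) * levels) / levels, 0.0, 1.0)
    return (x_q - 0.5) * scale                    # map back to the original range

g = torch.randn(4, 4)
g_q = stochastic_quantize_grad(g, k=2)            # 2-bit version of the gradient
```

The stochastic noise makes the quantizer unbiased in expectation, which is the usual motivation for stochastic rather than deterministic rounding of gradients.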

Cited by 721 publications (1,044 citation statements)
References 24 publications
“…As an extreme case of quantization, binary neural networks (BNNs) reduce the precision of both weights and neuron activations to a single bit [25], [26]. BNNs work well on simple tasks like MNIST, CIFAR-10, and SVHN without impacting accuracy [27], but show worse performance on challenging datasets such as ImageNet, with a drop of around 12% [28], [29]. BNNs provide major benefits for computation.…”
Section: A. Size-Optimized and Quantized DNNs
confidence: 99%
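As a hypothetical illustration of the single-bit idea in the statement above, a sign-function binarizer might look like the following sketch; practical BNNs (e.g. [25], [26]) typically add scaling factors and a straight-through estimator for training.

```python
import torch

def binarize(x: torch.Tensor) -> torch.Tensor:
    """Map a real-valued tensor to {-1, +1} via the sign of each element."""
    return torch.where(x >= 0, torch.ones_like(x), -torch.ones_like(x))

w = torch.randn(3, 3)
a = torch.randn(3)
y = binarize(w) @ binarize(a)   # such dot products map to XNOR/popcount in hardware
```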
“…To further accelerate the training of neural networks, some work also attempts to quantize gradients. DoReFa-Net [21] uses quantized gradients in the backward propagation, but the weights and gradients are stored in full precision when updating the weights, the same as in previous works. In contrast, WAGE [16] updates the quantized weights with discrete gradients.…”
Section: Neural Network Quantization
confidence: 99%
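A minimal sketch of the contrast drawn in this statement, assuming an illustrative `quantize_k` helper (not the cited papers' exact schemes): a DoReFa-style update is applied to a full-precision shadow copy of the weights, whereas a WAGE-style update keeps the stored weights discrete.

```python
import torch

def quantize_k(x: torch.Tensor, k: int) -> torch.Tensor:
    """Illustrative uniform k-bit quantizer on [-1, 1]."""
    levels = 2 ** k - 1
    x = torch.clamp(x, -1.0, 1.0)
    return torch.round((x + 1) / 2 * levels) / levels * 2 - 1

lr, k = 0.01, 2
grad = torch.randn(4)

# DoReFa-style: keep a full-precision shadow copy; quantize only what the layer uses.
w_fp = 0.1 * torch.randn(4)
w_fp = w_fp - lr * grad            # update applied to the full-precision copy
w_used = quantize_k(w_fp, k)       # low-bitwidth weights seen by the convolution

# WAGE-style (sketch): the stored weights and the update itself stay discrete.
w_q = quantize_k(0.1 * torch.randn(4), k)
w_q = quantize_k(w_q - lr * quantize_k(grad, k), k)
```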
“…What is more, quantizing gradients is challenging and might suffer from the gradient mismatch problem. Since quantization functions are usually non-differentiable, previous works train quantized DNNs by gradient approximation, like the STE [2,14,21,16]. Therefore, a heuristic quantization method that does not require gradient information is another potential solution for quantizing DNNs.…”
Section: Introduction
confidence: 99%
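The straight-through estimator (STE) mentioned here can be sketched with PyTorch's detach trick; the k-bit quantizer below is an illustrative stand-in, not the exact function used in the cited works.

```python
import torch

def quantize_ste(x: torch.Tensor, k: int) -> torch.Tensor:
    """k-bit uniform quantization with a straight-through estimator (STE).

    Forward pass returns the rounded value; in the backward pass the rounding
    step is bypassed (the detach trick), so the gradient flows through as if
    the quantizer were the identity on [0, 1].
    """
    levels = 2 ** k - 1
    x_c = torch.clamp(x, 0.0, 1.0)
    x_q = torch.round(x_c * levels) / levels
    return x_c + (x_q - x_c).detach()

x = torch.rand(5, requires_grad=True)
quantize_ste(x, k=2).sum().backward()
print(x.grad)    # all ones: the non-differentiable rounding step is ignored
```

This identity-gradient approximation is what the statement calls gradient mismatch: the backward pass pretends the quantizer is smooth even though the forward pass is a step function.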
“…DNN model quantization has emerged as a mandatory technique for high-performance DNN inference. Thanks to advances in model quantization algorithms [6,7,18,19,21,38], activations and weights in 32-bit floating-point (fp32) can be quantized to extremely low bit-widths with negligible inference accuracy degradation, using a uniform [20] or non-uniform quantizer [23] in quantization-aware training [18,19] or post-training quantization [21]. In this work, we focus on model quantization using an N-bit uniform quantizer, and its quantization function can be expressed as:…”
Section: Model Quantization
confidence: 99%
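The quoted equation is truncated in the excerpt. As a generic illustration only, an N-bit uniform (affine) quantizer with scale s and zero-point z could be sketched as below; the names and form are assumptions, not the cited paper's notation.

```python
import torch

def uniform_quantize(x: torch.Tensor, n_bits: int, s: float, z: int = 0):
    """Generic N-bit uniform quantizer: q = clip(round(x / s) + z, 0, 2**N - 1).

    Illustrative only; the exact expression in the quoted work is truncated above.
    """
    qmax = 2 ** n_bits - 1
    q = torch.clamp(torch.round(x / s) + z, 0, qmax)
    return q, (q - z) * s              # integer codes and their dequantized values

x = torch.linspace(-1.0, 1.0, steps=5)
n = 4
q, x_hat = uniform_quantize(x, n_bits=n, s=2.0 / (2 ** n - 1), z=(2 ** n - 1) // 2)
```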