2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2019.00448

Learning to Quantize Deep Networks by Optimizing Quantization Intervals With Task Loss

Abstract: Reducing the bit-widths of activations and weights of deep networks makes them efficient to compute and store in memory, which is crucial for their deployment to resource-limited devices such as mobile phones. However, decreasing bit-widths through quantization generally yields drastically degraded accuracy. To tackle this problem, we propose to learn to quantize activations and weights via a trainable quantizer that transforms and discretizes them. Specifically, we parameterize the quantization intervals and obta…
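The abstract describes a trainable quantizer whose quantization intervals are parameterized and optimized directly with the task loss. Below is a minimal PyTorch sketch of that idea, assuming a simplified clamp-and-round quantizer with learnable interval bounds (`lower`, `upper`) and a straight-through estimator for the rounding; the parameter names and the uniform transformer are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class TrainableQuantizer(nn.Module):
    """Uniform quantizer with a learnable interval [lower, upper] (illustrative sketch)."""

    def __init__(self, n_bits: int = 4, init_lower: float = 0.0, init_upper: float = 1.0):
        super().__init__()
        # Interval end-points are ordinary parameters, so they receive
        # gradients from the downstream task loss during backpropagation.
        self.lower = nn.Parameter(torch.tensor(init_lower))
        self.upper = nn.Parameter(torch.tensor(init_upper))
        self.levels = 2 ** n_bits - 1

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Clip to the current interval (differentiable w.r.t. lower/upper).
        x_clipped = torch.minimum(torch.maximum(x, self.lower), self.upper)
        # Normalize to [0, 1] relative to the interval.
        x_norm = (x_clipped - self.lower) / (self.upper - self.lower + 1e-8)
        # Round to one of 2^n_bits levels; the straight-through estimator
        # passes gradients through the rounding as identity.
        x_q = torch.round(x_norm * self.levels) / self.levels
        x_ste = x_norm + (x_q - x_norm).detach()
        # Map back to the interval's range.
        return x_ste * (self.upper - self.lower) + self.lower

# Usage: the interval parameters are trained jointly with the network,
# here with a stand-in loss in place of a real task loss.
quantizer = TrainableQuantizer(n_bits=4)
activations = torch.randn(8, 16, requires_grad=True)
loss = quantizer(activations).pow(2).mean()
loss.backward()
print(quantizer.lower.grad, quantizer.upper.grad)
```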

Cited by 304 publications (232 citation statements)
References 27 publications
“…But lower-bit quantization faces more challenges on accuracy. To address this problem, [7,20] try to optimize the clipping value or quantization interval for the task-specific loss in an end-to-end manner. [36,46,39] apply different techniques to find optimal bit-widths for each layer.…”
Section: Network Quantization
confidence: 99%
“…[36,46,39] apply different techniques to find optimal bit-widths for each layer. [42] and [20] optimize the training process with incremental and progressive quantization. [29] and [16] adjust the network structure to adapt to quantization.…”
Section: Network Quantization
confidence: 99%
“…As shown in the figure, when the stride changes from 1 to 2, the number of redundant cells is fixed to 1, while the distance between two adjacent PEs (dPEs) is doubled. Although a large stride or an odd input-feature bit-width can degrade the effectiveness of the proposed DWM-based cell array, since the stride is generally less than 3 [21] and 4-bit input and weight configurations can deliver full-precision accuracy [28], [29] for recent CNNs, the proposed systolic DWM-based array is effectively reconfigurable for different layer configurations.…”
Section: Generalization of DWM Input and Weight Bus
confidence: 99%