2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019
DOI: 10.1109/cvpr.2019.00748
Quantization Networks

Abstract: Although deep neural networks are highly effective, their high computational and memory costs severely challenge their applications on portable devices. As a consequence, low-bit quantization, which converts a full-precision neural network into a low-bitwidth integer version, has been an active and promising research topic. Existing methods formulate the low-bit quantization of networks as an approximation or optimization problem. Approximation-based methods confront the gradient mismatch problem, while optimi…

Cited by 234 publications (102 citation statements). References 18 publications.
“…As a step to enable faster inference, significant interest is shown in the design of custom ASIC NN accelerators [4], [5], [17] targeting both cloud platforms and mobile SoCs. In addition, quantization [18] is leveraged to further improve the inference efficiency. During quantization, both weights and activations are converted to lower-precision numerical representations (e.g., perform INT8 computations in place of FLOAT32).…”
Section: Impact of NCFET on Neural Network Inference Accelerators (mentioning)
confidence: 99%
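The statement above describes the core conversion during quantization: weights and activations move from FLOAT32 to a lower-precision representation such as INT8. A minimal sketch of symmetric per-tensor INT8 quantization, using NumPy (the function names and the toy weight values are illustrative, not from the cited work):

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor quantization: FLOAT32 -> INT8 plus a scale factor."""
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate FLOAT32 tensor from the INT8 representation."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.27, 0.003, 1.0], dtype=np.float32)
q, s = quantize_int8(w)       # integer weights, one shared scale
w_hat = dequantize(q, s)      # approximate reconstruction of w
```

The hardware benefit comes from performing the matrix arithmetic directly on the INT8 values and applying the scale once at the end, which is what the ASIC accelerators referenced above exploit.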
“…Hereafter, when referring to the accuracy of a quantization size, we refer to this average value. As shown in Table 1, ResNet-101 and SqueezeNet are amenable to compression and their accuracy is slightly affected by quantization [7], [18]. On the other hand, [8]- [10], [12] are highly impacted by quantization and their accuracy degrades significantly as the quantization size decreases.…”
Section: Neural Network Inference Evaluation (mentioning)
confidence: 99%
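The accuracy trend described above — degradation growing as the quantization size (bit width) shrinks — can be illustrated directly by measuring reconstruction error at different bit widths. A small sketch, assuming NumPy and synthetic Gaussian weights (not data from the cited evaluation):

```python
import numpy as np

def quantize_uniform(x, bits):
    """Uniform symmetric quantization to `bits` bits, then dequantize."""
    levels = 2 ** (bits - 1) - 1          # e.g. 127 for INT8, 7 for 4-bit
    scale = float(np.max(np.abs(x))) / levels
    return np.clip(np.round(x / scale), -levels, levels) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=10_000).astype(np.float32)

# Mean squared reconstruction error at each bit width; error shrinks
# as more quantization levels become available.
errors = {b: float(np.mean((w - quantize_uniform(w, b)) ** 2))
          for b in (2, 4, 8)}
```

Whether a given network tolerates this error gracefully (as ResNet-101 and SqueezeNet reportedly do) depends on the architecture, which is the distinction the citation draws.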
“…To efficiently execute deep models on the proposed large‐scale visual computing platform, we introduce network quantization techniques to reduce the computation load [7].…”
Section: Large‐scale Visual Computing Platform (mentioning)
confidence: 99%
“…When the weights and activation function outputs are represented using just a single bit, the resulting network is called a binarized neural network (BNN) [26]. BNNs are a highly popular variant of a quantized DNN [10,40,56,57], as their computing time can be up to 58 times faster, and their memory footprint 32 times smaller, than that of traditional DNNs [45]. There are also network architectures in which some parts of the network are quantized, and others are not [45].…”
Section: Introduction (mentioning)
confidence: 99%
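The single-bit case described above is the extreme of quantization: each weight and activation becomes ±1, so a dot product reduces to XNOR followed by a population count, which is where the reported speedups come from. A minimal sketch of that equivalence in NumPy (the binarization convention, sign with 0 mapped to +1, is one common choice, not necessarily the one in [26]):

```python
import numpy as np

def binarize(x):
    """Binarize to {-1, +1} via the sign function, mapping 0 to +1."""
    return np.where(x >= 0, 1, -1).astype(np.int8)

a = binarize(np.array([0.3, -1.2, 0.7, -0.1]))
b = binarize(np.array([-0.5, -0.9, 0.2, 0.4]))

# Direct arithmetic dot product on the +/-1 vectors.
dot = int(a @ b)

# Equivalent bitwise form: with bits 1 for +1 and 0 for -1, the number of
# matching positions is the popcount of XNOR(a, b), and
#   dot = 2 * matches - n.
a_bits = (a > 0).astype(np.uint8)
b_bits = (b > 0).astype(np.uint8)
matches = int(np.sum(a_bits == b_bits))
assert dot == 2 * matches - len(a)
```

Packing the bit vectors into machine words and using hardware popcount instructions yields the large constant-factor speedups the citation reports for BNNs.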