2020
DOI: 10.1007/978-3-030-58536-5_5
Post-training Piecewise Linear Quantization for Deep Neural Networks

Cited by 90 publications (56 citation statements)
References 36 publications
“…Relying on the abundance of previous conclusions about quantization for traditional network solutions [22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32], further improvements in the field of NNs, especially NNs intended for edge devices, can be intuitively driven by the prudent application of post-training quantization. Post-training quantization is especially convenient as there is no need to retrain the NN, while the memory required to store the weights of the quantized neural network (QNN) model can be significantly reduced compared to the baseline NN model using the 32-bit floating-point (FP32) format [6, 14, 15, 19, 33].…”
Section: Introduction (mentioning)
confidence: 99%
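To make the memory argument in this excerpt concrete, the following is a minimal sketch of symmetric uniform post-training quantization in NumPy. It is an illustration under simple assumptions (per-tensor symmetric scaling, an int8 target, random weights as a stand-in for a trained tensor), not the specific scheme of any of the cited works.

```python
import numpy as np

def quantize_weights_int8(w):
    """Symmetric uniform post-training quantization of a weight tensor to int8.

    No retraining is involved: the scale is derived directly from the trained
    FP32 weights, and the quantized weights need 4x less storage than FP32.
    """
    scale = np.abs(w).max() / 127.0                       # map the largest magnitude to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an FP32 approximation of the original weights."""
    return q.astype(np.float32) * scale

if __name__ == "__main__":
    w = np.random.randn(1000).astype(np.float32)          # stand-in for trained weights
    q, scale = quantize_weights_int8(w)
    w_hat = dequantize(q, scale)
    print("storage: %d bytes (FP32) -> %d bytes (int8)" % (w.nbytes, q.nbytes))
    print("mean squared quantization error: %.6f" % np.mean((w - w_hat) ** 2))
```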
“…Namely, an important challenge in post-training quantization is that it can lead to significant performance degradation, especially in ultra-low precision settings. To cope with this, inspired by the conclusions from classical quantization, numerous papers have addressed the problem of minimizing the inevitable post-training quantization error (see, for instance, [6, 12, 15, 33]).…”
Section: Introduction (mentioning)
confidence: 99%
“…It is worth noting, however, that in recent years various quantization solutions have been proposed, specifically post-training static quantization [21]-[23], which does not involve retraining of the neural network. In the work by Fang et al. [21], the authors proposed to split the weight distribution into multiple regions. The weights in each region are then quantized with their respective scaling factors to convert them to their respective integer ranges.…”
Section: B. Motivation: Why Another Level of Quantization? (mentioning)
confidence: 99%
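As a rough illustration of the region-splitting idea described in this excerpt, the sketch below quantizes a weight tensor with two regions and per-region scale factors, so the dense center of the distribution gets a finer quantization step than a single global scale would allow. The two-region split, the fixed breakpoint, and the 4-bit target are illustrative assumptions; this is not the exact algorithm of [21].

```python
import numpy as np

def piecewise_quantize(w, break_point, num_bits=4):
    """Illustrative two-region piecewise quantization with per-region scales."""
    qmax = 2 ** (num_bits - 1) - 1
    center_mask = np.abs(w) <= break_point

    # Per-region scale factors, each derived from its own region's range.
    center_scale = break_point / qmax
    tail_scale = np.abs(w).max() / qmax

    q = np.zeros_like(w, dtype=np.int8)
    q[center_mask] = np.clip(np.round(w[center_mask] / center_scale), -qmax, qmax).astype(np.int8)
    q[~center_mask] = np.clip(np.round(w[~center_mask] / tail_scale), -qmax, qmax).astype(np.int8)

    # Dequantize each region with its own scale to approximate the originals.
    w_hat = np.where(center_mask, q * center_scale, q * tail_scale).astype(np.float32)
    return q, center_mask, w_hat

w = np.random.randn(10000).astype(np.float32)
_, _, w_hat = piecewise_quantize(w, break_point=1.0)
print("piecewise quantization MSE:", np.mean((w - w_hat) ** 2))
```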
“…However, under the proposed scheme, each region of the weight distribution represents a separate computation path, owing to the differences in their scaling factors. This requirement, as noted by the authors [21], implies that at least three accumulators are needed, depending on the number of regions the weight distribution is split into. The accumulators (at least three) tied to each multiply-and-accumulate (MAC) processing element (PE) may require more hardware resources to implement, not to mention that existing CNN accelerators usually deploy a large number of PEs in parallel to achieve high-performance computation.…”
Section: B. Motivation: Why Another Level of Quantization? (mentioning)
confidence: 99%
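The hardware cost described in this excerpt can be made visible with a schematic sketch of an integer dot product that keeps one accumulator per weight region and applies each region's scale only after accumulation: products from different regions cannot share an accumulator because they carry different scales. The function and its arguments are hypothetical and are not a model of any particular accelerator.

```python
import numpy as np

def dot_with_region_accumulators(x_q, x_scale, w_q, w_scale_per_region, region_id):
    """Integer dot product with one accumulator per weight region.

    Partial sums are kept apart per region and rescaled only once at the end;
    more regions therefore mean more accumulators per MAC processing element.
    """
    num_regions = len(w_scale_per_region)
    acc = np.zeros(num_regions, dtype=np.int64)      # one accumulator per region
    for xi, wi, r in zip(x_q, w_q, region_id):
        acc[r] += int(xi) * int(wi)                  # integer multiply-accumulate
    # Apply each region's combined (activation x weight) scale after accumulation.
    return sum(acc[r] * x_scale * w_scale_per_region[r] for r in range(num_regions))

x_q = np.array([10, -3, 7], dtype=np.int8)
w_q = np.array([5, 20, -8], dtype=np.int8)
print(dot_with_region_accumulators(x_q, 0.05, w_q, [0.01, 0.1], [0, 1, 0]))
```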
“…[22]. Post-training quantization methods [47][48][49][50] avoid these limitations by searching for the optimal tensor-cutting values to reduce quantization noise after the network model has been trained.…”
Section: Prior Work (mentioning)
confidence: 99%
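A minimal sketch of the clipping-value ("tensor-cutting") search mentioned in this excerpt: a grid search over candidate thresholds that picks the one minimizing quantization MSE on the trained tensor, trading a little saturation error on large values for a finer step on the bulk of the distribution. The grid search and MSE criterion are illustrative assumptions; the cited methods use their own search strategies and objectives.

```python
import numpy as np

def search_clipping_value(w, num_bits=8, num_candidates=100):
    """Pick the clipping threshold that minimizes quantization MSE."""
    qmax = 2 ** (num_bits - 1) - 1
    w_max = np.abs(w).max()
    best_clip, best_mse = None, np.inf
    for clip in np.linspace(w_max / num_candidates, w_max, num_candidates):
        scale = clip / qmax
        q = np.clip(np.round(w / scale), -qmax, qmax)    # quantize with this clipping value
        mse = np.mean((w - q * scale) ** 2)              # reconstruction error after dequantization
        if mse < best_mse:
            best_clip, best_mse = clip, mse
    return best_clip, best_mse

w = np.random.randn(100000).astype(np.float32)
clip, mse = search_clipping_value(w)
print("best clipping value: %.3f, quantization MSE: %.6f" % (clip, mse))
```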