2021
DOI: 10.1109/tnnls.2020.3008996
Exploiting Retraining-Based Mixed-Precision Quantization for Low-Cost DNN Accelerator Design

Cited by 18 publications (11 citation statements) · References 12 publications
“…To avoid performance degradation of the quantized model, QAT was first proposed to retrain the quantized model [15]-[18]. Using the full training dataset, QAT performs floating-point forward and backward propagation on the DNN model and quantizes it to low bit-width after each training epoch.…”
Section: A. Quantization Aware Training (QAT) | mentioning
confidence: 99%
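The excerpt describes QAT as full-precision retraining followed by per-epoch quantization. Below is a minimal Python/PyTorch sketch of that idea, assuming a uniform symmetric quantizer; the function names and the scale choice are illustrative, not taken from the cited paper.

```python
import torch

def quantize_tensor(x, num_bits=8):
    # Uniform symmetric quantization to a low-bit grid (one common choice, not the paper's exact scheme).
    qmax = 2 ** (num_bits - 1) - 1
    scale = x.abs().max() / qmax + 1e-12
    return torch.clamp(torch.round(x / scale), -qmax, qmax) * scale

def qat_retrain(model, loader, optimizer, loss_fn, epochs, num_bits=8):
    for _ in range(epochs):
        # Floating-point forward and backward passes over the full training set...
        for inputs, targets in loader:
            optimizer.zero_grad()
            loss_fn(model(inputs), targets).backward()
            optimizer.step()
        # ...then quantize the weights to low bit-width after each epoch, as the excerpt describes.
        with torch.no_grad():
            for p in model.parameters():
                p.copy_(quantize_tensor(p, num_bits))
```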
“…In particular, PACT [15] optimizes the clipping ranges of activations during model retraining. LSQ [17] learns the quantization step size as a model parameter, and MPQ [18] exploits retraining-based mixed-precision quantization. However, the high computational complexity of QAT restricts its implementation in practice.…”
Section: A. Quantization Aware Training (QAT) | mentioning
confidence: 99%
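To make the "learned step size" idea concrete, here is a rough sketch of an LSQ-style quantizer in which the step size is a trainable parameter and rounding is handled with a straight-through estimator. It omits LSQ's gradient-scaling term, and the class and parameter names are assumptions for illustration, not the paper's API.

```python
import torch
import torch.nn as nn

class LSQStyleQuantizer(nn.Module):
    """Learned-step-size quantizer sketch: the step size is optimized jointly with the model."""
    def __init__(self, num_bits=4, init_step=0.1):
        super().__init__()
        self.qmin = -(2 ** (num_bits - 1))
        self.qmax = 2 ** (num_bits - 1) - 1
        self.step = nn.Parameter(torch.tensor(float(init_step)))  # trainable step size

    def forward(self, x):
        q = torch.clamp(x / self.step, self.qmin, self.qmax)
        # Straight-through estimator: forward uses round(q), backward treats rounding as identity.
        q_rounded = q + (torch.round(q) - q).detach()
        return q_rounded * self.step
```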
“…Quantization [8]-[26], as the name implies, represents the weights and activations used in the network's forward-propagation computation, and the 32-bit or 64-bit floating-point gradient values used in back-propagation, with low-bit floating-point or fixed-point numbers, and can even compute on them directly. Figure 3 shows the basic idea of converting floating-point numbers into signed 8-bit fixed-point numbers.…”
Section: Model Quantization | mentioning
confidence: 99%
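A minimal NumPy sketch of the float-to-signed-8-bit fixed-point conversion that the excerpt's Figure 3 illustrates: values are scaled by a power of two, rounded, and clipped to the int8 range. The rule for picking the fractional bit-width is one common choice, assumed here for illustration rather than taken from the cited figure.

```python
import numpy as np

def float_to_int8_fixed_point(x, frac_bits=None):
    # Convert float values to signed 8-bit fixed point (int8 codes plus an implied scale 2**-frac_bits).
    x = np.asarray(x, dtype=np.float32)
    if frac_bits is None:
        # Pick the largest fractional bit-width that still covers the data range.
        max_abs = max(float(np.abs(x).max()), 1e-12)
        frac_bits = int(np.floor(np.log2(127.0 / max_abs)))
    codes = np.clip(np.round(x * (2.0 ** frac_bits)), -128, 127).astype(np.int8)
    return codes, frac_bits

# Usage: quantize a few floats and recover their approximate values.
codes, fb = float_to_int8_fixed_point([0.75, -1.5, 0.031])
recovered = codes.astype(np.float32) / (2.0 ** fb)
```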
“…Model quantization [8]-[26], as a means of compressing models, can be applied at model deployment so that both the model size and the inference latency are reduced. At present, SR models are becoming larger and larger.…”
Section: Introduction | mentioning
confidence: 99%