Preprint, 2021
DOI: 10.48550/arxiv.2102.06366

Confounding Tradeoffs for Neural Network Quantization

Abstract: Many neural network quantization techniques have been developed to decrease the computational and memory footprint of deep learning. However, these methods are evaluated subject to confounding tradeoffs that may affect inference acceleration or resource complexity in exchange for higher accuracy. In this work, we articulate a variety of tradeoffs whose impact is often overlooked and empirically analyze their impact on uniform and mixed-precision post-training quantization, finding that these confounding tradeof…

Cited by 5 publications (9 citation statements)
References 24 publications (47 reference statements)

Citation statements:
“…While most research on data-free quantization [2,4,7,15,16,43,30] focuses on weight quantization, we provide empirical evidence that input quantization is responsible for a significant part of the accuracy loss, most notably on low bit representation, as illustrated in Fig. 1. Furthermore, we show that per-channel input range estimation allows tighter modelling of the full-precision distribution as compared to a per-example, dynamic approach. As a result, the proposed SPIQ (standing for Static Per-channel Input Quantization) method outperforms both static and dynamic approaches as well as existing state-of-the-art methods.…”
Section: Introduction
confidence: 87%
“…Rounding and truncating are the most common examples. As discussed in [17], quantization methods are classified as either data-driven [20,23,26,38,10,19] or data-free [2,4,7,15,16,43,30,8]. Data-driven methods have been shown to work remarkably well despite a coarse approximation of the continuous optimisation problem.…”
Section: Quantization
confidence: 99%
“…2) Post-Training Quantization: An alternative to the expensive QAT method is Post-Training Quantization (PTQ) which performs the quantization and the adjustments of the weights, without any fine-tuning [11,24,40,59,60,67,68,87,106,138,144,168,176,269]. As such, the overhead of PTQ is very low and often negligible.…”
Section: G. Fine-tuning Methods
confidence: 99%
“…Post-training quantization (PTQ) enables the user to convert an already trained float model and quantize it without retraining [10,23,7,11]. However, it can also result in drastic reduction in model quality.…”
Section: Related Work
confidence: 99%