2021
DOI: 10.48550/arxiv.2111.14826
Preprint

Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via Generalized Straight-Through Estimation

Abstract: The nonuniform quantization strategy for compressing neural networks usually achieves better performance than its counterpart, i.e., uniform strategy, due to its superior representational capacity. However, many nonuniform quantization methods overlook the complicated projection process in implementing the nonuniformly quantized weights/activations, which incurs non-negligible time and space overhead in hardware deployment. In this study, we propose Nonuniform-to-Uniform Quantization (N2UQ), a method that can …
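For readers unfamiliar with the uniform quantization and straight-through estimation that the abstract contrasts against, the sketch below shows a plain uniform quantizer with the standard (non-generalized) straight-through estimator in PyTorch. The function name `uniform_quantize_ste` and the 2-bit setting are illustrative assumptions, not the paper's N2UQ implementation, which additionally learns flexible nonuniform input thresholds while keeping the output levels uniform.

```python
import torch


def uniform_quantize_ste(x, bits=2):
    """Uniformly quantize a tensor in [0, 1] to 2**bits evenly spaced levels.

    Rounding is non-differentiable, so the backward pass uses a plain
    straight-through estimator: gradients pass through as if the rounding
    were the identity function.
    """
    levels = 2 ** bits - 1
    x = torch.clamp(x, 0.0, 1.0)
    q = torch.round(x * levels) / levels
    # Straight-through trick: forward returns q, backward sees only x.
    return x + (q - x).detach()


# Example: quantize a random activation tensor to 2 bits (4 uniform levels).
a = torch.rand(4, requires_grad=True)
q = uniform_quantize_ste(a, bits=2)
q.sum().backward()
print(q, a.grad)  # gradient is all ones, passed straight through the rounding
```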

Cited by 1 publication (1 citation statement). References 42 publications.
“…These methods focus on quantizing most, if not all, network layers to the same uniform bit-width. While this has been shown to be effective for recovering full-precision accuracy at higher bit-widths, using extremely low precision still leads to significant accuracy degradation (Courbariaux et al., 2015; Esser et al., 2015; Rastegari et al., 2016; Zhou et al., 2016; McKinstry et al., 2019; Esser et al., 2020; Liu et al., 2021c). To further push the envelope of maximizing throughput and minimizing memory footprint while maintaining task performance, mixed-precision quantization methods have emerged with the goal of optimizing the bit-width of each layer independently to maximize overall network performance (Dong et al., 2019; Yao et al., 2021; Chen et al., 2021).…”
Section: Introduction
confidence: 99%
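To make the contrast between single uniform bit-width and mixed-precision quantization in the excerpt concrete, the following is a minimal sketch that assigns a separate bit-width to each layer and quantizes its weights uniformly. The per-layer bit-width map, layer names, and helper `quantize_weights` are hypothetical illustrations, not the procedure of any of the cited methods.

```python
import torch
import torch.nn as nn


def quantize_weights(w, bits):
    """Symmetric uniform quantization of a weight tensor to the given bit-width."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    return torch.round(w / scale).clamp(-qmax, qmax) * scale


# Hypothetical per-layer bit-width assignment, e.g. as produced by a
# mixed-precision search; layers omitted from the map stay full precision.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 10))
layer_bits = {"0": 8, "2": 4}  # first Linear at 8 bits, last Linear at 4 bits

with torch.no_grad():
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear) and name in layer_bits:
            module.weight.copy_(quantize_weights(module.weight, layer_bits[name]))
```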