2021
DOI: 10.48550/arxiv.2103.13630
Preprint
A Survey of Quantization Methods for Efficient Neural Network Inference

Cited by 126 publications (202 citation statements)
References 0 publications
“…It is so complex that there is an IEEE standard for how real numbers should be represented, as well as how arithmetic on this representation should work: IEEE Standard 754, also known as IEEE floating point. Since there are infinitely many real numbers and only finitely many bits that can be allocated for representing each number on machines, we can view representing real numbers on computers as a quantization problem itself, because we are reducing the precision of the reals [14].…”
Section: Representing Numbers On Machines
confidence: 99%
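The point in the excerpt above — that IEEE 754 storage already quantizes the reals — can be seen directly in Python by inspecting the bit pattern of a stored value (a minimal illustration; the variable names are ours):

```python
import struct

# A real number like 1/10 has no exact binary representation, so
# storing it in IEEE 754 rounds ("quantizes") it to the nearest
# representable double.
x = 0.1

# The 64-bit pattern actually stored for x:
bits = struct.unpack("<Q", struct.pack("<d", x))[0]
print(f"{bits:016X}")       # 3FB999999999999A

# The stored value differs slightly from the real number 1/10:
print(f"{x:.20f}")          # 0.10000000000000000555...

# Reducing precision further (float64 -> float32) coarsens the
# quantization grid, increasing the round-off error.
x32 = struct.unpack("<f", struct.pack("<f", x))[0]
print(abs(x32 - x))         # nonzero error introduced by float32
```

The same picture — a finite grid of representable values, with rounding to the nearest grid point — is exactly what low-bit neural network quantization applies to weights and activations.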
“…Enabling neural network inference in resource-constrained settings is important so that NNs can solve problems like speech recognition, autonomous driving, and image classification in IoT devices, vehicles, and more. To realize this, neural network inference must achieve 1) real-time latency, 2) low energy consumption, and 3) high accuracy [14].…”
Section: Introduction
confidence: 99%
“…This hinders the deployment of DNNs to resource-limited applications. Therefore, model compression without significant performance degradation is an important active area of deep learning research [11,25,6,10]. One prominent approach to compression is quantization.…”
Section: Introduction
confidence: 99%
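As a concrete illustration of the quantization approach mentioned above, here is a minimal sketch of symmetric uniform quantization of a weight tensor to signed 8-bit integers — a common baseline technique, not the specific method of any cited work (function names are ours):

```python
import numpy as np

def quantize_uniform(w, num_bits=8):
    """Symmetric uniform quantization: map floats onto a signed
    integer grid, storing one float scale per tensor."""
    qmax = 2 ** (num_bits - 1) - 1            # e.g. 127 for int8
    scale = np.abs(w).max() / qmax            # largest |w| maps to qmax
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the integer grid."""
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_uniform(w)
w_hat = dequantize(q, s)
# Round-to-nearest bounds the per-element error by half a grid step.
print(np.abs(w - w_hat).max())
```

Storing `q` (int8) plus one scale in place of `w` (float32) gives roughly a 4x size reduction, which is the compression the excerpt refers to.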
“…In fact, once we get the lower bound of $\mathbb{E}\,\langle X_t, u\rangle^2 / \|X_t\|_2^2$ as in (10), the quantization error for unbounded data (14) can be derived similarly to the proof of Theorem 2.1, albeit using different techniques. It follows from the Cauchy-Schwarz inequality that…”
confidence: 99%
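For reference, the Cauchy-Schwarz step invoked in the truncated excerpt above bounds the inner product term as follows (the symbols mirror the excerpt; the exact forms of (10) and (14) are in the cited work):

```latex
\langle X_t, u \rangle^2 \;\le\; \|X_t\|_2^2 \,\|u\|_2^2
```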