Low-Power Computer Vision 2022
DOI: 10.1201/9781003162810-13
A Survey of Quantization Methods for Efficient Neural Network Inference

Abstract: As soon as abstract mathematical computations were adapted to computation on digital computers, the problem of efficient representation, manipulation, and communication of the numerical values in those computations arose. Strongly related to the problem of numerical representation is the problem of quantization: in what manner should a set of continuous real-valued numbers be distributed over a fixed discrete set of numbers to minimize the number of bits required and also to maximize the accuracy of the attendant computations? […]
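To make the bit/accuracy trade-off described in the abstract concrete, the sketch below shows uniform affine quantization, one of the basic schemes this survey covers. The function names, the 8-bit default, and the min/max calibration range are illustrative assumptions for this sketch, not details taken from the survey.

import numpy as np

def quantize(x, num_bits=8):
    # Map real values onto a grid of 2**num_bits integers (uniform affine quantization).
    qmin, qmax = 0, 2**num_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)        # real-valued step between grid points
    zero_point = int(round(qmin - x.min() / scale))    # integer code that represents real 0.0
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    # Recover an approximation of the original real values.
    return scale * (q.astype(np.float32) - zero_point)

x = np.random.randn(1000).astype(np.float32)
q, scale, zero_point = quantize(x)
print(np.abs(x - dequantize(q, scale, zero_point)).max())   # bounded by roughly scale / 2

Fewer bits shrink the grid and raise the rounding error, which is exactly the trade-off the abstract poses.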

Cited by 387 publications (169 citation statements); references 226 publications.
“…On the other hand, if f[0] is negative, the sign-extension bits are all 1s in the binary expression and represent -1 in two's-complement representation. In that case, we decrement f[1] by 1 to form the second S-bit slice and perform the packing with concatenation and a 1-bit incrementer instead of a larger-bitwidth adder. The packing process works recursively over all the slices, while slicing of the output works in the reverse manner.…”
Section: A. From Multiplication to Convolution
Mentioning confidence: 99%
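A short numerical sketch may help check the packing rule described in this quotation. The code below is only one reading of that description, with illustrative names (S, f0, f1): the lower S-bit slice is two's-complement signed, and when it is negative its implicit all-ones sign extension contributes -2**S, which plain concatenation drops; decrementing the upper slice by 1 restores the intended value.

S = 4                                    # slice width in bits (illustrative choice)

def pack(f1, f0):
    # Pack a signed S-bit lower slice f0 under an upper slice f1 so that the
    # result equals f1 * 2**S + f0, using concatenation plus a decrement of
    # the upper slice when f0 is negative (one reading of the quoted scheme).
    low_bits = f0 & (2**S - 1)           # raw two's-complement bit pattern of f0
    if f0 < 0:
        # The implicit sign extension of f0 is all ones, i.e. it contributes
        # -2**S; dropping it during concatenation is compensated by f1 - 1.
        f1 = f1 - 1
    return (f1 << S) | low_bits          # concatenate the two bit patterns

# Exhaustive check; the upper range leaves headroom for the decrement, as an
# S-bit hardware register would also need.
for f1 in range(-2**(S - 1) + 1, 2**(S - 1)):
    for f0 in range(-2**(S - 1), 2**(S - 1)):
        assert pack(f1, f0) == f1 * 2**S + f0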
“…Quantization is a frequently used technique in hardware implementations of Deep Neural Network (DNN) models to reduce both memory consumption and execution time [1]–[6]. It is typically done by approximating high-precision floating-point numbers with low-bitwidth integers or fixed-point numbers.…”
Section: Introduction
Mentioning confidence: 99%
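As an illustration of the fixed-point case mentioned in the quotation above, the following sketch rounds floats to an 8-bit fixed-point representation with a power-of-two scale. The particular format (4 fractional bits) and the function names are assumptions made for the example.

import numpy as np

FRAC_BITS = 4                            # fractional bits, i.e. a Q3.4 format (assumption)
SCALE = 2**FRAC_BITS

def to_fixed(x, total_bits=8):
    # Round floats to the nearest representable fixed-point value, stored as int8.
    lo, hi = -2**(total_bits - 1), 2**(total_bits - 1) - 1
    return np.clip(np.round(x * SCALE), lo, hi).astype(np.int8)

def to_float(q):
    return q.astype(np.float32) / SCALE

x = np.array([0.3, -1.75, 3.2], dtype=np.float32)
q = to_fixed(x)
print(q, to_float(q))                    # rounding error is at most 1 / (2 * SCALE)

Because the scale is a power of two, dequantization is just a bit shift, which is why fixed-point formats are attractive in hardware.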
“…Figure 1: Growth of DNN model size and GPU memory capacity over the past decade [14,57]. The memory shown here accounts only for model state, which is a small fraction of the total training memory footprint [7,14,31,63,68,74].…”
Section: Introduction
Mentioning confidence: 99%
“…It is therefore necessary to compress these neural networks, and quantization is one of the most effective ways to do so [8]. Floating-point values are quantized to low-bit-width integers, which reduces both memory consumption and computation cost.…”
Section: Introduction
Mentioning confidence: 99%