2021 | Preprint
DOI: 10.48550/arxiv.2112.06126

Neural Network Quantization for Efficient Inference: A Survey

Abstract: As neural networks have become more powerful, there has been a rising desire to deploy them in the real world; however, the power and accuracy of neural networks are largely due to their depth and complexity, making them difficult to deploy, especially in resource-constrained devices. Neural network quantization has recently arisen to meet this demand of reducing the size and complexity of neural networks by reducing the precision of a network. With smaller and simpler networks, it becomes possible to run neura…

Citations: Cited by 6 publications (8 citation statements)
References: 25 publications
“…Quantization has demonstrated excellent and consistent results when used during training and inference with different NN models [39], [75]-[77]. It is particularly effective during inference because it saves computing resources without significantly decreasing accuracy.…”
Section: Quantization (mentioning; confidence: 98%)
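
To make the resource saving concrete, below is a minimal sketch of symmetric per-tensor int8 quantization of a float32 weight tensor; it is illustrative only (the shape and values are assumed, not taken from the cited works) and shows the 4x memory reduction and the small reconstruction error that reduced precision introduces.

import numpy as np

# Minimal sketch: symmetric per-tensor int8 quantization of float32 weights.
# The tensor shape and values are illustrative, not taken from the cited papers.
w = np.random.randn(256, 256).astype(np.float32)

scale = np.abs(w).max() / 127.0                      # map the largest magnitude to 127
w_int8 = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
w_dequant = w_int8.astype(np.float32) * scale

print("memory reduction:", w.nbytes / w_int8.nbytes)            # 4.0x (float32 -> int8)
print("mean abs error:", float(np.abs(w - w_dequant).mean()))    # small relative to |w|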
“…Neural network quantization facilitates the efficient deployment of deep neural networks (DNNs) on resource-constrained platforms, such as drones and Internet-of-Things (IoT) devices [1], [2]. During inference, MAC operations consisting of multiplications and accumulations dominate the arithmetic cost.…”
Section: Introduction (mentioning; confidence: 99%)
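
Since the quoted statement centers on MAC cost, here is a hedged sketch (shapes and scale values are assumptions for illustration) of how a quantized multiply-accumulate typically runs: int8 operands, a wider int32 accumulator, and a single rescale at the end.

import numpy as np

# Sketch of a quantized MAC path: int8 operands, int32 accumulation, one rescale.
# Vector length and per-tensor scales are illustrative assumptions.
x_int8 = np.random.randint(-127, 128, size=64, dtype=np.int8)
w_int8 = np.random.randint(-127, 128, size=64, dtype=np.int8)
x_scale, w_scale = 0.05, 0.02       # per-tensor scales obtained during quantization

# Multiply-accumulate in int32 so the summed int8 products cannot overflow.
acc_int32 = np.sum(x_int8.astype(np.int32) * w_int8.astype(np.int32))

# A single floating-point rescale recovers an approximation of the real-valued dot product.
y = float(acc_int32) * (x_scale * w_scale)
print(y)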
“…As their names suggest, PTQ quantizes model parameters after NN training, while QAT quantizes them during NN training. PTQ is suitable for applications where speed and simplicity of quantization are preferred over extreme bit-precision reduction, while QAT should be chosen for low-precision quantization with negligible accuracy degradation [5].…”
Section: Introduction (mentioning; confidence: 99%)
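
A minimal PTQ sketch, assuming a generic asymmetric uniform quantizer and placeholder "pre-trained" weights and calibration data (none of the names below come from the cited papers): ranges are estimated from the tensors themselves and no retraining is performed.

import numpy as np

def quantize_tensor(t, num_bits=8):
    """Asymmetric uniform quantization: return quantized tensor, scale, zero-point."""
    qmin, qmax = 0, 2 ** num_bits - 1
    t_min, t_max = float(t.min()), float(t.max())
    scale = (t_max - t_min) / (qmax - qmin) or 1.0            # avoid a zero scale
    zero_point = int(round(qmin - t_min / scale))
    q = np.clip(np.round(t / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

# PTQ: take an already-trained weight matrix and a small calibration batch
# (both random placeholders here), derive ranges, and quantize with no retraining.
w_fp32 = np.random.randn(128, 64).astype(np.float32)     # stand-in for pre-trained weights
calib_x = np.random.randn(32, 128).astype(np.float32)    # stand-in for calibration data

w_q, w_scale, w_zp = quantize_tensor(w_fp32)
x_q, x_scale, x_zp = quantize_tensor(calib_x)             # activation range from calibration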
“…QAT usually results in lower accuracy loss than PTQ. However, training a quantized model from scratch increases the quantization time [9]. In PTQ, one can choose any pre-trained model and perform quantization much faster; however, getting good performance in low-bit precision is difficult [9].…”
Section: Introduction (mentioning; confidence: 99%)
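
To illustrate why QAT costs extra training time but tolerates lower precision, here is a sketch of one common QAT mechanism, fake quantization with a straight-through estimator; the loss, data, and learning rate are illustrative placeholders, not details from the cited survey.

import numpy as np

def fake_quant(w, num_bits=8):
    """Quantize-then-dequantize, so the forward pass sees the quantization error."""
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = float(np.abs(w).max())
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return np.clip(np.round(w / scale), -qmax, qmax) * scale

# QAT training-step sketch: the forward pass uses fake-quantized weights, while the
# gradient update is applied to the full-precision copy (straight-through estimator).
w = np.random.randn(10).astype(np.float32)    # full-precision master weights
x = np.random.randn(10).astype(np.float32)    # placeholder input
target, lr = 1.0, 0.01

for _ in range(100):
    y = float(np.dot(fake_quant(w), x))       # forward pass with quantized weights
    grad_y = 2.0 * (y - target)               # d/dy of the squared error (y - target)**2
    grad_w = grad_y * x                       # STE: treat fake_quant as identity in backward
    w -= lr * grad_w                          # update the full-precision weights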