2021
DOI: 10.1016/j.neucom.2021.07.045

Pruning and quantization for deep neural network acceleration: A survey

Cited by 438 publications (197 citation statements)
References 75 publications
“…If performed correctly, a model rarely loses accuracy, and even in these cases, only a negligible percentage is lost. Most of the models keep their initial accuracies, and some of them show improvements [41,42].…”
Section: TensorFlow Lite Model Evaluation
confidence: 99%
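The excerpt above concerns checking that quantized TensorFlow Lite models keep their accuracy. A minimal sketch of the usual workflow, assuming an already-trained SavedModel at a hypothetical path `saved_model_dir` (not taken from the cited works), is:

```python
import tensorflow as tf

# Hypothetical path to a trained SavedModel.
saved_model_dir = "saved_model_dir"

# Convert with the default post-training quantization optimizations.
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

# Load the quantized model so its accuracy can be evaluated against the original.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
```

Evaluation then consists of feeding the same test set through `interpreter.set_tensor(...)` / `interpreter.invoke()` and comparing the resulting accuracy with that of the float model.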
“…Quantization and compilation are performed by the Vitis AI tools, provided as command-line programs or Python modules, as before, specific to a given framework. The quantization process changes the internal representation of the model's parameters during inference [29]. Usually, standard computational platforms perform AI-related calculations on floating-point data types (with precision varying depending on whether the model is run on CPUs or GPUs).…”
Section: Preparing Deep Models For Deployment
confidence: 99%
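The quantization step described in this excerpt replaces floating-point parameters with low-precision integers. A minimal, framework-agnostic sketch of affine float32-to-int8 quantization (an illustration of the general idea, not the Vitis AI implementation):

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Affine (asymmetric) quantization of float32 weights to int8."""
    w_min, w_max = float(weights.min()), float(weights.max())
    scale = max((w_max - w_min) / 255.0, 1e-8)  # width of one step of the int8 grid
    zero_point = round(-128 - w_min / scale)    # integer to which the real value 0.0 maps
    q = np.clip(np.round(weights / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Recover approximate float32 values from the int8 representation."""
    return (q.astype(np.float32) - zero_point) * scale
```

Storing `q` instead of `weights` cuts parameter memory roughly four-fold (8 bits versus 32 bits per value), at the cost of the rounding error introduced by the integer grid.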
“…The model preparation process may include some additional optimization steps, such as pruning the original deep architecture [30]. It may ultimately help us reduce the network's size by eliminating, e.g., redundant parameters and/or connections in the model, which have minimal impact on the overall accuracy of the algorithm [29].…”
Section: Preparing Deep Models For Deployment
confidence: 99%
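The pruning step mentioned in this excerpt removes parameters that contribute little to the output. A minimal sketch, assuming simple unstructured magnitude pruning (only one of the criteria discussed in the survey):

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float = 0.5) -> np.ndarray:
    """Zero out the smallest-magnitude weights, keeping a (1 - sparsity) fraction."""
    threshold = np.quantile(np.abs(weights), sparsity)
    mask = np.abs(weights) >= threshold
    return weights * mask

# Example: remove 90% of a layer's weights by magnitude.
layer_weights = np.random.randn(256, 256).astype(np.float32)
pruned = magnitude_prune(layer_weights, sparsity=0.9)
```

In practice the pruned model is usually fine-tuned afterwards so that the remaining weights can compensate for the removed connections.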
“…Real-time inference on resource-constrained and efficiency-demanding platforms has long been desired and extensively studied in the last decades, resulting in significant improvement in the trade-off between efficiency and accuracy (Han et al., 2015; Mei et al., 2019; Tanaka et al., 2020; Ma et al., 2020; Mishra et al., 2020; Liang et al., 2021; Liu et al., 2021). As a model compression technique, quantization is promising compared to other methods, such as network pruning (Tanaka et al., 2020; Ma et al., 2020) and slimming (Liu et al., 2017; 2018), as it achieves a large compression ratio (Krishnamoorthi, 2018; Nagel et al., 2021) and is computationally beneficial for integer-only hardware.…”
Section: Introduction
confidence: 99%