2022
DOI: 10.48550/arxiv.2201.11113
Preprint

Post-training Quantization for Neural Networks with Provable Guarantees

Abstract: While neural networks have been remarkably successful in a wide array of applications, implementing them in resource-constrained hardware remains an area of intense research. By replacing the weights of a neural network with quantized (e.g., 4-bit, or binary) counterparts, massive savings in computation cost, memory, and power consumption are attained. We modify a post-training neural-network quantization method, GPFQ, that is based on a greedy path-following mechanism, and rigorously analyze its error. We pro…
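
For intuition, the greedy path-following mechanism behind GPFQ quantizes a neuron's weights one at a time, at each step picking the alphabet element that best corrects the error accumulated so far in the pre-activations. Below is a minimal NumPy sketch of that basic mechanism, not the paper's modified variant; the function name, the ternary alphabet, and the scaling heuristic are illustrative assumptions.

```python
import numpy as np

def gpfq_neuron(w, X, alphabet):
    """Greedily quantize one neuron's weights so that the quantized
    pre-activations X @ q track the full-precision ones X @ w."""
    m, N = X.shape
    q = np.zeros(N)
    u = np.zeros(m)                       # running error: X[:, :t] @ (w - q)[:t]
    for t in range(N):
        x_t = X[:, t]
        v = u + w[t] * x_t                # error if weight t were kept exact
        # Real minimizer of ||v - p * x_t||_2, then rounded to the alphabet.
        p_star = (x_t @ v) / max(x_t @ x_t, 1e-12)
        q[t] = alphabet[np.argmin(np.abs(alphabet - p_star))]
        u = v - q[t] * x_t                # update the running error
    return q

# Illustrative use: a ternary alphabet scaled to the weights' magnitude.
rng = np.random.default_rng(0)
X = rng.standard_normal((256, 64))        # m = 256 samples, N = 64 inputs
w = 0.1 * rng.standard_normal(64)
alphabet = np.abs(w).max() * np.array([-1.0, 0.0, 1.0])
q = gpfq_neuron(w, X, alphabet)
print(np.linalg.norm(X @ (w - q)) / np.linalg.norm(X @ w))  # relative error
```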

Cited by 2 publications (2 citation statements)
References 24 publications

“…Notably, the resulting compression of sparse models is significantly higher with relatively small degradation in accuracy.

  Method                            Quantized acc.   Full-precision acc.   Drop
  (Choukroun et al, 2019)           73.39            76.01                 2.62
  AdaRound (Nagel et al, 2020)      75.23            76.07                 0.84
  S-AdaQuant (Hubara et al, 2021)   75.10            77.20                 2.10
  BRECQ (Li et al, 2021)            76.29            77.00                 0.71
  GPFQ (Zhang et al, 2022)          74…”

Section: Results
confidence: 99%

“…The second approach is to perform over-sampling using the artificial images generated by the proposed GAN. Finally, each trial is subjected to post-training dynamic range quantization, which converts the weights to 8-bit precision to compress the model size and decrease the inference time to fit a real-time system on edge devices [50]. Table 4 shows the results of the proposed classifiers' trials with respect to the size of the model file in megabytes (MB), inference time in milliseconds (ms), AUC, the precision of normal (Norm) and malignant (Mal), the recall of Norm and Mal, the F1-score of Norm and Mal, and accuracy.…”

Section: Fig 6: Comparison Between U-net Training Loss and Validation...
confidence: 99%
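
For reference, post-training dynamic range quantization as described in this snippet matches the standard TensorFlow Lite conversion path, in which weights are stored as 8-bit integers while activations remain in floating point. A minimal sketch, assuming a trained Keras model; the small placeholder model below is illustrative, not the classifier from [50].

```python
import tensorflow as tf

# Placeholder for a trained classifier; substitute your own tf.keras.Model.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(224, 224, 3)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(2, activation="softmax"),
])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
# With no representative dataset supplied, Optimize.DEFAULT applies dynamic
# range quantization: weights become 8-bit integers, activations stay float.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("model_dynamic_range.tflite", "wb") as f:
    f.write(tflite_model)
```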