2021
DOI: 10.48550/arxiv.2102.04503
Preprint

VS-Quant: Per-vector Scaled Quantization for Accurate Low-Precision Neural Network Inference

Steve Dai, Rangharajan Venkatesan, Haoxing Ren, et al.

Abstract: Quantization enables efficient acceleration of deep neural networks by reducing model memory footprint and exploiting low-cost integer math hardware units. Quantization maps floating-point weights and activations in a trained model to low-bitwidth integer values using scale factors. Excessive quantization, reducing precision too aggressively, results in accuracy degradation. When scale factors are shared at a coarse granularity across many dimensions of each tensor, effective precision of individual elements w…
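The abstract describes mapping floating-point weights and activations to low-bitwidth integers with scale factors, and the paper's central idea is to share a scale factor per small vector of elements rather than per tensor. The NumPy sketch below is an illustration of that idea only, not the authors' implementation; the function name vs_quant, the 16-element vector size, the 4-bit setting, and the max-magnitude calibration are assumptions chosen for the example (the paper also uses a second-level scale, which is omitted here).

```python
import numpy as np

def vs_quant(weights, bits=4, vector_size=16):
    """Quantize a 1-D weight array with one scale factor per small vector.

    Illustrative sketch only: real per-vector scaled quantization also
    handles activations and uses a second-level scale factor.
    """
    qmax = 2 ** (bits - 1) - 1                       # e.g. 7 for signed 4-bit
    # Pad so the tensor splits evenly into vectors of `vector_size` elements.
    pad = (-len(weights)) % vector_size
    w = np.pad(weights, (0, pad))
    vectors = w.reshape(-1, vector_size)

    # One scale factor per vector, derived from that vector's max magnitude.
    scales = np.abs(vectors).max(axis=1, keepdims=True) / qmax
    scales[scales == 0] = 1.0                        # avoid divide-by-zero

    q = np.clip(np.round(vectors / scales), -qmax - 1, qmax).astype(np.int8)
    dequant = (q * scales).flatten()[:len(weights)]  # reconstruction for error check
    return q, scales, dequant

# Example: per-vector scaling keeps small-magnitude vectors from being
# crushed by a single outlier located elsewhere in the tensor.
w = np.random.randn(64).astype(np.float32)
w[3] = 25.0                                          # outlier in one vector only
q, s, w_hat = vs_quant(w, bits=4, vector_size=16)
print("mean abs error:", np.abs(w - w_hat).mean())
```

With the outlier confined to one 16-element vector, only that vector's scale grows; the remaining vectors keep fine-grained resolution, which is the intuition behind per-vector scale factors as opposed to a single per-tensor scale.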

Cited by 3 publications (2 citation statements) · References 21 publications (28 reference statements)

Citation statements:
“…Weight quantization, on the other hand, reduces the numerical precision of the model parameters, leading to significant reductions in both model size and computational requirements. Various weight quantization techniques have been proposed, including binary [38], ternary [39], and vector quantization [40]. Despite the advantages of weight quantization, it may introduce quantization errors that can affect the model's performance, especially when extreme quantization levels are applied.…”
Section: Model Compression Methods
confidence: 99%
“…Post-training quantization (PTQ) enables the user to convert an already trained float model and quantize it without retraining [10,23,7,11]. However, it can also result in drastic reduction in model quality.…”
Section: Related Work
confidence: 99%
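The statement above notes that post-training quantization (PTQ) converts an already-trained float model without retraining, but can sharply reduce model quality. As a rough illustration of the simplest form of that workflow, the sketch below derives a single per-tensor scale from max-magnitude calibration; the names ptq_per_tensor and calib_acts are hypothetical, and this is not the procedure used by the cited PTQ works, which apply more refined calibration.

```python
import numpy as np

def ptq_per_tensor(weights, calib_acts, bits=8):
    """Naive post-training quantization: one scale per tensor, no retraining.

    Illustrative sketch only; coarse per-tensor scales like these are the
    kind of granularity that per-vector scaling aims to improve on.
    """
    qmax = 2 ** (bits - 1) - 1
    w_scale = np.abs(weights).max() / qmax       # scale for the whole weight tensor
    a_scale = np.abs(calib_acts).max() / qmax    # activation scale from a calibration batch
    q_w = np.clip(np.round(weights / w_scale), -qmax - 1, qmax).astype(np.int8)
    return q_w, w_scale, a_scale

w = np.random.randn(256, 256).astype(np.float32)
acts = np.random.randn(1024, 256).astype(np.float32)   # stand-in calibration data
q_w, w_scale, a_scale = ptq_per_tensor(w, acts)
print(q_w.dtype, w_scale, a_scale)
```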