2022
DOI: 10.48550/arxiv.2203.14642
Preprint

SPIQ: Data-Free Per-Channel Static Input Quantization

Abstract: Computationally expensive neural networks are ubiquitous in computer vision, and solutions for efficient inference have drawn growing attention in the machine learning community. Examples of such solutions include quantization, i.e. converting the processing values (weights and inputs) from floating point into integers, e.g. int8 or int4. Concurrently, the rise of privacy concerns has motivated the study of less invasive acceleration methods, such as data-free quantization of pre-trained model weights and activa…
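As a rough illustration of the per-channel quantization the abstract refers to, the Python/NumPy sketch below quantizes a tensor to int8 with one scale per channel. It is only a minimal sketch under assumed conventions (symmetric quantization, max-based scales computed from the tensor itself); SPIQ derives static input scales without data, which this toy example does not reproduce.

import numpy as np

def per_channel_int8_quantize(x, axis=0):
    # Symmetric int8 quantization with one scale per channel along `axis`.
    reduce_axes = tuple(i for i in range(x.ndim) if i != axis)
    max_abs = np.max(np.abs(x), axis=reduce_axes, keepdims=True)
    scale = np.maximum(max_abs, 1e-12) / 127.0   # guard against all-zero channels
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

# Toy input with very different per-channel ranges: per-channel scales keep the
# small-range channels from being crushed by a single per-tensor scale.
x = np.random.randn(4, 16).astype(np.float32) * np.array([[0.01], [0.1], [1.0], [10.0]], dtype=np.float32)
q, s = per_channel_int8_quantize(x, axis=0)
print("max abs reconstruction error:", float(np.abs(dequantize(q, s) - x).max()))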

Cited by 2 publications (4 citation statements)
References 30 publications (59 reference statements)
“…In Table 3, we report our extensive study of post-training W4/A4 quantization techniques on convolutional neural networks (ResNets, MobileNets and EfficientNets) as well as transformers from ViT b16 (86M parameters) to ViT h14 (600M parameters). In this extreme compression regime, we observe the limits of previous state-of-the-art methods SQuant [8] and SPIQ [45]. This is not the case for PowerQuant, which already achieves strong results on ResNets and transformers and, as such, offers a very strong baseline for the proposed NUPES method.…”
Section: Main Result: Comparison To Other GPTQ Methods
confidence: 74%
“…In Table 5, we report our results for several large language models on common sense reasoning tasks. We do not use group-wise quantization, as it leads to incompatibility with activation quantization due to the constraint of dimensionality, as explained in SPIQ [45]. In other words, while we can demonstrate that group-wise quantization can lead to a higher compression rate for the weights, such methods are bound to never quantize the activations.…”
Section: Quantization At All Sizes: Handling Outliers
confidence: 99%
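The dimensionality constraint mentioned in the quote above can be illustrated with a small integer matmul. The sketch below is my own illustration, not code from SPIQ or NUPES: with one scale per output channel of the weights and one static scale for the input, the accumulation stays in integers and the scales factor out after the sum; with group-wise scales along the input dimension, each group would need its own partial accumulator and its own rescaling factor, which is why such schemes cannot also use a single static activation scale.

import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 8)).astype(np.float32)   # input row
W = rng.standard_normal((8, 4)).astype(np.float32)   # weights: 8 inputs, 4 output channels

# Per-output-channel weight scales and a single static input scale.
s_w = np.abs(W).max(axis=0) / 127.0                  # shape (4,): one scale per output channel
s_x = np.abs(x).max() / 127.0                        # scalar static input scale
q_w = np.clip(np.round(W / s_w), -127, 127).astype(np.int32)
q_x = np.clip(np.round(x / s_x), -127, 127).astype(np.int32)

# Pure integer accumulation; the scales factor out once per output channel.
y_int = q_x @ q_w
y_hat = y_int.astype(np.float32) * (s_x * s_w)       # single rescale per column
print("error vs float matmul:", float(np.abs(y_hat - x @ W).max()))

# With group-wise scales along the input dimension (e.g. two groups of 4 rows of W),
# q_x @ q_w is no longer a single rescalable integer sum: each group needs its own
# accumulator and its own (input scale x weight scale) factor before summation, so the
# activations cannot share one static scale per channel.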