2022
DOI: 10.48550/arxiv.2203.11086
Preprint

Overcoming Oscillations in Quantization-Aware Training

Abstract: When training neural networks with simulated quantization, we observe that quantized weights can, rather unexpectedly, oscillate between two grid-points. The importance of this effect and its impact on quantization-aware training are not well understood or investigated in the literature. In this paper, we delve deeper into the phenomenon of weight oscillations and show that it can lead to a significant accuracy degradation due to wrongly estimated batch-normalization statistics during inference and increased noise…
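The oscillation described in the abstract is easy to reproduce in isolation. The following is a minimal sketch, not the authors' code, assuming a simple uniform quantizer with a straight-through estimator (STE) and a toy quadratic loss; the grid spacing, target value, and learning rate are arbitrary choices for illustration.

```python
# Minimal sketch (not the paper's implementation): a latent weight whose
# optimal value lies between two quantization grid points oscillates under
# fake quantization with a straight-through estimator (STE).
import torch

def fake_quant(w, scale):
    # Round-to-nearest onto the grid; the detach trick makes the backward
    # pass behave as the identity (STE).
    w_q = torch.round(w / scale) * scale
    return w + (w_q - w).detach()

scale = 0.1                          # grid points at ..., 0.3, 0.4, ...
target = 0.347                       # optimal real-valued weight, between two grid points
w = torch.tensor(0.30, requires_grad=True)
opt = torch.optim.SGD([w], lr=0.05)

for step in range(20):
    opt.zero_grad()
    loss = (fake_quant(w, scale) - target) ** 2   # toy loss pulling w toward target
    loss.backward()
    opt.step()
    print(step, f"latent={w.item():.4f}",
          f"quantized={torch.round(w / scale).item() * scale:.1f}")
# After a few steps the quantized value flips back and forth between 0.3 and
# 0.4 while the latent weight hovers around the bin boundary at 0.35.
```

Once the latent weight crosses the boundary, the loss gradient changes sign and pushes it back, so the quantized weight never settles on either grid point; this is the behaviour the paper analyzes.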

Cited by 3 publications (4 citation statements) | References 13 publications

“…By utilizing fewer bits to represent data, such as 16-bit floats or 8-bit integers instead of 32-bit floating-point numbers, quantization enables more compact model representations and the utilization of efficient vectorized operations on various hardware platforms [69]. This technique is particularly beneficial during inference, significantly reducing computation costs while maintaining inference accuracy [67]. QAT involves quantizing a pre-trained model and subsequently performing a fine-tuning step to recover any accuracy loss caused by quantization-related errors, which may impact model performance [74]. The QAT process consists of two stages: pre-training and fine-tuning.…”
Section: Quantization
confidence: 99%
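The two-stage recipe in the quote above (pre-train in float, then fine-tune with simulated quantization) can be sketched as follows. This is an illustrative, hedged example rather than any cited work's code: the model, the per-tensor symmetric quantizer, and the helper names (FakeQuantLinear, quantize_model) are all assumptions.

```python
# Sketch of a generic QAT fine-tuning setup: replace float layers with
# fake-quantized ones and continue training to recover accuracy.
import torch
import torch.nn as nn

class FakeQuantLinear(nn.Linear):
    """Linear layer whose weights are fake-quantized in the forward pass."""
    def __init__(self, in_f, out_f, n_bits=8):
        super().__init__(in_f, out_f)
        self.n_bits = n_bits

    def forward(self, x):
        # Per-tensor symmetric quantization; STE via the detach trick.
        qmax = 2 ** (self.n_bits - 1) - 1
        scale = self.weight.detach().abs().max().clamp(min=1e-8) / qmax
        w_q = torch.clamp(torch.round(self.weight / scale), -qmax - 1, qmax) * scale
        w_ste = self.weight + (w_q - self.weight).detach()
        return nn.functional.linear(x, w_ste, self.bias)

def quantize_model(model, n_bits=8):
    # Swap every nn.Linear for its fake-quantized counterpart, copying weights.
    for name, m in model.named_children():
        if isinstance(m, nn.Linear):
            q = FakeQuantLinear(m.in_features, m.out_features, n_bits)
            q.weight.data.copy_(m.weight.data)
            q.bias.data.copy_(m.bias.data)
            setattr(model, name, q)
        else:
            quantize_model(m, n_bits)
    return model

# Stage 1: assume `pretrained` is a float model trained beforehand.
pretrained = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))

# Stage 2: fine-tune the fake-quantized model with the usual training loop.
qat_model = quantize_model(pretrained, n_bits=8)
```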
“…Unlike quantization-aware training (QAT), PTQ does not require labeled data or high computational power. However, this benefit often comes at the expense of non-trivial accuracy degradation, especially when using low-precision quantization [6][7][8]. Several methods have been proposed to address the challenges of PTQ [9,10].…”
Section: Related Work 2.1 Post-Training Quantization
confidence: 99%
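To make the contrast in the quote above concrete: PTQ needs only a small unlabeled calibration set to pick quantization ranges, with no fine-tuning. The sketch below is an assumption-laden illustration (simple min/max range estimation, asymmetric 8-bit quantization), not the method of any cited paper.

```python
# Sketch of the PTQ setting: estimate ranges from unlabeled calibration data,
# then quantize without any gradient-based fine-tuning.
import numpy as np

def calibrate_scale(activations, n_bits=8):
    """Per-tensor asymmetric range from observed activations (min/max rule)."""
    a_min, a_max = activations.min(), activations.max()
    qmin, qmax = 0, 2 ** n_bits - 1
    scale = (a_max - a_min) / (qmax - qmin)
    zero_point = int(round(qmin - a_min / scale))
    return scale, zero_point

def quantize(x, scale, zero_point, n_bits=8):
    # Quantize and immediately dequantize to simulate the low-precision values.
    q = np.clip(np.round(x / scale) + zero_point, 0, 2 ** n_bits - 1)
    return (q - zero_point) * scale

# A few unlabeled calibration batches are enough to fix the ranges.
calib = np.random.randn(512, 128).astype(np.float32)
scale, zp = calibrate_scale(calib)
x_q = quantize(calib, scale, zp)
print("max abs quantization error:", np.abs(calib - x_q).max())
```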
“…This denotes the circumstance where the latent weights fluctuate around the boundary of adjacent quantization bins during quantization-aware training. As per our understanding, (Nagel et al., 2022) is the sole work probing into these effects; however, it restricts its scope to CNNs and their impact on batch normalization, a technique not employed in ViTs. We take the initiative to identify and analyze this oscillation phenomenon specific to ViTs.…”
Section: Oscillation in Training
confidence: 99%
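One practical way to make the boundary fluctuation described above measurable is to track how often each weight's integer bin assignment changes between training iterations. The EMA-based frequency estimate below is an assumption for illustration, inspired by the discussion, and not a verified reimplementation of Nagel et al. (2022).

```python
# Sketch: count bin flips per weight across QAT iterations with an
# exponential moving average; persistently high values indicate oscillation.
import torch

class OscillationTracker:
    def __init__(self, shape, momentum=0.99):
        self.prev_bins = torch.zeros(shape, dtype=torch.long)
        self.freq = torch.zeros(shape)        # EMA of bin-flip events
        self.momentum = momentum
        self.initialized = False

    @torch.no_grad()
    def update(self, weight, scale):
        bins = torch.round(weight / scale).long()
        if self.initialized:
            flipped = (bins != self.prev_bins).float()
            self.freq = self.momentum * self.freq + (1 - self.momentum) * flipped
        self.prev_bins = bins
        self.initialized = True
        return self.freq                      # high entries = oscillating weights

# Usage inside a QAT loop (layer weight and scale are placeholders):
tracker = OscillationTracker(shape=(64, 32))
# freq = tracker.update(layer.weight, scale)
# Weights whose freq exceeds a threshold could then be frozen or regularized.
```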
“…The final optimization target is L = L_KD + λ·L_OBR, where λ is the weighting coefficient balancing L_KD and L_OBR. To make sure that the regularization does not influence the learning of scale factors at the very early stage of training, we gradually increase the coefficient λ during training by applying a cosine annealing schedule following (Nagel et al., 2022).…”
Section: Oscillation-Aware Bin Regularization
confidence: 99%
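A short worked example of the schedule described in the quote above: the total loss L = L_KD + λ(t)·L_OBR with λ ramped from zero to its target value by a cosine curve, so the regularizer barely acts early in training. The exact ramp shape and λ_max are assumptions for illustration, not values taken from the cited works.

```python
# Sketch of a cosine ramp for the regularization weight and the combined loss.
import math

def lambda_schedule(step, total_steps, lam_max=1.0):
    """Cosine ramp from 0 at step 0 to lam_max at the final step."""
    progress = min(step / max(total_steps, 1), 1.0)
    return lam_max * 0.5 * (1.0 - math.cos(math.pi * progress))

def total_loss(loss_kd, loss_obr, step, total_steps, lam_max=1.0):
    # L = L_KD + lambda(t) * L_OBR
    return loss_kd + lambda_schedule(step, total_steps, lam_max) * loss_obr

# The regularizer contributes almost nothing at the start of training and
# reaches its full weight near the end.
for step in (0, 250, 500, 750, 1000):
    print(step, round(lambda_schedule(step, 1000), 3))
# 0 -> 0.0, 250 -> 0.146, 500 -> 0.5, 750 -> 0.854, 1000 -> 1.0
```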