Abstract: When training neural networks with simulated quantization, we observe that quantized weights can, rather unexpectedly, oscillate between two grid points. The importance of this effect and its impact on quantization-aware training are not well understood or investigated in the literature. In this paper, we delve deeper into the phenomenon of weight oscillations and show that it can lead to a significant accuracy degradation due to wrongly estimated batch-normalization statistics during inference and increased noise…
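The oscillation the abstract describes can be reproduced with a toy example. The sketch below is not the paper's implementation; the grid spacing and update size are arbitrary assumptions chosen to show a latent weight near a bin boundary whose simulated-quantized value flips between adjacent grid points under small updates:

```python
import numpy as np

def fake_quantize(w, scale):
    """Simulated quantization: round to the nearest grid point, then dequantize."""
    return np.round(w / scale) * scale

# A latent weight sitting just below the boundary between two quantization
# bins (with scale 0.1, the boundary between grid points 0.0 and 0.1 is 0.05).
scale = 0.1
w = 0.049
for step in range(4):
    q = fake_quantize(w, scale)
    print(f"step {step}: latent={w:.3f} -> quantized={q:.1f}")
    # A small update pushes the latent weight back and forth across the
    # boundary, so the quantized value flips between 0.0 and 0.1 each step.
    w += 0.002 if q == 0.0 else -0.002
```

Each update reverses the direction of the previous one, so the quantized weight never settles, which is exactly the oscillation the paper analyzes.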
“…By utilizing fewer bits to represent data, such as 16-bit floats or 8-bit integers instead of 32-bit floating-point numbers, quantization enables more compact model representations and the utilization of efficient vectorized operations on various hardware platforms [69]. This technique is particularly beneficial during inference, significantly reducing computation costs while maintaining inference accuracy [67]. QAT involves quantizing a pre-trained model and subsequently performing a fine-tuning step to recover any accuracy loss caused by quantization-related errors, which may impact model performance [74]. The QAT process consists of two stages: pre-training and fine-tuning.…”
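As a rough illustration of the 8-bit case mentioned in the quote, here is a minimal symmetric int8 quantize/dequantize sketch; the max-abs scale rule is one common choice, assumed here rather than taken from the cited works:

```python
import numpy as np

def quantize_int8(x):
    """Symmetric uniform int8 quantization: map floats onto a grid of
    255 integer levels in [-127, 127]."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map int8 codes back to floats for simulated-quantization arithmetic."""
    return q.astype(np.float32) * scale

x = np.random.randn(1000).astype(np.float32)
q, scale = quantize_int8(x)
x_hat = dequantize(q, scale)
# The reconstruction error is bounded by half a quantization step.
print("max error:", np.abs(x - x_hat).max(), "<= scale/2 =", scale / 2)
```

Storing `q` instead of `x` cuts memory by 4x, which is the compactness benefit the quote refers to.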
The emergence of Tiny Machine Learning (TinyML) has positively revolutionized the field of Artificial Intelligence by promoting the joint design of resource-constrained IoT hardware devices and their learning-based software architectures. TinyML plays an essential role within the fourth and fifth industrial revolutions in helping societies, economies, and individuals employ effective AI-infused computing technologies (e.g., smart cities, automotive, and medical robotics). Given its multidisciplinary nature, the field of TinyML has been approached from many different angles: this comprehensive survey aims to provide an up-to-date overview focused on all the learning algorithms within TinyML-based solutions. The survey is based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) methodological flow, allowing for a systematic and complete literature survey. In particular, firstly, we will examine the three different workflows for implementing a TinyML-based system, i.e., ML-oriented, HW-oriented, and co-design. Secondly, we propose a taxonomy that covers the learning panorama under the TinyML lens, examining in detail the different families of model optimization and design, as well as the state-of-the-art learning techniques. Thirdly, this survey will present the distinct features of hardware devices and software tools that represent the current state-of-the-art for TinyML intelligent edge applications. Finally, we discuss the challenges and future directions.
“…Unlike quantization-aware training (QAT), PTQ requires neither labeled data nor high computational power. However, this benefit often comes at the expense of non-trivial accuracy degradation, especially when using low-precision quantization [6][7][8]. Several methods have been proposed to address the challenges of PTQ [9,10].…”
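To make the contrast with QAT concrete, here is a minimal sketch of post-training calibration. The `calibrate_scale` helper and the max-abs range estimator are hypothetical illustrations, not taken from the cited methods:

```python
import numpy as np

def calibrate_scale(activations, num_bits=8):
    """Post-training calibration: derive a per-tensor scale from activation
    ranges observed on a small, unlabeled calibration set. No labels and no
    gradient-based fine-tuning are needed, unlike QAT."""
    max_abs = max(float(np.abs(a).max()) for a in activations)
    return max_abs / (2 ** (num_bits - 1) - 1)  # symmetric int8 grid: [-127, 127]

# Hypothetical calibration batches standing in for a few unlabeled inputs.
calib = [np.random.randn(64, 128) for _ in range(4)]
scale = calibrate_scale(calib)
q = np.clip(np.round(calib[0] / scale), -127, 127).astype(np.int8)
```

Because calibration is a forward-only pass over a handful of inputs, PTQ finishes in minutes, whereas QAT repeats full training epochs; the accuracy gap the quote mentions is the price of skipping those epochs.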
Section: Related Work, 2.1 Post-training Quantization
Uniform quantization is widely adopted as an efficient compression method in practical applications. Despite its merit of having a low computational overhead, uniform quantization fails to preserve sensitive components in neural networks when applied with ultra-low bit precision, which can lead to non-trivial accuracy degradation. Previous works have applied mixed-precision quantization to address this problem. However, finding the correct bit settings for different layers always demands significant time and resource consumption. Moreover, mixed-precision quantization is not well supported on current general-purpose machines such as GPUs and CPUs and thus causes intolerable overheads in deployment. To leverage the efficiency of uniform quantization while maintaining accuracy, in this paper we propose sensitivity-aware network adaptation (SANA), which automatically modifies the model architecture based on sensitivity analysis to make it more compatible with uniform quantization. Furthermore, we formulate four different channel initialization strategies to accelerate the quantization-aware fine-tuning process of SANA. Our experimental results show that SANA can outperform standard uniform quantization and other state-of-the-art quantization methods in terms of accuracy, with comparable or even smaller memory consumption. Notably, ResNet-50-SANA (24.4 MB) with W4A8 quantization achieved 77.8% top-1 accuracy on ImageNet, which even surpassed the 77.6% of the full-precision ResNet-50 (97.8 MB) baseline.
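The kind of per-layer sensitivity analysis a method like SANA builds on can be sketched generically. The probe below is a hypothetical illustration; the function names, the toy model, and the 4-bit scale rule are ours, not taken from the paper:

```python
import numpy as np

def layer_sensitivity(weights, forward_loss, scale_for):
    """Per-layer sensitivity probe: quantize one layer at a time and measure
    the loss increase. Layers with the largest increase are the 'sensitive'
    ones that an adaptation method would target."""
    base = forward_loss(weights)
    scores = {}
    for name, w in weights.items():
        s = scale_for(w)
        w_q = np.round(w / s) * s  # uniform quantization of this layer only
        scores[name] = forward_loss({**weights, name: w_q}) - base
    return scores

# Toy stand-in model: a two-layer linear map scored by mean squared error.
rng = np.random.default_rng(0)
x, y = rng.normal(size=(8, 4)), rng.normal(size=(8, 2))
weights = {"fc1": rng.normal(size=(4, 4)), "fc2": rng.normal(size=(4, 2))}
loss = lambda w: float(((x @ w["fc1"] @ w["fc2"] - y) ** 2).mean())
scale_for = lambda w: np.abs(w).max() / 7.0  # 4-bit symmetric grid, an assumption
scores = layer_sensitivity(weights, loss, scale_for)
print(scores)
```

A probe like this costs one forward pass per layer; what distinguishes SANA is acting on the scores by adapting the architecture rather than by assigning mixed bit-widths.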
“…This denotes the circumstance where the latent weights fluctuate around the boundary of adjacent quantization bins during quantization-aware training. To our understanding, (Nagel et al., 2022) is the sole work probing these effects; however, it restricts its scope to CNNs and their impact on batch normalization, a technique not employed in ViTs. We take the initiative to identify and analyze this oscillation phenomenon specifically for ViTs.…”
Section: Oscillation In Training
“…The final optimization target is L = L_KD + λ·L_OBR, where λ is the weighting coefficient that balances L_KD and L_OBR. To ensure that the regularization does not influence the learning of scale factors at the very early stage of training, we gradually increase the coefficient λ during training by applying a cosine annealing schedule, following (Nagel et al., 2022).…”
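One plausible form of such an increasing cosine schedule is sketched below; the exact parameterization is an assumption, since the quote only states that λ is gradually increased:

```python
import math

def lambda_schedule(step, total_steps, lambda_max):
    """Cosine ramp for the regularization weight: starts at 0, so the bin
    regularizer does not disturb scale-factor learning early on, and grows
    smoothly to lambda_max by the end of training."""
    return lambda_max * 0.5 * (1.0 - math.cos(math.pi * step / total_steps))

# The combined objective at step t would then be: L = L_KD + lambda_t * L_OBR
for t in (0, 250, 500, 1000):
    print(t, lambda_schedule(t, 1000, 1e-2))
```

The ramp is flat near step 0 and near the end, so the regularizer is switched on gently rather than abruptly.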
Section: Oscillation-aware Bin Regularization