2021
DOI: 10.48550/arxiv.2102.06365
Preprint

Dynamic Precision Analog Computing for Neural Networks

Abstract: Analog electronic and optical computing exhibit tremendous advantages over digital computing for accelerating deep learning when operations are executed at low precision. In this work, we derive a relationship between analog precision, which is limited by noise, and digital bit precision. We propose extending analog computing architectures to support varying levels of precision by repeating operations and averaging the result, decreasing the impact of noise. Such architectures enable programmable tradeoffs bet…
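
The core idea stated in the abstract is that averaging K repetitions of a noisy analog operation shrinks the noise standard deviation by a factor of sqrt(K), which corresponds to roughly 0.5*log2(K) extra bits of effective precision. The NumPy sketch below only illustrates that relationship under a simple additive-Gaussian noise model; it is not code from the paper, and the noise level and sizes are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_matvec(W, x, noise_std):
    """One analog matrix-vector product modeled as the exact result plus additive Gaussian noise."""
    return W @ x + rng.normal(0.0, noise_std, size=W.shape[0])

def averaged_matvec(W, x, noise_std, repeats):
    """Repeat the noisy operation and average: noise std drops by a factor of sqrt(repeats)."""
    return np.mean([noisy_matvec(W, x, noise_std) for _ in range(repeats)], axis=0)

W = rng.standard_normal((64, 64))
x = rng.standard_normal(64)
exact = W @ x

for repeats in (1, 4, 16, 64):
    err = np.std(averaged_matvec(W, x, noise_std=0.1, repeats=repeats) - exact)
    # Every 4x increase in repeats should roughly halve the residual noise,
    # i.e. add about one bit of effective precision.
    print(f"repeats={repeats:3d}  residual noise std ~ {err:.4f}")
```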

Cited by 9 publications (8 citation statements) | References 36 publications (75 reference statements)
“…2) Post-Training Quantization: An alternative to the expensive QAT method is Post-Training Quantization (PTQ) which performs the quantization and the adjustments of the weights, without any fine-tuning [11,24,40,59,60,67,68,87,106,138,144,168,176,269]. As such, the overhead of PTQ is very low and often negligible.…”
Section: G. Fine-tuning Methods (mentioning)
confidence: 99%
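
To make the PTQ idea in the excerpt above concrete, here is a minimal, generic sketch (not taken from any of the cited works) that quantizes a pretrained weight tensor to int8 with a simple max-abs scale and no fine-tuning; the calibration rule is an illustrative assumption:

```python
import numpy as np

def quantize_weights_ptq(w, num_bits=8):
    """Uniform symmetric post-training quantization: no training data, no fine-tuning.
    Returns integer codes plus the scale needed to dequantize."""
    qmax = 2 ** (num_bits - 1) - 1          # 127 for int8
    scale = np.max(np.abs(w)) / qmax        # max-abs calibration (illustrative choice)
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

# A random matrix stands in for a real pretrained checkpoint here.
w = np.random.default_rng(1).standard_normal((256, 256)).astype(np.float32)
q, scale = quantize_weights_ptq(w)
print("max reconstruction error:", float(np.max(np.abs(dequantize(q, scale) - w))))
```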
“…Quantization is one of the most widely-used techniques for neural network compression (Courbariaux et al, 2015;Han et al, 2015;Zhu et al, 2016;Zhou et al, 2016;Mishra et al, 2017;Park et al, 2017;Banner et al, 2018), with two types of training strategies: Post-Training Quantization directly quantizes a pre-trained full-precision model (He & Cheng, 2018;Nagel et al, 2019;Fang et al, 2020a;b;Garg et al, 2021); Quantization-Aware Training uses training data to optimize quantized models for better performance (Gysel et al, 2018;Esser et al, 2019;Hubara et al, 2020;Tailor et al, 2020). In this work, we focus on the latter one, which is explored in several directions.…”
Section: Related Work (mentioning)
confidence: 99%
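
For contrast with post-training quantization, quantization-aware training as described in the excerpt above typically inserts quantize-dequantize ("fake quantization") into the forward pass and lets gradients bypass the rounding via a straight-through estimator. The toy NumPy loop below is a generic sketch of that pattern on a linear regression problem, not the training scheme of any particular cited paper:

```python
import numpy as np

rng = np.random.default_rng(2)

def fake_quant(w, num_bits=4):
    """Quantize-dequantize ("fake quantization") applied in the forward pass of QAT."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax + 1e-12
    return np.clip(np.round(w / scale), -qmax, qmax) * scale

# Toy problem: fit y = W_true @ x with 4-bit weights using the straight-through estimator (STE):
# the forward pass sees quantized weights, the update is applied to the full-precision copy.
W_true = rng.standard_normal((8, 16))
W = rng.standard_normal((8, 16)) * 0.1       # full-precision "shadow" weights being trained
lr = 0.05
for step in range(500):
    x = rng.standard_normal((16, 32))
    y_target = W_true @ x
    Wq = fake_quant(W)                       # forward pass uses quantized weights
    err = Wq @ x - y_target
    grad = err @ x.T / x.shape[1]            # gradient of the squared error w.r.t. Wq, averaged over the batch
    W -= lr * grad                           # STE: treat d(fake_quant)/dW as identity
print("final 4-bit loss:", float(np.mean((fake_quant(W) @ x - y_target) ** 2)))
```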
“…Dynamic analog precision has been proposed by Gonugondla et al [22] to adapt to the variation in noise sensitivity across different portions of network architectures. Garg et al [24] proposed averaging the results of multiple matrix multiplications to reduce the effect of analog device noise. Our approach departs from the aforementioned proposals.…”
Section: Related Work (mentioning)
confidence: 99%
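
As a purely hypothetical sketch of how such dynamic precision could look in practice (it is not the implementation of Gonugondla et al. or Garg et al.), one could give each layer its own repetition count, so that layers assumed to be noise-sensitive get more averaging while robust layers stay cheap. The layer sizes, noise level, and repeat counts below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

def noisy_matmul(W, h, noise_std, repeats):
    """Average `repeats` noisy analog products; noise std shrinks by sqrt(repeats)."""
    outs = [W @ h + rng.normal(0.0, noise_std, size=W.shape[0]) for _ in range(repeats)]
    return np.mean(outs, axis=0)

# Hypothetical 3-layer MLP; the first layer is assumed to be the most noise-sensitive,
# so it receives the largest repetition count (16 vs. 4 vs. 1).
layers = [rng.standard_normal((128, 64)),
          rng.standard_normal((128, 128)),
          rng.standard_normal((10, 128))]
repeats_per_layer = [16, 4, 1]

def forward(x, noise_std=0.05):
    h = x
    for i, (W, r) in enumerate(zip(layers, repeats_per_layer)):
        h = noisy_matmul(W, h, noise_std, r)
        if i < len(layers) - 1:
            h = np.maximum(h, 0.0)  # ReLU on hidden layers only
    return h

logits = forward(rng.standard_normal(64))
print(logits.shape)  # (10,)
```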