2021
DOI: 10.1016/j.neucom.2021.07.045

Pruning and quantization for deep neural network acceleration: A survey

Cited by 438 publications (197 citation statements)
References 75 publications
“…If performed correctly, a model rarely loses accuracy, and even in these cases, only a negligible percentage is lost. Most of the models keep their initial accuracies, and some of them show improvements [41,42].…”
Section: TensorFlow Lite Model Evaluation
confidence: 99%
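The excerpt above concerns checking that quantized TensorFlow Lite models keep their accuracy. A minimal sketch of the usual workflow, assuming an already-trained SavedModel at a hypothetical path `saved_model_dir` (not taken from the cited works), is:

```python
import tensorflow as tf

# Hypothetical path to a trained SavedModel.
saved_model_dir = "saved_model_dir"

# Convert with the default post-training quantization optimizations.
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

# Load the quantized model so its accuracy can be evaluated against the original.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
```

Evaluation then consists of feeding the same test set through `interpreter.set_tensor(...)` / `interpreter.invoke()` and comparing the resulting accuracy with that of the float model.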
“…Quantization and compilation are performed by the Vitis AI tools, provided as command-line programs or Python modules, as before, specific to a given framework. The quantization process changes the internal representation of the model's parameters during inference [29]. Usually, standard computational platforms perform AI-related calculations on floating-point data types (with precision varying depending on whether the model is run on CPUs or GPUs).…”
Section: Preparing Deep Models For Deployment
confidence: 99%
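The quantization step described in this excerpt replaces floating-point parameters with low-precision integers. A minimal, framework-agnostic sketch of affine float32-to-int8 quantization (an illustration of the general idea, not the Vitis AI implementation):

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Affine (asymmetric) quantization of float32 weights to int8."""
    w_min, w_max = float(weights.min()), float(weights.max())
    scale = max((w_max - w_min) / 255.0, 1e-8)  # width of one step of the int8 grid
    zero_point = round(-128 - w_min / scale)    # integer to which the real value 0.0 maps
    q = np.clip(np.round(weights / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Recover approximate float32 values from the int8 representation."""
    return (q.astype(np.float32) - zero_point) * scale
```

Storing `q` instead of `weights` cuts parameter memory roughly four-fold (8 bits versus 32 bits per value), at the cost of the rounding error introduced by the integer grid.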
“…The model preparation process may include some additional optimization steps, such as pruning the original deep architecture [30]. It may ultimately help us reduce the network's size by eliminating, e.g., redundant parameters and/or connections in the model, which have minimal impact on the overall accuracy of the algorithm [29].…”
Section: Preparing Deep Models For Deployment
confidence: 99%
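The pruning step mentioned in this excerpt removes parameters that contribute little to the output. A minimal sketch, assuming simple unstructured magnitude pruning (only one of the criteria discussed in the survey):

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float = 0.5) -> np.ndarray:
    """Zero out the smallest-magnitude weights, keeping a (1 - sparsity) fraction."""
    threshold = np.quantile(np.abs(weights), sparsity)
    mask = np.abs(weights) >= threshold
    return weights * mask

# Example: remove 90% of a layer's weights by magnitude.
layer_weights = np.random.randn(256, 256).astype(np.float32)
pruned = magnitude_prune(layer_weights, sparsity=0.9)
```

In practice the pruned model is usually fine-tuned afterwards so that the remaining weights can compensate for the removed connections.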
“…Real-time inference on resource-constrained and efficiency-demanding platforms has long been desired and extensively studied in the last decades, resulting in significant improvement in the trade-off between efficiency and accuracy (Han et al., 2015; Mei et al., 2019; Tanaka et al., 2020; Ma et al., 2020; Mishra et al., 2020; Liang et al., 2021; Liu et al., 2021). As a model compression technique, quantization is promising compared to other methods, such as network pruning (Tanaka et al., 2020; Ma et al., 2020) and slimming (Liu et al., 2017; 2018), as it achieves a large compression ratio (Krishnamoorthi, 2018; Nagel et al., 2021) and is computationally beneficial for integer-only hardware.…”
Section: Introduction
confidence: 99%