2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)
DOI: 10.1109/avss.2019.8909826
Information theory based pruning for CNN compression and its application to image classification and action recognition

Cited by 4 publications (5 citation statements) | References 17 publications
“…In [55], quantization and pruning are used to compress the model, thus covering the two main techniques applied in the proposed approach. In [52], parameters are reduced using covariance and correlation criteria on both convolutional and fully connected layers. Many works apply their methods exclusively to either convolutional or fully connected layers, which makes it difficult to compare the proposed method against the same standards.…”
Section: Results
Confidence: 99%
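The covariance/correlation criterion cited above can be illustrated with a minimal sketch: flatten each convolutional filter to a vector, compute pairwise correlations, and drop filters that are nearly linear copies of ones already kept. This is an illustrative reconstruction, not the exact procedure of [52]; the function name and the threshold value are assumptions.

```python
import numpy as np

def correlation_prune(filters, threshold=0.9):
    """Keep only filters that are not highly correlated with an
    already-kept filter (illustrative sketch of a correlation criterion).

    filters: array of shape (n_filters, k*k*c), one flattened filter per row.
    Returns the list of indices of filters to keep.
    """
    corr = np.corrcoef(filters)  # pairwise correlation between filters
    keep = []
    for i in range(filters.shape[0]):
        # drop filter i if it strongly correlates with any kept filter
        if all(abs(corr[i, j]) < threshold for j in keep):
            keep.append(i)
    return keep
```

A filter that is a scalar multiple of another has correlation ±1 with it and is pruned, which matches the intuition that such filters contribute redundant features.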
“…This way, they obtain similarity relationships among filters and prune all but those closest to the centroid of the cluster. Also, in [52] they propose a criterion based on the covariance and correlation of filters in convolutional and fully connected layers and successfully compress several state-of-the-art models. Even so, the most common criteria chosen by researchers are still the absolute value of weights and the l1 and l2 norms [53].…”
Section: Network Pruning
Confidence: 99%
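The magnitude-based criteria mentioned at the end of the quote (absolute weight values, l1/l2 norms) can be sketched as follows: rank each convolutional filter by the l1 norm of its weights and keep the largest ones. This is a generic sketch of the standard criterion from the pruning literature, not code from any of the cited papers; the function name and `prune_ratio` default are assumptions.

```python
import numpy as np

def l1_filter_keep(weights, prune_ratio=0.5):
    """Rank conv filters by l1 norm and return sorted indices to keep.

    weights: array of shape (n_filters, channels, k, k).
    The `prune_ratio` fraction of filters with the smallest l1 norm
    is marked for removal.
    """
    n_filters = weights.shape[0]
    # l1 norm of each filter: sum of absolute weight values
    norms = np.abs(weights).reshape(n_filters, -1).sum(axis=1)
    n_keep = n_filters - int(prune_ratio * n_filters)
    keep = np.argsort(norms)[-n_keep:]  # indices of the largest-norm filters
    return np.sort(keep)
```

The l2-norm variant differs only in replacing the sum of absolute values with the Euclidean norm of each flattened filter.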
“…(4) A face mask detection machine learning architecture is developed. (5) Easily reproducible open-source benchmarking templates that use only publicly available vision libraries are delivered. Notably, this is the first time such a large number of hardware platforms, frameworks, and IC/OD models have been benchmarked and compared, not only on model latency but on the full video pipeline.…”
Section: (3) A Comparison Between Raspberry Pi 4 Intel Neural
Confidence: 99%
“…Efficiently deploying AI applications on edge devices poses various challenges, as discussed in [4], specifically constraints on compute, memory, and power consumption. To tackle these, quantization and weight pruning [5] are two popular techniques that typically trade a slight reduction in accuracy for performance gains. In quantization, the neural network weights and/or the feature maps are expressed using shorter data types, such as FP16, INT16, or INT8 instead of FP32 [6]; this lowers the memory footprint as well as the latency, since the computation cost is reduced and SIMD instructions can compute more operations per instruction.…”
Section: Related Work
Confidence: 99%
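The FP32-to-INT8 conversion described in the quote can be sketched with simple symmetric per-tensor quantization: scale the tensor so its largest absolute value maps to 127, round to 8-bit integers, and keep the scale for dequantization. This is a generic sketch of the technique, not any specific framework's implementation; the function names are assumptions.

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor INT8 quantization: FP32 -> (int8 tensor, scale)."""
    # one FP32 scale maps the largest magnitude onto the int8 range [-127, 127]
    scale = max(np.abs(x).max() / 127.0, 1e-12)  # guard against all-zero input
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an FP32 approximation of the original tensor."""
    return q.astype(np.float32) * scale
```

The memory saving in the quote follows directly: each value occupies 1 byte instead of 4, and integer SIMD lanes process four times as many elements per instruction as FP32 lanes of the same width.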