2020
DOI: 10.1145/3371154

Optimizing Deep Learning Inference on Embedded Systems Through Adaptive Model Selection

Abstract: Deep neural networks (DNNs) are becoming a key enabling technique for many application domains. However, on-device inference on battery-powered, resource-constrained embedded systems is often infeasible due to the prohibitively long inferencing time and resource requirements of many DNNs. Offloading computation into the cloud is often unacceptable due to privacy concerns, high latency, or the lack of connectivity. While compression algorithms often succeed in reducing inferencing times, they come at the cost of red…
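The paper's core idea, adaptive model selection, can be illustrated with a small sketch. The Python code below is a hypothetical illustration under assumed names, not the authors' implementation: it assumes several pre-trained DNNs ordered from cheapest to most expensive and a lightweight "premodel" (here a k-nearest-neighbour classifier over cheap input features) that picks, per input, which DNN to run.

```python
# Hypothetical sketch of per-input adaptive model selection (not the paper's code).
# Assumes: candidate DNNs ordered cheapest -> most expensive, and a training set
# labelled with the index of the cheapest model that handles each input correctly.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

class AdaptiveSelector:
    def __init__(self, models):
        self.models = models                      # list of callables: input -> prediction
        self.premodel = KNeighborsClassifier(n_neighbors=5)

    def fit(self, features, best_model_idx):
        # features: cheap-to-compute descriptors of the training inputs (2D array)
        # best_model_idx: index of the cheapest model that got each input right
        self.premodel.fit(features, best_model_idx)

    def predict(self, feature, x):
        # Pick one model for this input, then run only that model.
        idx = int(self.premodel.predict(np.asarray(feature).reshape(1, -1))[0])
        return self.models[idx](x)
```

Because only one DNN runs per input, the average latency and energy cost track the premodel's choices rather than the largest network, which is the effect the abstract describes.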

Cited by 51 publications (22 citation statements)
References 50 publications
“…where: 𝑙 is the function; 𝛾 ∈ (0,1) is the factor and 𝜏 ∈ [0,1) is the threshold. Since the operation of lowering the dropout probability by the predefined factor 𝛾 is differentiable, we can still optimize the opponent and the network-optimizer through (8) and (9). The compression process stops when the percentage of remaining parameters in 𝐹_𝑊(𝑥|𝑧) is smaller than a user-defined value 𝛼 ∈ (0,1).…”
Section: Network Compressing Routine (mentioning, confidence: 99%)
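Read literally, the quoted routine iterates a pruning step and stops once the fraction of surviving parameters falls below a user-defined 𝛼 ∈ (0,1). A minimal sketch of that stopping logic is given below; the function names and the use of zeroed weights as a proxy for pruned parameters are assumptions for illustration, not taken from the cited paper.

```python
# Minimal sketch of the quoted stopping criterion (names and details are assumed).
# The loop stops once the fraction of parameters that survive pruning drops below alpha.
import torch

def remaining_fraction(model: torch.nn.Module) -> float:
    total, kept = 0, 0
    for p in model.parameters():
        total += p.numel()
        kept += int((p != 0).sum())        # treat zeroed weights as pruned (assumption)
    return kept / max(total, 1)

def compress(model, prune_step, alpha=0.1, max_iters=100):
    # prune_step(model) stands in for one round of the dropout-based pruning the
    # citation describes; its internals are not reproduced here.
    for _ in range(max_iters):
        if remaining_fraction(model) < alpha:   # user-defined stopping value alpha
            break
        prune_step(model)
    return model
```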
“…memory capacity: neural networks achieve high performance when using a large number of neurons, which in turn requires large memory consumption to hold and process the model [8,9], [10]. As a result, compression could lower the memory requirements.…”
Section: Introduction: Formulation of the Problem (mentioning, confidence: 99%)
“…There are many studies showing it outperforms human-based approaches. Recent work shows that it is effective in performing parallel code optimization (Chen et al. 2020; Cummins et al. 2017a, b; Grewe et al. 2013b; Ogilvie et al. 2014; Wang et al. 2014a, 2015), performance prediction (Wang and O'Boyle 2013; Zhao et al. 2016), parallelism mapping (Grewe et al. 2013a; Taylor et al. 2017; Tournavitis et al. 2009; Wang and O'Boyle 2010; Wang et al. 2014b, 2015; Wen et al. 2014; Zhang et al. 2020), and task scheduling (Emani et al. 2013; Marco et al. 2017; Ren et al. 2017, 2018, 2020; Sanz Marco et al. 2019; Yuan et al. 2019). As the many-core design becomes increasingly diverse, we believe that machine-learning techniques provide a rigorous, automatic way for constructing optimization heuristics, which is more scalable and sustainable compared to manually-crafted solutions.…”
Section: A Vision for the Next Decade (mentioning, confidence: 99%)
“…While DNN models can be deployed on these intelligent edge platforms by specific runtime systems, which are usually closed-source or unmodifiable, model compression techniques can be used to further optimize inference performance. Besides, there are several studies on adaptive inference for optimizing deep learning on embedded platforms, including adaptive strategies for neural network inference [22]–[25] and hardware/software co-design [26]–[28], which allow deep neural networks to be configurable and executed dynamically at runtime based on resource constraints.…”
Section: Background and Related Work (mentioning, confidence: 99%)
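The adaptive-inference work cited above shares a common pattern: several variants (or configurations) of a network are kept, and the runtime picks one based on the resources currently available. The sketch below is an invented illustration of that selection step; the variant table, budgets, and numbers are made up and do not come from any cited study.

```python
# Illustrative sketch of constraint-driven variant selection (all values invented).
# At run time, pick the most accurate variant that fits the current memory/latency budget.
from dataclasses import dataclass

@dataclass
class Variant:
    name: str
    memory_mb: float       # estimated peak memory use
    latency_ms: float      # estimated per-inference latency
    accuracy: float        # offline validation accuracy

def select_variant(variants, memory_budget_mb, latency_budget_ms):
    feasible = [v for v in variants
                if v.memory_mb <= memory_budget_mb and v.latency_ms <= latency_budget_ms]
    if not feasible:
        # Nothing fits the budget: fall back to the cheapest variant.
        return min(variants, key=lambda v: v.memory_mb)
    return max(feasible, key=lambda v: v.accuracy)

variants = [Variant("mobilenet_v2", 14, 35, 0.72),
            Variant("resnet50", 98, 180, 0.76),
            Variant("inception_v3", 92, 220, 0.78)]
print(select_variant(variants, memory_budget_mb=64, latency_budget_ms=100).name)
```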