2020
DOI: 10.1145/3371154

Optimizing Deep Learning Inference on Embedded Systems Through Adaptive Model Selection

Abstract: Deep neural networks (DNNs) are becoming a key enabling technique for many application domains. However, on-device inference on battery-powered, resource-constrained embedded systems is often infeasible due to the prohibitively long inferencing time and resource requirements of many DNNs. Offloading computation into the cloud is often unacceptable due to privacy concerns, high latency, or the lack of connectivity. While compression algorithms often succeed in reducing inferencing times, they come at the cost of red…
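The paper's core idea, adaptive model selection, can be illustrated with a small sketch. The Python code below is a hypothetical illustration under assumed names, not the authors' implementation: it assumes several pre-trained DNNs ordered from cheapest to most expensive and a lightweight "premodel" (here a k-nearest-neighbour classifier over cheap input features) that picks, per input, which DNN to run.

```python
# Hypothetical sketch of per-input adaptive model selection (not the paper's code).
# Assumes: candidate DNNs ordered cheapest -> most expensive, and a training set
# labelled with the index of the cheapest model that handles each input correctly.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

class AdaptiveSelector:
    def __init__(self, models):
        self.models = models                      # list of callables: input -> prediction
        self.premodel = KNeighborsClassifier(n_neighbors=5)

    def fit(self, features, best_model_idx):
        # features: cheap-to-compute descriptors of the training inputs (2D array)
        # best_model_idx: index of the cheapest model that got each input right
        self.premodel.fit(features, best_model_idx)

    def predict(self, feature, x):
        # Pick one model for this input, then run only that model.
        idx = int(self.premodel.predict(np.asarray(feature).reshape(1, -1))[0])
        return self.models[idx](x)
```

Because only one DNN runs per input, the average latency and energy cost track the premodel's choices rather than the largest network, which is the effect the abstract describes.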

Cited by 51 publications (22 citation statements)
References 50 publications
“…where: 𝑙 is the function; 𝛾 ∈ (0,1) is the factor and 𝜏 ∈ [0,1) is the threshold. Since the operation of lowering the dropout probability by the predefined factor 𝛾 is differentiable, we can still optimize the opponent and the network-optimizer through (8) and (9). The compression process stops when the percentage of remaining parameters in 𝐹_𝑊(𝑥|𝑧) is smaller than a user-defined value 𝛼 ∈ (0,1).…”
Section: Network Compressing Routine (mentioning, confidence: 99%)
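Read literally, the quoted routine iterates a pruning step and stops once the fraction of surviving parameters falls below a user-defined 𝛼 ∈ (0,1). A minimal sketch of that stopping logic is given below; the function names and the use of zeroed weights as a proxy for pruned parameters are assumptions for illustration, not taken from the cited paper.

```python
# Minimal sketch of the quoted stopping criterion (names and details are assumed).
# The loop stops once the fraction of parameters that survive pruning drops below alpha.
import torch

def remaining_fraction(model: torch.nn.Module) -> float:
    total, kept = 0, 0
    for p in model.parameters():
        total += p.numel()
        kept += int((p != 0).sum())        # treat zeroed weights as pruned (assumption)
    return kept / max(total, 1)

def compress(model, prune_step, alpha=0.1, max_iters=100):
    # prune_step(model) stands in for one round of the dropout-based pruning the
    # citation describes; its internals are not reproduced here.
    for _ in range(max_iters):
        if remaining_fraction(model) < alpha:   # user-defined stopping value alpha
            break
        prune_step(model)
    return model
```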
“…memory capacity: neural networks achieve high performance when using a large number of neurons, which in turn requires large memory consumption to hold and process the model [8,9], [10]. As a result, compression could lower the memory requirements.…”
Section: Introduction: Formulation of the Problem (mentioning, confidence: 99%)
“…There are many studies showing it outperforms human-based approaches. Recent work shows that it is effective in performing parallel code optimization (Chen et al. 2020; Cummins et al. 2017a, b; Grewe et al. 2013b; Ogilvie et al. 2014; Wang et al. 2014a, 2015), performance prediction (Wang and O'Boyle 2013; Zhao et al. 2016), parallelism mapping (Grewe et al. 2013a; Taylor et al. 2017; Tournavitis et al. 2009; Wang and O'Boyle 2010; Wang et al. 2014b, 2015; Wen et al. 2014; Zhang et al. 2020), and task scheduling (Emani et al. 2013; Marco et al. 2017; Ren et al. 2017, 2018, 2020; Sanz Marco et al. 2019; Yuan et al. 2019). As the many-core design becomes increasingly diverse, we believe that machine-learning techniques provide a rigorous, automatic way for constructing optimization heuristics, which is more scalable and sustainable compared to manually-crafted solutions.…”
Section: A Vision for the Next Decade (mentioning, confidence: 99%)
“…While DNN models can be deployed on these intelligent edge platforms by specific runtime systems, which are usually closed-source or unmodifiable, model compression techniques can be used to further optimize inference performance. Besides, there are several studies on adaptive inference for optimizing deep learning on embedded platforms, including adaptive strategies for neural network inference [22]–[25] and hardware/software co-design [26]–[28], which allow deep neural networks to be configurable and executed dynamically at runtime based on resource constraints.…”
Section: Background and Related Work (mentioning, confidence: 99%)
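The adaptive-inference work cited above shares a common pattern: several variants (or configurations) of a network are kept, and the runtime picks one based on the resources currently available. The sketch below is an invented illustration of that selection step; the variant table, budgets, and numbers are made up and do not come from any cited study.

```python
# Illustrative sketch of constraint-driven variant selection (all values invented).
# At run time, pick the most accurate variant that fits the current memory/latency budget.
from dataclasses import dataclass

@dataclass
class Variant:
    name: str
    memory_mb: float       # estimated peak memory use
    latency_ms: float      # estimated per-inference latency
    accuracy: float        # offline validation accuracy

def select_variant(variants, memory_budget_mb, latency_budget_ms):
    feasible = [v for v in variants
                if v.memory_mb <= memory_budget_mb and v.latency_ms <= latency_budget_ms]
    if not feasible:
        # Nothing fits the budget: fall back to the cheapest variant.
        return min(variants, key=lambda v: v.memory_mb)
    return max(feasible, key=lambda v: v.accuracy)

variants = [Variant("mobilenet_v2", 14, 35, 0.72),
            Variant("resnet50", 98, 180, 0.76),
            Variant("inception_v3", 92, 220, 0.78)]
print(select_variant(variants, memory_budget_mb=64, latency_budget_ms=100).name)
```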