Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence 2020
DOI: 10.24963/ijcai.2020/778

Towards Real-Time DNN Inference on Mobile Platforms with Model Pruning and Compiler Optimization

Abstract: High-end mobile platforms increasingly serve as primary computing devices for a wide range of Deep Neural Network (DNN) applications. However, the constrained computation and storage resources of these devices still pose significant challenges for real-time DNN inference. To address this problem, we propose a set of hardware-friendly structured model pruning and compiler optimization techniques to accelerate DNN execution on mobile devices. This demo shows that these optimizations can enable real-time…
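To make the "hardware-friendly structured pruning" in the abstract concrete: unlike unstructured pruning, which zeroes individual weights, structured pruning removes whole filters or channels so the remaining computation stays dense. The sketch below is a minimal, hypothetical illustration in PyTorch; prune_conv_channels and its keep_ratio parameter are invented names for this example, not the authors' method, and a real pipeline would also rewire downstream layers and fine-tune the model.

import torch
import torch.nn as nn

def prune_conv_channels(conv: nn.Conv2d, keep_ratio: float = 0.5) -> nn.Conv2d:
    # Rank output filters by L1 norm and keep only the strongest ones,
    # removing whole filters so the result stays dense (hardware-friendly).
    weight = conv.weight.data                        # (out_ch, in_ch, kH, kW)
    n_keep = max(1, int(conv.out_channels * keep_ratio))
    scores = weight.abs().sum(dim=(1, 2, 3))         # one score per filter
    keep = torch.argsort(scores, descending=True)[:n_keep]
    pruned = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                       stride=conv.stride, padding=conv.padding,
                       bias=conv.bias is not None)
    pruned.weight.data = weight[keep].clone()
    if conv.bias is not None:
        pruned.bias.data = conv.bias.data[keep].clone()
    return pruned

conv = nn.Conv2d(16, 32, kernel_size=3, padding=1)
print(prune_conv_channels(conv, keep_ratio=0.25))    # Conv2d(16, 8, ...)

Because whole filters disappear rather than scattering zeros through the weight tensor, the pruned layer runs on stock dense kernels, which is what makes this style of pruning amenable to the compiler optimizations the paper pairs it with.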

Cited by 5 publications (3 citation statements: 0 supporting, 3 mentioning, 0 contrasting)
References 4 publications (7 reference statements)
Citing publications appeared in 2021 and 2023

Citation statements (ordered by relevance):
“…Alternatively, there are pre-trained models offered by frameworks oriented toward embedded devices, e.g., TensorFlow Lite, which optimize models by quantization, pruning, etc. [10], [11]. However, these optimizations come at the expense of accuracy [12].…”
Section: HiTDL: High-Throughput Deep Learning (mentioning)
confidence: 99%
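As a concrete point of reference for the TensorFlow Lite optimizations mentioned above, here is a minimal sketch of post-training dynamic-range quantization using TensorFlow's public tf.lite.TFLiteConverter API. The tiny Keras model is a stand-in for any trained model; this is the kind of size/latency optimization that typically costs some accuracy, as the statement notes.

import tensorflow as tf

# Tiny stand-in network; any trained Keras model would work here.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(224, 224, 3)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10),
])

# Post-training dynamic-range quantization: weights are stored as 8-bit
# integers, shrinking the model at some (usually small) accuracy cost.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("model_quant.tflite", "wb") as f:
    f.write(tflite_model)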
“…Over recent years, the demand to improve the performance of deep neural networks (DNNs) has never been satisfied. Prior work approaches faster and more efficient DNNs from different aspects, such as model pruning [28,29,31], kernel factorization [3,14,40], and data quantization [45,50]. Among those efforts, quantization-based DNN acceleration [45,46,50] finds its strengths in minimal modification of the original model architecture, lower memory consumption, and better runtime performance.…”
Section: Introduction (mentioning)
confidence: 99%
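The data quantization this statement refers to can be illustrated with a small self-contained sketch of symmetric per-tensor int8 quantization (the helper names below are illustrative, not from any of the cited papers): weights are mapped to 8-bit integers plus one floating-point scale, cutting memory roughly 4x at the cost of rounding error.

import numpy as np

def quantize_int8(w):
    # Symmetric per-tensor quantization: w ≈ scale * q, with q in [-127, 127].
    scale = max(np.abs(w).max() / 127.0, 1e-8)       # guard against all-zero w
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
print("max abs error:", np.abs(w - dequantize_int8(q, scale)).max())

Because only the storage format changes, not the architecture, this matches the statement's point that quantization needs minimal modification of the original model.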
“…Therefore, efficient facial landmark localization is critical yet challenging, particularly from the perspective of practical applications. In general, model binarization [8,9] and model pruning [10,11] are often used to reduce model size, but such methods may hurt model generalization. To guarantee performance, it is necessary to retrain the binarized or pruned model.…”
Section: Introduction (mentioning)
confidence: 99%
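To illustrate why binarized models usually need the retraining this statement describes, here is a minimal sketch of 1-bit weight binarization in the spirit of XNOR-Net (an assumed reference method, not necessarily the one used in [8,9]): each weight tensor is approximated as alpha * sign(w), which compresses aggressively but leaves an approximation error that retraining must recover.

import numpy as np

def binarize(w):
    # Approximate real-valued weights as alpha * sign(w), where alpha is
    # the mean absolute value: ~32x smaller storage, at a precision cost
    # that retraining (or fine-tuning) must compensate for.
    alpha = float(np.abs(w).mean())
    b = np.where(w >= 0, 1.0, -1.0).astype(np.float32)
    return b, alpha

w = np.random.randn(3, 3).astype(np.float32)
b, alpha = binarize(w)
print("mean abs error:", np.abs(w - alpha * b).mean())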