Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2020
DOI: 10.1145/3373376.3378534

PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning

Abstract: With the emergence of a spectrum of high-end mobile devices, many applications that formerly required desktop-level computation capability are being transferred to these devices. However, executing Deep Neural Network (DNN) inference is still challenging considering the high computation and storage demands, especially if real-time performance with high accuracy is needed. Weight pruning of DNNs has been proposed, but existing schemes represent two extremes in the design space: non-structured pruning is fine-grained…


Cited by 182 publications (129 citation statements)
References 51 publications
“…Especially, deep learning models in embedded devices such as mobile or IoT devices require efficient processing. Examples are the face recognition model on a single-board computer [44], the real-time DNN model on mobile devices [45], and emotion recognition on a Raspberry Pi [46]. As a result, the proposed strategy, i.e., pre-processing that excludes, at negligible cost, unnecessary parts which would otherwise incur significant overhead in the deep learning process, can be adapted and investigated for deep-learning-based methods for other problems.…”
Section: Discussion
confidence: 99%
“…Fine-Grained Pattern-Based Pruning. The state-of-the-art pruning work [47] proposes a fine-grained pattern-based pruning scheme, which generates an intermediate sparsity type between non-structured pruning and structured pruning. It prunes a fixed number of weights in each convolution kernel (e.g., pruning 5 weights out of 9 in a 3×3 convolution kernel) and concentrates the remaining weights in a certain area to form specific kernel patterns (called pattern sparsity), as shown in Figure 1 (left).…”
Section: Background 2.1 DNN Model Pruning
confidence: 99%
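To make the kernel-pattern idea concrete, here is a minimal NumPy sketch. The four candidate patterns and the pick-the-pattern-that-retains-the-most-magnitude heuristic are illustrative assumptions for this note, not PatDNN's actual pattern library or selection algorithm:

```python
import numpy as np

# Four hypothetical 4-entry patterns for a 3x3 kernel (1 = keep, 0 = prune).
# PatDNN derives its pattern set differently; these are for illustration only.
PATTERNS = [
    np.array([[0, 1, 0], [1, 1, 1], [0, 0, 0]]),
    np.array([[0, 0, 0], [1, 1, 1], [0, 1, 0]]),
    np.array([[0, 1, 0], [0, 1, 1], [0, 1, 0]]),
    np.array([[0, 1, 0], [1, 1, 0], [0, 1, 0]]),
]

def pattern_prune(weights):
    """Prune each 3x3 kernel to one 4-entry pattern (5 of 9 weights removed),
    choosing the pattern that retains the most L1 magnitude.
    weights: array of shape (out_channels, in_channels, 3, 3)."""
    pruned = np.zeros_like(weights)
    for o in range(weights.shape[0]):
        for i in range(weights.shape[1]):
            kernel = weights[o, i]
            scores = [np.abs(kernel * p).sum() for p in PATTERNS]
            best = PATTERNS[int(np.argmax(scores))]
            pruned[o, i] = kernel * best  # zero out the 5 pruned positions
    return pruned
```

Because every retained kernel then has one of only a few shapes, a compiler can emit a specialized, unrolled computation path per pattern, which is the property that makes this sparsity type hardware- and compiler-friendly.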
“…Recent works [38,47] have applied pattern-based pruning techniques to improve inference efficiency. However, these inference-focused strategies pose several challenges to reaching our three optimization objectives.…”
Section: Challenges of Pattern-Based Pruning in Training
confidence: 99%
“…Given an unpruned CNN model, our system first performs non-structured weight pruning with the Alternating Direction Method of Multipliers (ADMM) algorithm. Previous works have shown that ADMM-based algorithms can achieve the state-of-the-art compression ratio for CNNs with little accuracy loss [33,46]. Readers can refer to [46] for more details.…”
Section: Performance Challenges with CNN Pruning
confidence: 99%
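The ADMM-based pruning mentioned above alternates between a gradient step on the training loss and a projection onto the sparsity constraint. Below is a minimal sketch under stated assumptions (NumPy; the `loss_grad` callback, the hyperparameters, and the magnitude-based projection are illustrative, and the precise formulation is in [46]):

```python
import numpy as np

def project_to_sparsity(w, keep_ratio):
    """Euclidean projection onto {W : at most keep_ratio of entries nonzero}:
    keep the largest-magnitude entries, zero the rest."""
    k = max(1, int(keep_ratio * w.size))
    threshold = np.partition(np.abs(w).ravel(), -k)[-k]
    return np.where(np.abs(w) >= threshold, w, 0.0)

def admm_prune(w, loss_grad, lr=1e-2, rho=1e-3, keep_ratio=0.2, steps=1000):
    """Illustrative ADMM weight-pruning loop: minimize loss(W) subject to
    W lying in the sparse set, via the augmented Lagrangian
    loss(W) + (rho/2) * ||W - Z + U||^2."""
    z = project_to_sparsity(w, keep_ratio)  # auxiliary (sparse) variable
    u = np.zeros_like(w)                    # scaled dual variable
    for _ in range(steps):
        # W-step: gradient descent on the loss plus the proximal term.
        w = w - lr * (loss_grad(w) + rho * (w - z + u))
        # Z-step: project W + U onto the sparsity constraint set.
        z = project_to_sparsity(w + u, keep_ratio)
        # Dual update: accumulate the constraint violation W - Z.
        u = u + w - z
    # Hard-prune at the end so the returned weights satisfy the constraint.
    return project_to_sparsity(w, keep_ratio)
```

The dual variable `u` gradually forces the trained weights `w` and their sparse projection `z` to agree, which is why ADMM-based pruning can reach high compression ratios with little accuracy loss compared with one-shot magnitude pruning.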
“…Another approach is to design more hardware-amenable pruning strategies [8,29]. For example, a hybrid strategy combining structured and non-structured pruning can achieve good accuracy while maintaining some regular patterns in the pruned model for efficient hardware processing [29,33]. These works, however, lack a careful examination of code optimization opportunities, resulting in restricted pruning choices and sub-optimal performance.…”
confidence: 99%