Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques 2020
DOI: 10.1145/3410463.3414648
Accelerating Sparse CNN Inference on GPUs with Performance-Aware Weight Pruning

Abstract: Weight pruning is a popular technique to reduce the size and computational complexity of Convolutional Neural Networks (CNNs). Despite its success in reducing the model size, weight pruning has brought limited benefit to CNN inference performance, due to the irregularity introduced in the sparse convolution operations. In this work, we aim to improve the performance of sparse convolutions on GPUs by mitigating the irregularity. We find that the existing performance optimization techniques for sparse matr…
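The irregularity the abstract refers to arises because unstructured magnitude pruning scatters nonzeros arbitrarily, so a compressed storage format ends up with irregular column indices and memory accesses. A minimal sketch of that pipeline (illustrative only; these function names are not from the paper, and the paper's GPU kernels are far more involved):

```python
import numpy as np

def prune_by_magnitude(w, sparsity):
    """Unstructured pruning: zero out the smallest-magnitude entries."""
    k = int(w.size * sparsity)
    thresh = np.sort(np.abs(w), axis=None)[k]
    return np.where(np.abs(w) >= thresh, w, 0.0)

def to_csr(w):
    """Compress a pruned matrix to CSR (values, column indices, row pointers).
    The scattered column indices per row are the source of irregular,
    uncoalesced memory accesses in sparse convolution on GPUs."""
    vals, cols, rowptr = [], [], [0]
    for row in w:
        nz = np.nonzero(row)[0]
        cols.extend(nz.tolist())
        vals.extend(row[nz].tolist())
        rowptr.append(len(cols))
    return np.array(vals), np.array(cols), np.array(rowptr)
```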

Cited by 19 publications (16 citation statements)
References 29 publications
“…To combine the benefits of structured and unstructured pruning, hybrid pruning strategies have been introduced to pursue more general structural sparse patterns which are also amenable to acceleration — for example, convolution kernels with half-regular sparsity or pattern-based structural sparsity (Ma et al, 2020), or vector-wise (Zhu et al, 2019) and group-wise (Rumi et al, 2020) regular sparsity.…”
Section: Related Work
confidence: 99%
“…In other words, the dense matrices in identified structural patterns have a restricted shape where one dimension must align with the kernel size n, i.e., the product of the number of input channels, channel height, and width. Motivated by Rumi et al (2020), we introduce a regrouping strategy (Figure 2) to create more fine-grained group-wise structural patterns with flexible shapes for the remaining dense matrices.…”
Section: Regrouping For Structural Patterns
confidence: 99%
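The regrouping idea described above — packing rows that share a sparsity pattern into small dense blocks that can be multiplied with regular memory access — can be sketched as follows. This is a simplified illustration that groups rows by their exact nonzero column support; the actual grouping criterion in Rumi et al. (2020) is more flexible:

```python
import numpy as np
from collections import defaultdict

def regroup_rows(w):
    """Group rows of a pruned weight matrix by their nonzero column
    support, so each group packs into a small dense block
    (simplified sketch; not the paper's exact algorithm)."""
    groups = defaultdict(list)
    for i, row in enumerate(w):
        key = tuple(np.nonzero(row)[0].tolist())
        groups[key].append(i)
    blocks = []
    for cols, rows in groups.items():
        # Extract the dense sub-matrix for this group of rows/columns.
        block = w[np.ix_(rows, list(cols))]
        blocks.append((rows, list(cols), block))
    return blocks
```

Each returned block is fully dense, so a group-wise GEMM over these blocks avoids the per-element index lookups of an unstructured sparse kernel.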