Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques 2020
DOI: 10.1145/3410463.3414648
Accelerating Sparse CNN Inference on GPUs with Performance-Aware Weight Pruning

Abstract: Weight pruning is a popular technique to reduce the size and computational complexity of Convolutional Neural Networks (CNNs). Despite its success in reducing the model size, weight pruning has brought limited benefit to CNN inference performance, due to the irregularity introduced in the sparse convolution operations. In this work, we aim to improve the performance of sparse convolutions on GPUs by mitigating the irregularity. We find that the existing performance optimization techniques for sparse matr…
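The irregularity the abstract refers to arises because unstructured magnitude pruning scatters nonzeros arbitrarily, so a compressed storage format ends up with irregular column indices and memory accesses. A minimal sketch of that pipeline (illustrative only; these function names are not from the paper, and the paper's GPU kernels are far more involved):

```python
import numpy as np

def prune_by_magnitude(w, sparsity):
    """Unstructured pruning: zero out the smallest-magnitude entries."""
    k = int(w.size * sparsity)
    thresh = np.sort(np.abs(w), axis=None)[k]
    return np.where(np.abs(w) >= thresh, w, 0.0)

def to_csr(w):
    """Compress a pruned matrix to CSR (values, column indices, row pointers).
    The scattered column indices per row are the source of irregular,
    uncoalesced memory accesses in sparse convolution on GPUs."""
    vals, cols, rowptr = [], [], [0]
    for row in w:
        nz = np.nonzero(row)[0]
        cols.extend(nz.tolist())
        vals.extend(row[nz].tolist())
        rowptr.append(len(cols))
    return np.array(vals), np.array(cols), np.array(rowptr)
```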

Cited by 19 publications (16 citation statements)
References 29 publications
“…To combine the benefits of structured and unstructured pruning, hybrid pruning strategies have been introduced to pursue more general structural sparse patterns which are also amenable to acceleration — for example, convolution kernels with half-regular sparsity or pattern-based structural sparsity (Ma et al, 2020), or vector-wise (Zhu et al, 2019) and group-wise (Rumi et al, 2020) regular sparsity.…”
Section: Related Work
confidence: 99%
“…In other words, the dense matrices in identified structural patterns have a restricted shape where one dimension must align with the kernel size n, i.e., the product of the number of input channels, channel height, and width. Motivated by Rumi et al (2020), we introduce a regrouping strategy (Figure 2) to create more fine-grained group-wise structural patterns with flexible shapes for the remaining dense matrices.…”
Section: Regrouping For Structural Patterns
confidence: 99%
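The regrouping idea described above — packing rows that share a sparsity pattern into small dense blocks that can be multiplied with regular memory access — can be sketched as follows. This is a simplified illustration that groups rows by their exact nonzero column support; the actual grouping criterion in Rumi et al. (2020) is more flexible:

```python
import numpy as np
from collections import defaultdict

def regroup_rows(w):
    """Group rows of a pruned weight matrix by their nonzero column
    support, so each group packs into a small dense block
    (simplified sketch; not the paper's exact algorithm)."""
    groups = defaultdict(list)
    for i, row in enumerate(w):
        key = tuple(np.nonzero(row)[0].tolist())
        groups[key].append(i)
    blocks = []
    for cols, rows in groups.items():
        # Extract the dense sub-matrix for this group of rows/columns.
        block = w[np.ix_(rows, list(cols))]
        blocks.append((rows, list(cols), block))
    return blocks
```

Each returned block is fully dense, so a group-wise GEMM over these blocks avoids the per-element index lookups of an unstructured sparse kernel.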