2022
DOI: 10.1109/tnnls.2021.3063265

Non-Structured DNN Weight Pruning—Is It Beneficial in Any Platform?

Cited by 41 publications (16 citation statements)
References 46 publications

“…Therefore, it cannot effectively and efficiently leverage the hardware parallelism provided by the underlying system. Consequently, unstructured pruning is generally not compatible with GPU acceleration for DNN inference, and speed degradation can often be observed [52].…”
Section: Background and Related Work, 2.1 DNN Pruning: Regularity and A... (mentioning)
confidence: 99%
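The GPU-incompatibility point in this statement can be seen in a minimal NumPy sketch (illustrative only, not code from the cited paper): unstructured magnitude pruning leaves the weight matrix the same size, so a dense kernel still does the same work, whereas structured (whole-row/filter) pruning produces a genuinely smaller dense matrix.

```python
# Minimal sketch (assumption: magnitude-based pruning criterion) contrasting
# unstructured pruning, which zeroes individual small weights wherever they
# sit, with structured pruning, which removes entire rows and therefore
# shrinks the dense matrix a GPU actually multiplies.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8)).astype(np.float32)

# Unstructured: keep the largest 50% of individual weights.
threshold = np.quantile(np.abs(W), 0.5)
unstructured_mask = (np.abs(W) >= threshold).astype(np.float32)
W_unstructured = W * unstructured_mask   # still 8x8, irregular zeros

# Structured: drop the 50% of rows with the smallest L2 norm.
row_norms = np.linalg.norm(W, axis=1)
kept_rows = np.sort(np.argsort(row_norms)[W.shape[0] // 2:])
W_structured = W[kept_rows]              # dense 4x8 matrix, smaller GEMM

x = rng.normal(size=(8,)).astype(np.float32)
# The unstructured product still runs a full dense 8x8 GEMV; the zeros are
# multiplied like any other value unless a specialized sparse kernel is used.
y_unstructured = W_unstructured @ x
# The structured product is a genuinely smaller 4x8 GEMV.
y_structured = W_structured @ x
print(y_unstructured.shape, y_structured.shape)
```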
“…The majority of works in this direction apply a pretraining-pruning-retraining flow, which is not compatible with the training-on-the-edge paradigm. According to the adopted sparsity scheme, those works can be categorized as unstructured [16,1], structured [24,2,25,26,17,3,27,28,29,30,31,18,32,33], and fine-grained structured [19,34,35,36,37,38,39,40,41] including the pattern-based and block-based ones. Detailed discussion about these sparsity schemes is provided in Appendix A.…”
Section: Sparsity Scheme (mentioning)
confidence: 99%
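A hedged sketch of the three sparsity-scheme families named in this statement, expressed as binary masks over a weight matrix; the 50% sparsity target and 4x4 block size are illustrative assumptions, not values from the cited works.

```python
import numpy as np

def unstructured_mask(W, sparsity):
    """Zero the smallest-magnitude individual weights."""
    t = np.quantile(np.abs(W), sparsity)
    return (np.abs(W) >= t).astype(W.dtype)

def structured_row_mask(W, sparsity):
    """Zero entire rows (e.g., output channels) with the smallest L2 norm."""
    norms = np.linalg.norm(W, axis=1)
    t = np.quantile(norms, sparsity)
    return np.repeat((norms >= t)[:, None], W.shape[1], axis=1).astype(W.dtype)

def block_mask(W, sparsity, block=(4, 4)):
    """Fine-grained structured: zero whole blocks with the smallest L2 norm."""
    bh, bw = block
    rows, cols = W.shape
    blocks = W.reshape(rows // bh, bh, cols // bw, bw)
    norms = np.sqrt((blocks ** 2).sum(axis=(1, 3)))
    t = np.quantile(norms, sparsity)
    keep = (norms >= t).astype(W.dtype)
    return np.kron(keep, np.ones(block, dtype=W.dtype))

W = np.random.default_rng(1).normal(size=(16, 16)).astype(np.float32)
for fn in (unstructured_mask, structured_row_mask, block_mask):
    m = fn(W, 0.5)
    print(fn.__name__, "kept fraction:", float(m.mean()))
```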
“…The key novelty is the fragment polarization technique that enforces the same sign for weights in each fragment. As recent works [43][44][45][46] have demonstrated, the structured pruning and quantization are two essential steps for hardware-friendly model compression that are universally applicable to all DNN accelerators. Thus, we perform structured pruning before fragment polarization considering the size of the ReRAM crossbars, and quantization after.…”
Section: Motivation (mentioning)
confidence: 99%
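The fragment polarization idea described in this statement can only be illustrated loosely here: the sketch below is a hypothetical post-hoc version (the fragment length and the zeroing of minority-sign weights are assumptions), not the cited paper's actual technique, which may instead enforce the sign constraint during training.

```python
# Hypothetical illustration: partition a weight column into fragments sized
# to an assumed ReRAM crossbar segment and force each fragment to carry a
# single sign by dropping minority-sign weights. Illustrative only.
import numpy as np

FRAGMENT_LEN = 8  # assumed fragment size, not taken from the cited work

def polarize_fragments(w_col, fragment_len=FRAGMENT_LEN):
    """Make every length-`fragment_len` fragment of a weight column single-signed."""
    w = w_col.copy()
    for start in range(0, len(w), fragment_len):
        frag = w[start:start + fragment_len]          # view into w
        dominant = 1.0 if frag.sum() >= 0 else -1.0   # majority sign
        frag[np.sign(frag) != dominant] = 0.0         # drop minority-sign weights
    return w

col = np.random.default_rng(2).normal(size=(32,)).astype(np.float32)
polarized = polarize_fragments(col)
print("signs present per fragment:",
      [np.unique(np.sign(polarized[i:i + FRAGMENT_LEN])).tolist()
       for i in range(0, len(polarized), FRAGMENT_LEN)])
```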