2019 IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH)
DOI: 10.1109/nanoarch47378.2019.181304
ResNet Can Be Pruned 60×: Introducing Network Purification and Unused Path Removal (P-RM) after Weight Pruning

Abstract: State-of-the-art DNN structures involve heavy computation and large memory footprints, which place intense pressure on DNN framework resources. To mitigate these challenges, weight pruning techniques have been studied. However, a high-accuracy solution for extreme structured pruning that combines different types of structured sparsity remains elusive, owing to the severely reduced number of weights left in the network. In this paper, we propose a DNN framework which combines two different types of structured w…
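To make the sparsity types concrete, below is a minimal NumPy sketch of magnitude-based filter pruning and column pruning on a convolutional weight tensor. It only illustrates the two kinds of structured sparsity the paper combines; the keep ratios and the magnitude criterion are illustrative assumptions, not the authors' ADMM-based procedure.

import numpy as np

def prune_filters(w, keep_ratio):
    # Zero whole filters (rows of the GEMM view) with the smallest L2 norms.
    norms = np.linalg.norm(w.reshape(w.shape[0], -1), axis=1)
    n_keep = max(1, int(round(keep_ratio * w.shape[0])))
    cut = np.sort(norms)[::-1][n_keep - 1]
    return w * (norms >= cut)[:, None, None, None]

def prune_columns(w, keep_ratio):
    # Zero whole columns of the (out_ch, in_ch*k*k) GEMM view of the tensor.
    flat = w.reshape(w.shape[0], -1)
    norms = np.linalg.norm(flat, axis=0)
    n_keep = max(1, int(round(keep_ratio * flat.shape[1])))
    cut = np.sort(norms)[::-1][n_keep - 1]
    return (flat * (norms >= cut)[None, :]).reshape(w.shape)

w = np.random.randn(64, 32, 3, 3)            # (out_ch, in_ch, k, k)
w = prune_columns(prune_filters(w, 0.5), 0.5)
print("nonzero fraction:", np.count_nonzero(w) / w.size)   # about 0.25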

Cited by 31 publications (20 citation statements)
References 12 publications
“…The key novelty is the fragment polarization technique that enforces the same sign for weights in each fragment. As recent works [43][44][45][46] have demonstrated, structured pruning and quantization are two essential steps for hardware-friendly model compression that are universally applicable to all DNN accelerators. Thus, we perform structured pruning before fragment polarization, considering the size of the ReRAM crossbars, and quantization after.…”
Section: Motivation
confidence: 99%
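As a rough illustration of the fragment-polarization idea quoted above, the sketch below tiles a weight matrix into crossbar-sized fragments and forces each fragment to a single sign by zeroing its minority-sign entries. The fragment size and the zero-out rule are assumptions made for illustration, not necessarily the cited paper's exact procedure.

import numpy as np

def polarize_fragments(w, frag_rows=128, frag_cols=128):
    # Each (frag_rows x frag_cols) tile maps to one ReRAM crossbar; force all
    # of its surviving weights to share the tile's dominant sign.
    w = w.copy()
    for r in range(0, w.shape[0], frag_rows):
        for c in range(0, w.shape[1], frag_cols):
            frag = w[r:r + frag_rows, c:c + frag_cols]   # view into w
            if frag.sum() < 0:
                frag[frag > 0] = 0.0   # keep only negative weights
            else:
                frag[frag < 0] = 0.0   # keep only non-negative weights
    return w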
“…Running such models on MCUs requires extreme minimization of the model's size and computation requirements with minimal loss of accuracy. Recent studies have applied model compression techniques such as pruning connections and neurons from fully connected neural networks (FCNNs) [17], filter or channel pruning from convolutional neural networks (CNNs) [18]–[20], knowledge distillation [21], and low-precision quantization [17], [22]. The most recent advancement in extreme downsizing of DL models and their computation requirements is XNOR-Net [23], where a model's activations and inputs are fully binarized.…”
Section: Introduction
confidence: 99%
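The XNOR-Net weight binarization [23] mentioned in this excerpt admits a compact sketch: each filter is approximated by its sign pattern scaled by alpha = mean(|W|), computed per output channel. The tensor shapes below are illustrative.

import numpy as np

def binarize_filters(w):
    # w: (out_ch, in_ch, k, k) -> alpha * sign(w), with one alpha per filter.
    flat = w.reshape(w.shape[0], -1)
    alpha = np.abs(flat).mean(axis=1)            # per-filter scale factor
    return (np.sign(flat) * alpha[:, None]).reshape(w.shape)

w = np.random.randn(16, 8, 3, 3)
wb = binarize_filters(w)
print(np.unique(np.abs(wb[0])).size)             # 1: a single magnitude per filter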
“…From the pruning-algorithm aspect, heuristic-based pruning was first proposed in [23] and was later improved with more sophisticated heuristics [19,27,36,49,74,87]. Regularization-based pruning [21,26,39,41,43,55,56,62,69,76,77,81], on the other hand, is more mathematics-oriented. Recent works [39,51,62,81,82] achieve substantial weight reduction without hurting accuracy by leveraging the Alternating Direction Method of Multipliers (ADMM) with dynamic regularization penalties, but these methods require manually setting the compression rate for each layer.…”
Section: Introduction
confidence: 99%
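The ADMM-based pruning this excerpt refers to alternates between an unconstrained weight update under a quadratic penalty and a Euclidean projection onto the sparsity constraint set. A schematic round is sketched below; the loss gradient is a placeholder, and rho, lr, and k are illustrative hyperparameters, not values from the cited works.

import numpy as np

def project_topk(w, k):
    # Euclidean projection onto {at most k nonzeros}: keep the k largest |w|.
    z = np.zeros_like(w)
    idx = np.argsort(np.abs(w).ravel())[-k:]
    z.ravel()[idx] = w.ravel()[idx]
    return z

def admm_round(w, z, u, grad_loss, rho=1e-3, lr=1e-2, k=100, steps=10):
    for _ in range(steps):         # W-step: SGD on loss + (rho/2)*||W - Z + U||^2
        w = w - lr * (grad_loss(w) + rho * (w - z + u))
    z = project_topk(w + u, k)     # Z-step: projection onto the sparsity set
    u = u + w - z                  # dual-variable update
    return w, z, u

t = np.random.randn(20, 20)        # toy target; stand-in loss = ||w - t||^2
w = np.random.randn(20, 20)
z, u = project_topk(w, 100), np.zeros_like(w)
for _ in range(5):
    w, z, u = admm_round(w, z, u, lambda w: 2 * (w - t), k=100)
print("nonzeros in z:", np.count_nonzero(z))   # at most 100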