Proceedings of the 2020 Great Lakes Symposium on VLSI (GLSVLSI 2020)
DOI: 10.1145/3386263.3407650

A Privacy-Preserving-Oriented DNN Pruning and Mobile Acceleration Framework

Cited by 20 publications (9 citation statements)
References 15 publications
“…However, these works directly apply fixed regularization terms that penalize all weights equally, incurring potential accuracy loss. Later works [21,62,81] adopt ADMM to reformulate the pruning problem as an optimization problem with dynamic regularization penalties, thus preserving accuracy. One drawback of these methods is that the compression rate must be set manually for each layer.…”
Section: Pruning Algorithm
confidence: 99%
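To make the ADMM idea in the statement above concrete, here is a minimal sketch of one layer's update, assuming a PyTorch-style tensor API; the function names, rho, and keep_ratio are illustrative assumptions, not the exact procedure of the cited works [21,62,81]. The key point is that the quadratic penalty changes every iteration as the auxiliary variable Z and the dual variable U are updated, unlike a fixed L1/L2 term applied equally to all weights.

```python
# Hedged sketch of ADMM-based weight pruning (illustrative, not the cited method).
import torch

def project_to_sparsity(w, keep_ratio):
    """Euclidean projection onto the sparsity constraint set:
    keep the largest-magnitude entries of w, zero out the rest."""
    k = max(1, int(keep_ratio * w.numel()))
    threshold = torch.topk(w.abs().flatten(), k).values.min()
    return torch.where(w.abs() >= threshold, w, torch.zeros_like(w))

def admm_penalty(w, z, u, rho=1e-3):
    """Dynamic regularization term (rho/2) * ||W - Z + U||^2 added to the task
    loss during the W-update; because Z and U change every ADMM iteration,
    the penalty is not a fixed term that penalizes all weights equally."""
    return 0.5 * rho * torch.sum((w - z + u) ** 2)

def admm_step(w, z, u, keep_ratio=0.1):
    """One outer ADMM iteration for a single layer's weight tensor:
    Z-update (projection onto the sparsity set) and dual-variable update.
    In practice, several SGD epochs on (task loss + admm_penalty) run
    between these steps to update W itself."""
    z_new = project_to_sparsity(w.detach() + u, keep_ratio)
    u_new = u + w.detach() - z_new
    return z_new, u_new
```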
“…From the pruning algorithm aspect, heuristic-based pruning was first proposed in [23] and has since been improved with more sophisticated heuristics [19,27,36,49,74,87]. Regularization-based pruning [21,26,39,41,43,55,56,62,69,76,77,81], on the other hand, is more mathematics-oriented. Recent works [39,51,62,81,82] achieve substantial weight reduction without hurting accuracy by leveraging the Alternating Direction Method of Multipliers (ADMM) with dynamic regularization penalties, but these methods require the manual setting of the compression rate for each layer.…”
Section: Introduction
confidence: 99%
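The drawback mentioned in this statement, manually setting the compression rate per layer, typically amounts to a hand-tuned configuration of the following kind (layer names and ratios here are hypothetical, for illustration only):

```python
# Hypothetical per-layer compression-rate configuration; every value must be
# hand-tuned for each layer of each network when using such methods.
prune_ratios = {
    "conv1.weight": 0.50,  # keep 50% of this layer's weights
    "conv2.weight": 0.25,
    "fc1.weight":   0.10,
    "fc2.weight":   0.20,
}
```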
“…Pruning [19,71,78,79,132,134,140,171,200,265,288] Quantization [19,68,90,134,166,179,291,307,311,314] Knowledge Distillation [29,41,42,80,83,88,95,170,186,195,220,228,231,239,257,266,267,274,295,296,300,312] Low rank factorization [76,98,119,168,190,196,210,292] Conditional Computation…”
Section: Model Compression
confidence: 99%
“…However, the compressed VGG-16 model reduced the number of convolutional-layer parameters by a factor of 41.4% for CIFAR-10 and 17.5% for the CIFAR-100 dataset. In [71], the authors proposed a new framework based on weight pruning and compiler optimisation for faster inference while preserving the privacy of the training dataset. This approach initially trains the DNN model as usual on the user's own data.…”
Section: Model Compression
confidence: 99%
“…Pruning can also be adopted to reduce the model size; it determines the per-layer pruning ratio and the pruning positions. With the assumption that weights with smaller magnitudes are less important for final accuracy, magnitude-based pruning [30,58,31,86,72,26,47,57,84] is widely employed to prune weights smaller than a threshold. However, this assumption is not necessarily true, and weight magnitudes can be misleading.…”
Section: Motivation and Challenges
confidence: 99%
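As a concrete illustration of the magnitude-based pruning this statement describes, a minimal PyTorch-style sketch could look like the following; the per-layer prune_ratio is an assumed hyperparameter, not a value from the cited works.

```python
# Minimal sketch of magnitude-based pruning: weights whose absolute value falls
# below a threshold are zeroed, under the assumption that small-magnitude
# weights matter least for accuracy (prune_ratio is illustrative).
import torch

def magnitude_prune(weight, prune_ratio=0.8):
    """Zero out the fraction `prune_ratio` of weights with the smallest
    magnitudes; return the pruned tensor together with the binary mask."""
    k = int(prune_ratio * weight.numel())
    if k == 0:
        return weight.clone(), torch.ones_like(weight, dtype=torch.bool)
    threshold = torch.kthvalue(weight.abs().flatten(), k).values
    mask = weight.abs() > threshold
    return weight * mask, mask

# Example: prune a random 4x4 weight matrix to roughly 20% density.
w = torch.randn(4, 4)
w_pruned, mask = magnitude_prune(w, prune_ratio=0.8)
```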