In recent years, the demand for higher-performance deep neural networks (DNNs) has kept growing. Prior work approaches faster and more efficient DNNs from different angles, such as model pruning [28,29,31], kernel factorization [3,14,40], and data quantization [45,50]. Among these efforts, quantization-based DNN acceleration [45,46,50] stands out for requiring minimal modification of the original model architecture, lowering memory consumption, and improving runtime performance.
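To make the memory-consumption claim concrete, the following is a minimal sketch of uniform affine quantization, one common form of data quantization: a float32 tensor is mapped to 8-bit integers via a scale and zero point, shrinking storage by 4x at the cost of a small rounding error. The function names and the NumPy-based setup are illustrative assumptions, not the method of any particular cited work.

```python
import numpy as np

def quantize_uniform(x: np.ndarray, num_bits: int = 8):
    """Affine (asymmetric) uniform quantization of a float tensor to uint8.

    Illustrative sketch: maps [x.min(), x.max()] onto [0, 2^num_bits - 1].
    """
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = int(round(qmin - x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Recover an approximation of the original float values."""
    return scale * (q.astype(np.float32) - zero_point)

x = np.random.randn(1024).astype(np.float32)
q, s, z = quantize_uniform(x)
x_hat = dequantize(q, s, z)
# uint8 storage is 4x smaller than float32 for the same number of elements
assert q.nbytes * 4 == x.nbytes
```

Because the quantized weights stay integer-valued at inference time, hardware can use cheaper integer arithmetic, which is where the runtime-performance benefit mentioned above comes from.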