Li Lyna Zhang scite author profile

With the recent trend of on-device deep learning, inference latency has become a crucial metric in running Deep Neural Network (DNN) models on various mobile and edge devices. To this end, latency prediction of DNN model inference is highly desirable for many tasks where measuring the latency on real devices is infeasible or too costly, such as searching for efficient DNN models with latency constraints from a huge model-design space. Yet it is very challenging and existing approaches fail to achieve a high accuracy of prediction, due to the varying model-inference latency caused by the runtime optimizations on diverse edge devices. In this paper, we propose and develop nn-Meter, a novel and efficient system to accurately predict the inference latency of DNN models on diverse edge devices. The key idea of nn-Meter is dividing a whole model inference into kernels, i.e., the execution units on a device, and conducting kernel-level prediction. nn-Meter builds atop two key techniques: (i) kernel detection to automatically detect the execution unit of model inference via a set of well-designed test cases; and (ii) adaptive sampling to efficiently sample the most beneficial configurations from a large space to build accurate kernel-level latency predictors. Implemented on three popular platforms of edge hardware (mobile CPU, mobile GPU, and Intel VPU) and evaluated using a large dataset of 26,000 models, nn-Meter significantly outperforms the prior state-of-the-art.

CCS Concepts: Computer systems organization → Neural networks; Embedded systems.

SwiftPruner: Reinforced Evolutionary Pruning for Efficient Ad Relevance

Zhang, Homma, Wang, et al. 2022


Accurate and Structured Pruning for Efficient Automatic Speech Recognition

Jiang, Zhang, Li, et al. 2023


Fast Hardware-Aware Neural Architecture Search

Zhang, Yang, Jiang, et al. 2019. Preprint


nn-METER

Zhang, Han, Wei, et al. 2022. GetMobile: Mobile Comp. and Comm.


Inference latency has become a crucial metric in running Deep Neural Network (DNN) models on various mobile and edge devices. To this end, latency prediction of DNN inference is highly desirable for many tasks where measuring the latency on real devices is infeasible or too costly. Yet it is very challenging, and existing approaches fail to achieve a high accuracy of prediction, due to the varying model-inference latency caused by the runtime optimizations on diverse edge devices. In this paper, we propose and develop nn-Meter, a novel and efficient system to accurately predict the DNN inference latency on diverse edge devices. The key idea of nn-Meter is dividing a whole model inference into kernels, i.e., the execution units on a device, and conducting kernel-level prediction. nn-Meter builds atop two key techniques: (i) kernel detection to automatically detect the execution unit of model inference via a set of well-designed test cases; and (ii) adaptive sampling to efficiently sample the most beneficial configurations from a large space to build accurate kernel-level latency predictors. nn-Meter achieves significantly high prediction accuracy on four types of edge devices.
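The kernel-level idea described in the abstract can be illustrated with a minimal sketch. It assumes that a model-level estimate is obtained by summing per-kernel estimates, which the abstract does not state explicitly. The kernel names, the `PREDICTORS` table, the coefficients, and `predict_model_latency` are all hypothetical and are not the nn-Meter API or its trained predictors.

```python
from typing import Callable, Dict, List, Tuple

# A kernel is a (type, configuration) pair, e.g. a fused conv-bn-relu unit.
Kernel = Tuple[str, Dict[str, float]]

# Hypothetical per-kernel latency predictors, in milliseconds.
# Real predictors would be fitted from measurements on a target device;
# these linear formulas and coefficients are made up for illustration.
PREDICTORS: Dict[str, Callable[[Dict[str, float]], float]] = {
    "conv-bn-relu": lambda c: 0.5 + 0.002 * c["cin"] * c["cout"],
    "fc": lambda c: 0.1 + 0.001 * c["cin"] * c["cout"],
}


def predict_model_latency(kernels: List[Kernel]) -> float:
    """Estimate model latency as the sum of per-kernel predictions (assumption)."""
    total = 0.0
    for kind, config in kernels:
        total += PREDICTORS[kind](config)
    return total


# A toy model already split into kernels (the split itself is assumed here;
# in the abstract it is the job of kernel detection).
model = [
    ("conv-bn-relu", {"cin": 16, "cout": 32}),
    ("conv-bn-relu", {"cin": 32, "cout": 64}),
    ("fc", {"cin": 64, "cout": 10}),
]

print(f"estimated latency: {predict_model_latency(model):.3f} ms")
```

The sketch leaves out both techniques named in the abstract: kernel detection would produce the `model` list for a given device, and adaptive sampling would choose which configurations to measure when fitting each predictor.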



Characterizing Privacy Risks of Mobile Apps with Sensitivity Analysis

Systematically testing background services of mobile apps
