2021
DOI: 10.48550/arxiv.2111.15106
Preprint
MAPLE: Microprocessor A Priori for Latency Estimation

Abstract: Modern deep neural networks must demonstrate state-of-the-art accuracy while exhibiting low latency and energy consumption. As such, neural architecture search (NAS) algorithms take these two constraints into account when generating a new architecture. However, efficiency metrics such as latency are typically hardware-dependent, requiring the NAS algorithm to either measure or predict the architecture latency. Measuring the latency of every evaluated architecture adds a significant amount of time to the NAS pro…

Cited by 2 publications (11 citation statements)
References 32 publications (59 reference statements)
“…In this section we evaluate and compare MAPLE-X with MAPLE [1] and HELP [4]. We adopt a one-device-leave-out approach and form a training pool of five devices listed in Section 3.1 and use the sixth device for testing.…”
Section: Results
confidence: 99%
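The one-device-leave-out protocol described in this snippet can be sketched as follows. The device names below are hypothetical placeholders for the six devices referenced in the quoted evaluation; only the split logic is illustrated.

```python
# Hypothetical stand-ins for the six devices in the quoted evaluation.
devices = ["dev_a", "dev_b", "dev_c", "dev_d", "dev_e", "dev_f"]

def leave_one_device_out(devices):
    """Yield (train_pool, test_device) splits: five devices form the
    training pool and the sixth is held out, rotating over all devices."""
    for held_out in devices:
        train_pool = [d for d in devices if d != held_out]
        yield train_pool, held_out

splits = list(leave_one_device_out(devices))
# Each split trains the latency predictor on five devices and tests on the sixth.
```

Each device thus serves exactly once as the unseen test hardware, which is what makes the protocol a fair test of cross-device generalization.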
“…To evaluate our methodology, we use the same dataset as MAPLE which we refer to as NASBench-X [1]. NASBench-X is based on NASBench-201, a cell-based convolutional neural network NAS dataset and includes a total of 15,625 architectures.…”
Section: Discussion
confidence: 99%
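The 15,625 figure follows directly from the NASBench-201 search-space design: each cell is a 4-node DAG with 6 ordered edges, and each edge takes one of 5 candidate operations. A quick check, using the standard NASBench-201 operation names:

```python
from itertools import islice, product

# NASBench-201 cells: 6 edges, each assigned one of 5 candidate operations,
# giving 5**6 = 15,625 distinct cell architectures.
ops = ["none", "skip_connect", "nor_conv_1x1", "nor_conv_3x3", "avg_pool_3x3"]
num_edges = 6

num_archs = len(ops) ** num_edges  # 15625

# Enumerating the space is a Cartesian product over per-edge operations:
first_three = list(islice(product(ops, repeat=num_edges), 3))
```

This exhaustive enumerability is what makes the benchmark convenient for evaluating latency predictors across the full space.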
“…As an alternative to direct measurements, existing approaches for evaluating the efficiency of a neural architecture can be categorized as those using: (1) Proxy metrics [50,65] (e.g., FLOPs), which are usually platform-independent and cannot accurately reflect the actual performance due to the diversity of platforms [39,51]. (2) Lookup tables [7,11,56], which are collected for pre-defined building blocks in the search space, but cannot cover every possible configuration in a potentially huge search space and require comprehensive measurements on each platform. (3) Prediction models [2,8,12], which broadly rely on machine learning techniques (e.g., MLPs) and have the potential to predict the performance of any configuration in the search space.…”
Section: Introduction
confidence: 99%
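A category-(3) prediction model can be as simple as a small MLP regressor mapping an architecture encoding to a latency value. The sketch below uses synthetic data and a hand-rolled two-layer network trained with gradient descent; the feature encoding, sizes, and hyperparameters are illustrative assumptions, not the predictor from any cited work.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: each row encodes one architecture (e.g., one-hot
# operations per edge flattened into a 30-dim vector); targets are latencies.
X = rng.random((256, 30))
true_w = rng.random(30)
y = X @ true_w + 0.1 * rng.standard_normal(256)  # synthetic "measured" latency

W1 = 0.1 * rng.standard_normal((30, 64)); b1 = np.zeros(64)
W2 = 0.1 * rng.standard_normal((64, 1));  b2 = np.zeros(1)

def forward(X):
    h = np.maximum(X @ W1 + b1, 0.0)     # ReLU hidden layer
    return h, (h @ W2 + b2).ravel()

_, pred0 = forward(X)
mse_init = float(np.mean((pred0 - y) ** 2))

lr = 1e-2
for _ in range(500):
    h, pred = forward(X)
    err = pred - y                        # gradient of 0.5 * squared error
    gW2 = h.T @ err[:, None] / len(y)
    gb2 = np.array([err.mean()])
    dh = (err[:, None] @ W2.T) * (h > 0)  # backprop through ReLU
    gW1 = X.T @ dh / len(y)
    gb1 = dh.mean(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

_, pred = forward(X)
mse = float(np.mean((pred - y) ** 2))     # training error after fitting
```

Such a model covers any configuration in the search space, which is the advantage the passage attributes to prediction models over lookup tables.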
“…(2) Lookup tables [7,11,56], which are collected for pre-defined building blocks in the search space, but cannot cover every possible configuration in a potentially huge search space and require comprehensive measurements on each platform. (3) Prediction models [2,8,12], which broadly rely on machine learning techniques (e.g., MLPs) and have the potential to predict the performance of any configuration in the search space. However, it is difficult to build accurate prediction models for efficiency metrics on mobile devices due to the following challenges.…”
Section: Introduction
confidence: 99%
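A category-(2) lookup table, in its simplest form, is just a mapping from building-block configurations to measured latencies. The block keys and millisecond values below are invented for illustration; the point is the coverage limitation the passage notes, which surfaces as a missing key for any unmeasured configuration.

```python
# Hypothetical per-block latency table (milliseconds), keyed by
# (block type, input channels); values are made up for illustration.
latency_lut = {
    ("conv3x3", 32): 1.80,
    ("conv1x1", 32): 0.60,
    ("skip", 32): 0.05,
}

def estimate_latency(blocks):
    """Sum per-block table entries. A KeyError for an uncovered
    configuration illustrates the coverage limitation described above."""
    return sum(latency_lut[block] for block in blocks)

total = estimate_latency([("conv3x3", 32), ("skip", 32)])
```

The table must be rebuilt from scratch for every target platform, which is the other drawback the passage raises.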