2020 IEEE High Performance Extreme Computing Conference (HPEC)
DOI: 10.1109/hpec43674.2020.9286209
Architectural Analysis of Deep Learning on Edge Accelerators

Cited by 11 publications (6 citation statements)
References 15 publications
“…These application-specific integrated circuits (ASICs) accelerate specific machine learning tasks by placing processing elements, small digital signal processors (DSPs) with built-in memory, on a fabric and allowing them to communicate and move data between them. The study in [91] analyzed low-power computing architectures with ML-specific hardware in the context of Chinese handwriting recognition. The work tested the NVIDIA Jetson AGX Xavier (AGX), Intel Neural Compute Stick 2 (NCS2), and Google Edge TPU architectures for performance.…”
Section: TPU-based Edge Hardware Systems and Devices
Mentioning, confidence: 99%
“…Bianco et al. benchmarked the accuracy and inference time of a variety of DNNs on the Nvidia Jetson TX1 [27] and found some of the DNNs, such as ResNeXt, bottlenecked by the amount of memory available on the device. Kljucaric et al. characterized the performance of AlexNet and GoogleNet on several devices: the Nvidia Jetson AGX Xavier, the Intel Neural Compute Stick, and the Google Edge TPU [28]. Their results showed that the best latency for AlexNet is achieved by the AGX, while for GoogleNet, the TPU is faster.…”
Section: G. Architecture-Algorithm Insights
Mentioning, confidence: 99%
“…As a key result, the paper reports inference performance of the Edge TPU similar to that of the i9-9900K CPU but, not surprisingly, with significantly lower power consumption. Kljucaric et al. [8] compared the performance and efficiency of the NVIDIA Xavier, Edge TPU, and NCS2 for optical character recognition using AlexNet and GoogleNet. The authors reported that while the NCS2 is more efficient for AlexNet, the Edge TPU performs better with GoogleNet.…”
Section: Related Work
Mentioning, confidence: 99%
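The comparisons quoted above all rest on the same basic measurement: timing repeated single-image inferences of AlexNet and GoogleNet and averaging over many runs after a warm-up phase. Below is a minimal sketch of such a timing loop, assuming a generic host-side PyTorch environment; the cited studies instead used each accelerator's own toolchain (e.g., TensorRT on the Xavier, OpenVINO on the NCS2, the Edge TPU compiler), and the run counts and input shape here are illustrative assumptions, not the authors' setup.

import time
import torch
import torchvision.models as models

def mean_latency_ms(model, runs=50, warmup=10):
    """Average single-image inference latency in milliseconds (host CPU)."""
    model.eval()
    x = torch.randn(1, 3, 224, 224)   # one 224x224 RGB image with random pixels
    with torch.no_grad():
        for _ in range(warmup):        # warm-up iterations, excluded from timing
            model(x)
        start = time.perf_counter()
        for _ in range(runs):
            model(x)
    return (time.perf_counter() - start) / runs * 1e3

nets = [("AlexNet", models.alexnet()),
        ("GoogLeNet", models.googlenet(init_weights=True))]
for name, net in nets:
    print(f"{name}: {mean_latency_ms(net):.1f} ms per inference")

On an accelerator, the forward call would be replaced by the device runtime's inference API, but the warm-up-then-average structure of the measurement stays the same.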