2021
DOI: 10.1016/bs.adcom.2020.07.002
Energy-efficient deep learning inference on edge devices

Cited by 26 publications (24 citation statements)
References 31 publications
“…Machine Learning (ML) plays an increasingly important role in many Internet of Things (IoT) applications, ranging from computer vision to time-series processing [7,9,28,31]. Edge computing, as a paradigm to host data-analytics as close as possible to end devices, may offer several advantages compared to the standard cloud-centric approach.…”
Section: Introduction (mentioning, confidence: 99%)
“…However, the high specialization of neural accelerators makes them potentially much more energy-efficient, a critical parameter in embedded systems. These processors can be found under different names such as Tensor Processing Unit (TPU) [33], Neural Processing Unit (NPU) [34], or Vision Processing Unit (VPU) [35].…”
Section: E. Deep Learning Neural Accelerators for the Edge (mentioning, confidence: 99%)
“…In practice, this result is obtained by masking different slices of each layer's weights with binary parameters, so that the slices multiplied by a 0 are effectively eliminated from the layer. The continuous relaxation of the binary mask is then optimized, similarly to the architectural weights in a super-net DNAS, with the objective of reducing network complexity by eliminating unimportant parts of each layer (in this respect, the approach resembles structured pruning [5]). The use of masks introduces minimal overhead with respect to a normal training of the seed [20], significantly reducing search time and memory requirements compared to super-net approaches and representing a further step towards lightweight NAS.…”
Section: Background and Related Work (mentioning, confidence: 99%)
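The mask-based technique described in the excerpt above can be illustrated with a minimal NumPy sketch. This is an assumption-laden simplification, not the cited method: the layer shape, the mask parameters `alpha`, the sigmoid relaxation, and the 0.5 hardening threshold are all illustrative choices. It shows the core idea that each weight slice is scaled by a continuously relaxed binary mask, and slices whose mask converges toward 0 are removed as in structured pruning.

```python
import numpy as np

# Hypothetical seed layer: a fully connected layer with 8 output slices (rows).
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))  # one prunable "slice" per output channel

# Trainable mask parameters (illustrative values, as if after optimization).
alpha = np.array([2.0, -2.0, 1.5, -1.0, 3.0, -3.0, 0.2, -0.2])

def relaxed_mask(a):
    """Continuous relaxation of the binary mask: sigmoid maps to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-a))

def masked_forward(x, W, a):
    """Each slice is scaled by its mask value; slices near 0 barely contribute."""
    return (relaxed_mask(a)[:, None] * W) @ x

# A complexity proxy that the search could penalize: sum of active mask values.
cost = relaxed_mask(alpha).sum()

# After the search, harden the mask and structurally remove pruned slices.
keep = relaxed_mask(alpha) >= 0.5
W_pruned = W[keep]

x = rng.normal(size=(16,))
y = masked_forward(x, W, alpha)
```

Here `W_pruned` ends up with 4 of the 8 slices, since only the positive `alpha` entries yield mask values above the threshold; in an actual DNAS the `alpha` values would be learned jointly with the weights under the complexity penalty.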
“…Deep Learning (DL) is at the core of many modern computing applications, such as computer vision [1], sound classification [2], bio-signal analysis [3], predictive maintenance [4], etc. While DL models have traditionally been deployed on powerful cloud-based servers, there is evidence of the potential advantages of an implementation at the edge [5]. Edge computing could improve privacy and reduce energy consumption at the distributed-system level by replacing the energy-hungry wireless transmission of raw data with more efficient local computations and transmission of aggregated outputs [6].…”
Section: Introduction (mentioning, confidence: 99%)