Dual Dynamic Inference: Enabling More Efficient, Adaptive, and Controllable Deep Inference

Wang, Yue; Shen, Jianghao; Hu, Ting-Kuei; Xu, Pengfei; Nguyen, Tan; Baraniuk, Richard G.; Wang, Zhangyang; Lin, Yingyan

doi:10.1109/jstsp.2020.2979669

Cited by 65 publications

(37 citation statements)

References 35 publications

“…One issue that makes creating compact NNs for efficient inference challenging is the dependency on the characteristics of the target hardware. A framework for dynamic inference on resource-constrained hardware, including input- and resource-dependent dynamic inference mechanisms, allowing to meet specific resource constraints, has been proposed recently [25]. First steps are being made towards network compression methods that output representations prepared for later specialization to the target platform [26].…”

Section: Related Work (mentioning)

confidence: 99%

Overview of the Neural Network Compression and Representation (NNR) Standard

Kirchhoffer, Haase, Samek, et al. 2022. IEEE Trans. Circuits Syst. Video Technol.

“…As a second case, when the application is designed using data-driven adaptive methodologies, such as [12,14,25,31,34], the CNN execution is sensitive to the input data complexity. To process "easy" images, they may use a lower resolution or fewer layers, whereas processing "hard" images requires more computation.…”

Section: Motivational Example (mentioning)

confidence: 99%

“…This may lead to increased UAV power consumption over the flight duration and, eventually, to the violation of the application power constraint, causing an emergency landing as illustrated in Figure 2(h). Thus, the methodologies in [12,14,25,31,34] are not suitable for CNN-based applications executed at the edge in changing environment, because these can neither properly adapt the application to the environment variations, nor guarantee that the application constantly meets platform-aware constraints.…”

Section: Motivational Example (mentioning)

confidence: 99%

Scenario Based Run-Time Switching for Adaptive CNN-Based Applications at the Edge

Minakova, Sapra, Stefanov, et al. 2022. ACM Trans. Embed. Comput. Syst.

Convolutional Neural Networks (CNNs) are biologically inspired computational models that are at the heart of many modern computer vision and natural language processing applications. Some CNN-based applications are executed on mobile and embedded devices. Executing CNNs on such devices places numerous demands on them, such as high accuracy, high throughput, low memory cost, and low energy consumption. These requirements are very difficult to satisfy at the same time, so CNN execution at the edge typically involves trade-offs (e.g., high CNN throughput is achieved at the cost of decreased CNN accuracy). In existing methodologies, such trade-offs are either chosen once and remain unchanged during the execution of a CNN-based application, or are adapted to the properties of the CNN input data. However, the application needs can also be significantly affected by changes in the application environment, such as a change in the battery level of the edge device. Thus, CNN-based applications need a mechanism that allows them to dynamically adapt their characteristics to changes in the application environment at run-time. Therefore, in this article, we propose a scenario-based run-time switching (SBRS) methodology that implements such a mechanism.
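The idea of switching between pre-built trade-off points when the environment changes can be illustrated with a small sketch. This is not the SBRS methodology from the article; it only assumes that each "scenario" is a CNN variant with a known accuracy and power draw, and that the platform reports a power budget (for example, one derived from the battery level). The `Scenario` class, the `select_scenario` function, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """A hypothetical CNN variant with its measured trade-off point."""
    name: str
    accuracy: float  # assumed top-1 accuracy, in [0, 1]
    power_w: float   # assumed average power draw, in watts


def select_scenario(scenarios, power_budget_w):
    """Pick the most accurate scenario that fits the current power budget.

    If no scenario fits, fall back to the one with the lowest power draw.
    """
    if not scenarios:
        raise ValueError("at least one scenario is required")
    feasible = [s for s in scenarios if s.power_w <= power_budget_w]
    if not feasible:
        return min(scenarios, key=lambda s: s.power_w)
    return max(feasible, key=lambda s: s.accuracy)


# Illustrative values only; they are not taken from the article.
SCENARIOS = [
    Scenario("large", accuracy=0.76, power_w=8.0),
    Scenario("medium", accuracy=0.72, power_w=5.0),
    Scenario("small", accuracy=0.65, power_w=2.5),
]
```

A run-time monitor would call `select_scenario` again whenever the budget changes, for instance when the battery level drops, and swap the active CNN accordingly.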

“…FlexDNN [12] | Vision/Classification | Footprint overhead-aware design of EE-networks.
DDI [76] | Vision/Classification | Combines layer/channel skipping with early exiting.
MESS [40] | Vision/Segmentation | Image-level EE based on difficulty for semantic segmentation.…”

Section: Early-exiting Network-agnostic Techniques (mentioning)

confidence: 99%

Adaptive Inference through Early-Exit Networks

Laskaridis, Kouris, Lane 2021. Proceedings of the 5th International Workshop on Embedded and Mobile Deep Learning

DNNs are becoming less and less over-parametrised due to recent advances in efficient model design, through careful hand-crafted or NAS-based methods. Relying on the fact that not all inputs require the same amount of computation to yield a confident prediction, adaptive inference is gaining attention as a prominent approach for pushing the limits of efficient deployment. Particularly, early-exit networks comprise an emerging direction for tailoring the computation depth of each input sample at runtime, offering complementary performance gains to other efficiency optimisations. In this paper, we decompose the design methodology of early-exit networks to its key components and survey the recent advances in each one of them. We also position early-exiting against other efficient inference solutions and provide our insights on the current challenges and most promising future directions for research in the field.

CCS Concepts: Computing methodologies → Neural networks; Human-centered computing → Ubiquitous and mobile computing systems and tools.
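The per-sample depth selection described in this abstract can be sketched as a confidence-thresholded early exit. This is a minimal illustration in plain Python, not the design from the survey or from DDI. It assumes each stage is a pair of callables (a backbone block and an exit head returning logits) and that inference stops at the first exit whose top softmax probability reaches a threshold. The names `early_exit_predict`, `stages`, and `threshold` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_predict(x, stages, threshold=0.9):
    """Run (block, exit_head) stages in order and stop at the first confident exit.

    Returns (predicted_class, exit_index). If no exit reaches the threshold,
    the prediction of the final exit is returned.
    """
    if not stages:
        raise ValueError("at least one stage is required")
    h = x
    for exit_index, (block, exit_head) in enumerate(stages):
        h = block(h)                  # deeper features
        probs = softmax(exit_head(h)) # class probabilities at this exit
        if max(probs) >= threshold:
            break                     # confident enough: skip remaining stages
    return probs.index(max(probs)), exit_index


# Toy stages for illustration: each block doubles the features and each
# exit head returns the features directly as logits.
TOY_STAGES = [(lambda h: [2.0 * v for v in h], lambda h: h) for _ in range(3)]
```

With the toy stages, an input whose features already separate the classes leaves at the first exit, while a less clear-cut input passes through more stages before the threshold is met.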
