2018
DOI: 10.1109/tcad.2018.2857338
Trading-Off Accuracy and Energy of Deep Inference on Embedded Systems: A Co-Design Approach

Cited by 27 publications (10 citation statements)
References 18 publications
“…The core idea is to adapt models during inference based on the complexity of the current task. So far, model adaptation happens along at least four common dimensions: number of layers or cascaded models [66][67][68], number of channels [69][70][71][72], input image resolution [73], and computation precision [74,75].…”
Section: Adaptive Inference
confidence: 99%
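The four adaptation dimensions above can be illustrated with a minimal sketch. The difficulty proxy, configuration names, and threshold below are all hypothetical stand-ins, not anything proposed in the cited papers:

```python
# Hypothetical sketch of adaptive inference: choose a cheaper model
# configuration (lower resolution, lower precision) for easy inputs and
# a more expensive one for hard inputs. All names and values are illustrative.

def difficulty(x):
    """Toy difficulty proxy: variance of the input values."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x) / len(x)

def select_config(x, threshold=1.0):
    """Adapt along two of the four dimensions: resolution and precision."""
    if difficulty(x) < threshold:
        return {"resolution": 112, "precision_bits": 8}   # cheap path
    return {"resolution": 224, "precision_bits": 32}      # full path

easy = [0.5, 0.5, 0.6, 0.5]
hard = [0.0, 5.0, -3.0, 4.0]
print(select_config(easy))  # {'resolution': 112, 'precision_bits': 8}
print(select_config(hard))  # {'resolution': 224, 'precision_bits': 32}
```

In a real system the difficulty estimate would itself come from a lightweight model, since computing it must cost far less than the savings it enables.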
“…Along the layer dimension, Panda et al. [66] propose a conditional deep learning (CDL) network, which can identify the variability in the difficulty of input instances and conditionally activate the deeper layers of the network. Extending CDL [66], Jayakodi et al. [67] propose on-the-fly classifier selection: simple classifiers for easy inputs and complex classifiers for hard inputs. Stamoulis et al. [68] propose a systematic approach for hyper-parameter optimization of adaptive CNNs, using Bayesian optimization to determine the number of channels, kernel sizes, and the number of units in the fully connected layers.…”
Section: Adaptive Inference
confidence: 99%
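The conditional-activation idea in CDL and the classifier-selection follow-up can be sketched as a generic early-exit loop. The stage functions and confidence threshold below are hypothetical, not the cited authors' actual models:

```python
# Minimal early-exit sketch in the spirit of conditional deep learning:
# run stages in order and stop as soon as an intermediate classifier is
# confident enough. Easy inputs exit shallow; hard inputs fall through.

def early_exit_inference(x, stages, threshold=0.9):
    """stages: list of callables returning (label, confidence)."""
    for depth, stage in enumerate(stages, start=1):
        label, conf = stage(x)
        if conf >= threshold:
            return label, depth  # confident: exit without deeper layers
    return label, depth          # last stage's answer, however confident

# Toy stages: the shallow stage is confident only on "easy" inputs.
shallow = lambda x: ("cat", 0.95 if x == "easy" else 0.5)
deep    = lambda x: ("dog", 0.99)

print(early_exit_inference("easy", [shallow, deep]))  # ('cat', 1)
print(early_exit_inference("hard", [shallow, deep]))  # ('dog', 2)
```

Energy savings come from the depth at which most inputs exit, which is why the threshold (or per-stage classifiers) is a natural target for the hyper-parameter optimization Stamoulis et al. describe.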
“…Therefore, in Algorithm 1, we select the next ReRAM design and ReSNA fidelity pair that maximizes the information gain per unit cost about the optimal Pareto front, based on Equation (9).…”
Section: Selecting ReRAM Design to Evaluate via Information Gain
confidence: 99%
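The selection rule described above reduces to an argmax of gain divided by cost over candidate (design, fidelity) pairs. The candidate names and gain values below are made up for illustration; in the cited work the gain is computed with respect to the optimal Pareto front per Equation (9):

```python
# Sketch of cost-aware acquisition: among candidate (design, fidelity)
# pairs, pick the one with the highest information gain per unit cost.
# Gains and costs here are illustrative placeholders.

def select_next(candidates):
    """candidates: list of (name, info_gain, cost); returns the best name."""
    return max(candidates, key=lambda c: c[1] / c[2])[0]

candidates = [
    ("design_A_low_fidelity",  2.0, 1.0),   # gain/cost = 2.0
    ("design_A_high_fidelity", 5.0, 4.0),   # gain/cost = 1.25
    ("design_B_low_fidelity",  1.5, 0.5),   # gain/cost = 3.0
]
print(select_next(candidates))  # design_B_low_fidelity
```

Normalizing by cost is what lets a cheap low-fidelity evaluation beat a more informative but expensive high-fidelity one.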
“…However, a key challenge in executing DNN inference [9]-[11] on ReRAM-based architectures arises from the non-idealities of ReRAM devices, which can degrade inference accuracy. Since DNN inference involves a sequence of forward computations over DNN layers, errors due to device non-idealities can propagate and accumulate, resulting in incorrect predictions.…”
Section: Introduction
confidence: 99%
“…Besides, since computation directly translates into energy consumption and IoT devices are usually battery-constrained [9], the high computation demand of training quickly drains the battery. While existing works [10]-[12] effectively reduce the computation cost of inference by assigning input instances to different classifiers according to their difficulty, the computation cost of training is not reduced.…”
Section: Introduction
confidence: 99%