Proceedings of the Eleventh IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis 2016
DOI: 10.1145/2968456.2968458

Runtime configurable deep neural networks for energy-accuracy trade-off

Abstract: We present a novel dynamic configuration technique for deep neural networks that permits step-wise energy-accuracy trade-offs at runtime. Our technique adjusts the number of channels in the network dynamically, depending on response time, power, and accuracy targets. To enable this dynamic configuration, we co-design a new training algorithm in which the network is trained incrementally, such that the weights of channels trained in earlier steps are fixed. Our technique provides the flex…
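The abstract describes two coupled mechanisms: output channels grouped into increments that can be switched off at runtime, and an incremental training schedule that freezes the weights of channels trained in earlier steps. The sketch below is a minimal PyTorch illustration of that idea, not the authors' implementation; all names (IncrementalConv2d, freeze_earlier_steps, ch_per_step, active_steps) are hypothetical, and the single-layer channel-slicing scheme is an assumption about how the runtime adjustment could be realized.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class IncrementalConv2d(nn.Module):
    """Convolution whose output channels are split into `steps` groups.
    At runtime only the first `active_steps` groups are computed,
    trading accuracy against energy and latency step-wise."""

    def __init__(self, in_ch, ch_per_step, steps, kernel_size=3, padding=1):
        super().__init__()
        self.ch_per_step = ch_per_step
        self.steps = steps
        self.conv = nn.Conv2d(in_ch, ch_per_step * steps, kernel_size,
                              padding=padding)

    def forward(self, x, active_steps):
        c = self.ch_per_step * active_steps
        # Slice the kernel so that inactive channels cost no compute.
        w = self.conv.weight[:c]
        b = None if self.conv.bias is None else self.conv.bias[:c]
        return F.conv2d(x, w, b, padding=self.conv.padding)


def freeze_earlier_steps(layer, current_step):
    """Incremental training: mask gradients of channels trained in earlier
    steps so their weights stay fixed. Call once before training step
    `current_step` (steps numbered from 1)."""
    frozen = layer.ch_per_step * (current_step - 1)

    def mask(grad):
        grad = grad.clone()
        grad[:frozen] = 0  # earlier-step channels receive no updates
        return grad

    layer.conv.weight.register_hook(mask)
    if layer.conv.bias is not None:
        layer.conv.bias.register_hook(mask)


# Example: a 4-step layer; train step 2 while step-1 channels stay frozen.
layer = IncrementalConv2d(in_ch=3, ch_per_step=8, steps=4)
freeze_earlier_steps(layer, current_step=2)
x = torch.randn(1, 3, 32, 32)
out = layer(x, active_steps=2)   # 16 of 32 channels active
out.sum().backward()             # step-1 weight rows receive zero gradient
```

A runtime controller would then pick active_steps per inference to meet the current response-time, power, or accuracy target, e.g. layer(x, active_steps=1) for the lowest-energy configuration and layer(x, active_steps=4) for full accuracy.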

Cited by 66 publications (59 citation statements)
References 18 publications
“…propose an incremental learning algorithm where the network is trained in incremental steps [11]. The idea is then to turn off large portions of the network in order to save energy if these portions are not needed to retain accuracy.…”
Section: Related Work
confidence: 99%
“…In particular, sequence-to-sequence networks, such as recurrent neural networks (RNNs) and the more recent transformers [2] are now considered state-of-the-art for applications involving data sequences (translation, summarization, question answering, etc.). The success of deep learning is mainly due to the increasing availability of large datasets and high performance hardware (mostly GPUs on cloud servers) to speed-up training [3][4][5].…”
Section: Introduction
confidence: 99%
“…While for most applications training is a one-time task, and can therefore be performed in the cloud, there is a growing demand for executing NN inference on embedded systems (so-called "edge" nodes), in order to enhance the features of many Internet of Things (IoT) applications [3]. In fact, edge inference could yield benefits in terms of data privacy, response latency and energy efficiency, as it would eliminate the need for transmitting high volumes of raw data to the cloud [4][5][6][7][8][9][10][11][12][13][14][15][16][17][18][19].…”
Section: Introduction
confidence: 99%