This work targets the automated minimum-energy optimization of Quantized Neural Networks (QNNs): networks using low-precision weights and activations. These networks are trained from scratch at an arbitrary fixed-point precision. At iso-accuracy, QNNs using fewer bits require deeper and wider network architectures than networks using higher-precision operators, while requiring less complex arithmetic and fewer bits per weight. This fundamental trade-off is analyzed and quantified to find the minimum-energy QNN for any benchmark and hence optimize energy efficiency. To this end, the energy consumption of inference is modeled for a generic hardware platform, which allows drawing several conclusions across different benchmarks. First, at iso-accuracy, energy consumption varies by orders of magnitude depending on the number of bits used in the QNN. Second, in a typical system, BinaryNet or int4 implementations lead to the minimum-energy solution, outperforming int8 networks by up to 2-10× at iso-accuracy. All code used for QNN training is available from https://github.com/BertMoons/.
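The trade-off the abstract describes can be illustrated with a minimal first-order sketch. This is not the paper's actual energy model; the quadratic multiplier-energy scaling and the `width_scale` growth factor are illustrative assumptions, and all names (`mac_energy`, `network_energy`) are hypothetical.

```python
# Hypothetical first-order energy model for QNN inference.
# Assumption: multiplier energy scales roughly quadratically with
# operand bit-width, normalized to a 16-bit reference MAC.

def mac_energy(bits, e_ref=1.0, b_ref=16):
    """Energy of one multiply-accumulate at a given bit-width."""
    return e_ref * (bits / b_ref) ** 2

def network_energy(bits, base_macs, width_scale):
    """Total inference energy for a network that, at iso-accuracy,
    needs width_scale times more MACs than a full-precision baseline."""
    macs = base_macs * width_scale
    return macs * mac_energy(bits)

# Even if a 4-bit network needs 4x the MACs to match accuracy,
# the cheaper arithmetic can still win overall:
e16 = network_energy(16, 1_000_000, 1.0)
e4 = network_energy(4, 1_000_000, 4.0)
```

Under these assumed scalings, the 4-bit network consumes a quarter of the 16-bit network's energy despite being four times larger, which is the kind of cross-over the paper quantifies per benchmark.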
Convolutional Neural Networks (CNNs) are now also reaching impressive performance on non-classification image-processing tasks such as denoising, demosaicing, super-resolution, and super slow motion. Consequently, CNNs are increasingly deployed on very high-resolution images. However, the resulting high-resolution feature maps pose unprecedented requirements on the memory system of neural-network processors: on-chip memories are too small to store high-resolution feature maps, while off-chip memories are very costly in terms of I/O bandwidth and power. This paper first shows that classical layer-by-layer inference approaches are bounded in their external I/O bandwidth vs. on-chip memory trade-off space, making it infeasible to scale up to very high resolutions at a reasonable cost. Next, we demonstrate how an alternative depth-first network computation can reduce I/O bandwidth requirements by up to >200× for a fixed on-chip memory size or, alternatively, reduce on-chip memory requirements by up to >10000× for a fixed I/O bandwidth limitation. We further introduce an enhanced depth-first method, exploiting both line buffers and tiling, to further improve the external I/O bandwidth vs. on-chip memory capacity trade-off, and quantify its improvements over the current state of the art.
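The memory-capacity gap between the two schedules can be sketched with back-of-the-envelope buffer sizes. This is an illustrative simplification, not the paper's analysis: it assumes a single intermediate layer, a k-row line buffer sized by the convolution kernel height, and ignores weights and halo overlaps.

```python
# Hypothetical on-chip buffer estimates (in elements) for one
# intermediate feature map of height h, width w, and c channels.

def layer_by_layer_buffer(h, w, c):
    """Layer-by-layer: the whole feature map must be resident
    on-chip (or spilled off-chip at high I/O cost)."""
    return h * w * c

def depth_first_line_buffer(w, c, k):
    """Depth-first: only k rows per layer are kept, where k is the
    kernel height, enough to produce the next output row."""
    return k * w * c

# Full-HD feature map, 32 channels, 3x3 convolutions:
full = layer_by_layer_buffer(1080, 1920, 32)
lines = depth_first_line_buffer(1920, 32, 3)
```

For this example the line buffer is 1080/3 = 360× smaller per layer; stacking many high-resolution layers is what pushes the gap toward the orders of magnitude reported above.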
Recently, there has been increasing demand for advanced classification capabilities embedded in wearable, battery-constrained devices such as smartphones or smartwatches. Achieving such functionality within a tight power and energy budget has proven a real challenge, specifically for large-scale neural-network-based applications. Previously, cascaded systems have been proposed to minimize energy consumption for such applications, either through a single wake-up stage or through a linear or tree-based cascade of consecutive classifiers that allows early termination. In this work, we expand upon these concepts by generalizing cascades to hierarchical cascaded processing, in which a hierarchy of increasingly complex classifiers, each designed and trained for a specific subtask, is used. This hierarchical approach significantly outperforms the wake-up-based approach by up to two orders of magnitude in energy consumption at iso-accuracy, specifically in systems with sparse input data such as speech recognition and visual object detection. This paper presents a general design framework for such systems and illustrates how to optimize them towards minimum energy consumption. The text further proposes a roofline model for cascaded systems, derives system-level trade-offs, and validates the approach through a visual-classification case study.
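The early-termination argument can be made concrete with a small expected-energy sketch. This is an illustrative model under assumed per-stage costs and pass rates, not the paper's framework; `cascade_energy` and its arguments are hypothetical names.

```python
# Hypothetical expected energy per input for a cascade of classifiers.
# stage_costs[i]: energy to run stage i.
# pass_rates[i]: fraction of inputs stage i forwards to stage i+1
# (the rest terminate early, e.g. rejected as background/silence).

def cascade_energy(stage_costs, pass_rates):
    energy, frac = 0.0, 1.0
    for cost, p in zip(stage_costs, pass_rates):
        energy += frac * cost  # only surviving inputs run this stage
        frac *= p
    return energy

# Sparse input: a cheap wake-up stage rejects 90% of inputs, a
# mid-level stage half of the rest, before the expensive classifier:
e_cascade = cascade_energy([1.0, 10.0, 100.0], [0.1, 0.5, 1.0])
e_monolithic = 100.0  # always run the big classifier
```

Here the cascade's expected cost is 1 + 0.1·10 + 0.05·100 = 7 energy units versus 100 for the monolithic classifier; the sparser the input data, the larger this gap, which is why the hierarchical approach pays off for wake-up-style workloads.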