Proceedings of the 54th Annual Design Automation Conference (DAC 2017)
DOI: 10.1145/3061639.3062189

A Kernel Decomposition Architecture for Binary-weight Convolutional Neural Networks

Cited by 31 publications (11 citation statements). References 7 publications.
“…FINN [53] uses FPGAs to accelerate binary DNNs, while YodaNN [54] and BRein [55] propose ASIC accelerators for binary DNNs. Kim et al. [56] decompose the convolution weights of binary CNNs to improve performance and energy efficiency. The works above focus solely on binary DNNs to achieve high performance at the cost of classification accuracy.…”
Section: Related Work
confidence: 99%
“…[17] investigated the opportunity of using deep learning to identify non-intuitive features from cross-sensor correlations by means of an RBM. In [1], a kernel decomposition scheme for binary-weight networks is proposed that skips redundant computations and achieves a 22% energy reduction on image classification. BiNMAC, proposed in [18], is a programmable manycore accelerator for BNNs designed for physiological and image-processing case studies.…”
Section: Related Work
confidence: 99%
“…Such devices usually process real-time data read continuously from multimodal sensors, and they suffer from resource constraints and a limited battery budget due to their small size, online monitoring, and portability. Therefore, minimizing the power dissipation of these devices while meeting real-time requirements is a subject of interest [1][2][3][4][5].…”
Section: Introduction
confidence: 99%
“…As a distinctive feature, the binary quantization is not only applied during the forward pass, but also during the backward pass of the gradient descent algorithm, and acts as a sort of regularizer [16]. Hardware accelerators for highly-quantized NNs have been presented on FPGAs [23], ASICs [3,10], and neuromorphic brain-inspired chips such as TrueNorth [6], trading the flexibility of general-purpose processors for the higher performance and energy efficiency of specialized hardware. To lower the computational complexity of BNNs, a hardware-oriented kernel decomposition strategy is presented in [10], using clock gating to reduce the energy cost of redundant convolutions. This is clearly effective but less amenable to software implementation, because it is weight-dependent and does not benefit from data spatial contiguity, which is exploited in this work to reduce computation latency.…”
Section: Related Work
confidence: 99%
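The forward-and-backward binarization mentioned in the last excerpt is commonly realized with a straight-through estimator, as in BinaryConnect/BinaryNet-style training; the sketch below assumes that realization rather than the exact scheme of [16], and all names are illustrative.

```python
import numpy as np

def binarize_forward(w):
    """Forward pass: quantize latent real-valued weights to {+1, -1}."""
    return np.where(w >= 0, 1.0, -1.0)

def binarize_backward(w, grad_out):
    """Backward pass with a straight-through estimator: the upstream
    gradient passes through unchanged where |w| <= 1 and is zeroed
    elsewhere, keeping the latent weights bounded (the regularizing
    effect the excerpt alludes to)."""
    return grad_out * (np.abs(w) <= 1.0)

# Toy SGD step on the latent full-precision weights.
rng = np.random.default_rng(0)
w = rng.uniform(-1.5, 1.5, size=5)        # latent full-precision weights
wb = binarize_forward(w)                  # binary weights used by the layer
grad_wb = rng.standard_normal(5)          # upstream gradient w.r.t. wb
w -= 0.1 * binarize_backward(w, grad_wb)  # update the latent weights
```

Training updates accumulate in the full-precision copy while the layer only ever computes with ±1 weights, which is what makes the binary-weight accelerators surveyed above applicable at inference time.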