2019
DOI: 10.1109/jetcas.2019.2950093

EBPC: Extended Bit-Plane Compression for Deep Neural Network Inference and Training Accelerators

Abstract: In the wake of the success of convolutional neural networks in image classification, object recognition, speech recognition, etc., the demand for deploying these compute-intensive ML models on embedded and mobile systems with tight power and energy constraints at low cost, as well as for boosting throughput in data centers, is growing rapidly. This has sparked a surge of research into specialized hardware accelerators. Their performance is typically limited by I/O bandwidth, power consumption is dominated by I…

Cited by 31 publications (21 citation statements) · References 41 publications
“…Activations extracted by ReLU exhibit a known sparsity [10]; thus several sparsity-based coding formats have been designed to compress feature maps: Coordinate (COO) [25], Bitmap [26], Run-Length Coding [27], etc. Several methods based on these coding formats describe their hardware architectures to reduce computational cost: Cnvlutin [11], SCNN [12], Eyeriss [13], EIE [14], etc. Cambricon-SE [28] uses Huffman and LZW coding to improve the compression ratio of feature maps.…”
Section: A Coding Format and Compression Encoders
confidence: 99%
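The coding formats named in this statement are simple enough to illustrate directly. The sketch below shows, assuming a flat list of post-ReLU activations, how COO, bitmap, and zero run-length representations store only the non-zero entries; the function names and the in-memory layout are illustrative, not taken from any of the cited accelerators.

```python
# Toy versions of the sparsity-based coding formats mentioned above
# (COO, bitmap, zero run-length). Illustrative only.

def to_coo(acts):
    """Coordinate format: keep (index, value) pairs for non-zeros only."""
    return [(i, v) for i, v in enumerate(acts) if v != 0]

def to_bitmap(acts):
    """Bitmap format: a 1-bit presence mask plus the packed non-zero values."""
    mask = [1 if v != 0 else 0 for v in acts]
    values = [v for v in acts if v != 0]
    return mask, values

def to_zero_rle(acts):
    """Zero run-length coding: emit (preceding_zero_run, value) pairs."""
    out, run = [], 0
    for v in acts:
        if v == 0:
            run += 1
        else:
            out.append((run, v))
            run = 0
    if run:
        out.append((run, 0))  # trailing zeros, flagged with a zero value
    return out

acts = [0, 0, 3, 0, 7, 0, 0, 0, 1]   # ReLU output: mostly zeros
print(to_coo(acts))       # [(2, 3), (4, 7), (8, 1)]
print(to_bitmap(acts))    # ([0, 0, 1, 0, 1, 0, 0, 0, 1], [3, 7, 1])
print(to_zero_rle(acts))  # [(2, 3), (1, 7), (3, 1)]
```

All three trade metadata (indices, masks, or run lengths) against the storage saved by dropping zeros, which is why they pay off only when the activations are sufficiently sparse.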
“…As a result, the large feature maps of a CNN have to be repeatedly transferred between on-chip and off-chip memory, which greatly increases power consumption and data-transfer bandwidth. Compressing feature maps to reduce this power consumption and latency is therefore a topic worth exploring for improving the performance of CNN models on specialized hardware [10].…”
Section: Introduction
confidence: 99%
“…The main drawback of approaches based on compression representation learning is that they alter the DNN model and therefore require a retraining phase. EBPC [6] is a hardware-friendly, lossless compression scheme for the feature maps within CNNs. However, it is limited to compressing the feature maps, although the model parameters/weights are responsible for a major fraction of the overall memory/communication traffic (see Fig.…
Section: Related Work
confidence: 99%
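As a rough illustration of the bit-plane idea behind extended bit-plane compression schemes such as the one in the paper's title: when the values in a block have small magnitude, the high-order bit planes of that block are all zero and can be signalled in very few bits. The sketch below shows only a plain bit-plane transposition and a check for non-zero planes under that assumption; it is not the actual EBPC encoder, and the names are illustrative.

```python
# Toy bit-plane transposition for a block of 8-bit values: plane b collects
# bit b of every value in the block. Blocks of small-magnitude values leave
# the upper planes all-zero, which bit-plane-style encoders exploit.

def bit_planes(block, bits=8):
    """Return `bits` planes; plane b holds bit b of every value in the block."""
    return [[(v >> b) & 1 for v in block] for b in range(bits)]

block = [3, 1, 0, 2, 0, 1, 0, 0]   # small values: only bits 0 and 1 are ever set
planes = bit_planes(block)
nonzero_planes = [b for b, p in enumerate(planes) if any(p)]
print(nonzero_planes)              # [0, 1] -> the six all-zero planes cost almost nothing
```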
“…Current DNN models rely on millions or even billions of parameters, which magnifies the role played by the communication and memory sub-systems in moving such a high data volume from main memory into the accelerator and then to its many PEs. Consequently, the performance and energy figures of a DNN accelerator are severely affected by the communication and memory sub-systems [6], [7]. Fig.…
Section: Introduction
confidence: 99%
“…However, transmitting the data comes at a high energy cost, introduces privacy concerns, requires expensive infrastructure, and has high latency. Alternatively, the challenge of analyzing the data near the sensor can be tackled by combining algorithmic optimizations that allow working with reduced arithmetic precision and hardware acceleration [3], [4], with various techniques to maximize energy efficiency, such as minimizing off-accelerator data transfers [5]–[7].…”
Section: Introduction
confidence: 99%