2019 29th International Conference on Field Programmable Logic and Applications (FPL)
DOI: 10.1109/fpl.2019.00063
Reducing Dynamic Power in Streaming CNN Hardware Accelerators by Exploiting Computational Redundancies

Cited by 10 publications (7 citation statements) · References 24 publications
“…The efficiency of the Skipping approximation relies on how often a computation can be skipped, the complexity of the conditional prediction, and the complexity of the skipped operation. Piyasena et al [94] leverage the widely used ReLU activation function to eliminate redundant computations. [94] estimates the sign of the convolution output using a low-cost prediction scheme.…”
Section: Skipping
confidence: 99%
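The skipping idea described above can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's exact scheme: it estimates the sign of a dot product from truncated (low-precision) operands and, when the estimate is negative, returns zero directly, since ReLU would clamp the result anyway. The function names and the truncation width are assumptions for illustration.

```python
def truncate(x, drop_bits=4):
    """Keep only the high-order bits of an integer operand (sign preserved)."""
    return (x >> drop_bits) << drop_bits

def predicted_nonnegative(inputs, weights, drop_bits=4):
    """Cheap sign estimate of the dot product using truncated operands."""
    approx = sum(truncate(i, drop_bits) * truncate(w, drop_bits)
                 for i, w in zip(inputs, weights))
    return approx >= 0

def relu_dot(inputs, weights, drop_bits=4):
    """Full-precision dot product + ReLU, computed only when the
    low-cost estimate predicts a non-negative output."""
    if not predicted_nonnegative(inputs, weights, drop_bits):
        return 0  # skip: ReLU would zero a negative output anyway
    return max(0, sum(i * w for i, w in zip(inputs, weights)))
```

The power saving comes from how often the cheap predictor fires: every skipped window avoids a full-precision multiply-accumulate chain, at the cost of occasional mispredictions near zero.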
“…Piyasena et al [94] leverage the widely used ReLU activation function to eliminate redundant computations. [94] estimates the sign of the convolution output using a low-cost prediction scheme. In this scheme, a power-of-two weight quantization is applied so that multiplications can be replaced with simple logic shifters.…”
Section: Skipping
confidence: 99%
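The power-of-two quantization mentioned above can be illustrated briefly. This is a hypothetical sketch, not the paper's exact quantizer: each weight is rounded to a signed power of two, so a multiplication reduces to a left shift plus a sign fix-up, which in hardware replaces a full multiplier with a barrel shifter. Integer weights with magnitude ≥ 1 are assumed here.

```python
import math

def quantize_pow2(w):
    """Round a nonzero integer weight to (sign, exponent) with
    w ≈ sign * 2**exponent (assumes |w| >= 1)."""
    if w == 0:
        return 0, 0
    exp = round(math.log2(abs(w)))
    return (1 if w > 0 else -1), exp

def shift_mul(x, sign, exp):
    """Compute x * (sign * 2**exp) with a shift instead of a multiply."""
    return sign * (x << exp)
```

For example, a weight of 9 quantizes to +2^3, so multiplying by it becomes a 3-bit left shift; the quantization error (9 vs. 8) is the accuracy cost traded for the cheaper datapath.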
“…There are a number of hardware architectures in the literature that aim to accelerate CNN applications while reducing computational redundancies [14, 15, 16, 17]. There are also approaches that attempt to exploit the high bandwidth available near the sensor interface by bringing the computation closer to the image sensor [7].…”
Section: Related Work
confidence: 99%
“…Many real-world applications such as robotics, self-driving cars, augmented reality, video surveillance, mobile apps, and smart city applications [38]–[40] require IoT devices capable of AI inference. Thus, DNN inference has also been demonstrated on various embedded Systems-on-Chip (SoCs) such as Nvidia Tegra and Samsung Exynos, as well as application-specific FPGA designs (ESE [34], SCNN [41], [42], [43]) and ASICs such as Google TPU and Movidius NCS, which is used later in our experiment. Except for FPGAs, most of these devices are generalized to work with the majority of DNN architectures.…”
Section: Introduction
confidence: 99%