2019
DOI: 10.3390/jlpea10010001
Energy-Efficient Architecture for CNNs Inference on Heterogeneous FPGA

Abstract: Due to the huge requirements in terms of both computational and memory capabilities, implementing energy-efficient and high-performance Convolutional Neural Networks (CNNs) by exploiting embedded systems still represents a major challenge for hardware designers. This paper presents the complete design of a heterogeneous embedded system realized by using a Field-Programmable Gate Array System-on-Chip (SoC) and suitable to accelerate the inference of Convolutional Neural Networks in power-constrained environmen…

Cited by 16 publications (9 citation statements)
References 28 publications
“…The table also includes the comparison with some of the novel FPGA accelerators. The motivation of the authors in [19,20,21,38] was to optimize a design to obtain higher GOPs, maximize performance, or reduce power consumption. On the contrary, our focus is to increase the frequency and keep inference engines idle to save dynamic power consumption.…”
Section: Results
confidence: 99%
“…For instance, Spagnolo et al. proposed an energy-efficient hardware accelerator for CNNs using a heterogeneous FPGA. Their system-on-chip (SoC) architecture is structured to support the efficient Single-Instruction-Multiple-Data (SIMD) paradigm for computing both convolutional and fully connected layers [19]. Since all computations are performed on the FPGA and controlled by an embedded processor, they obtained better performance than the GPU.…”
Section: Related Work
confidence: 99%
“…Therefore, we can argue that our architecture outperforms the work in Reference [46]. Next, we compare our work to an FPGA-based SIMD CNN accelerator design [47]. The results, shown in Table 6, indicate performance improvements in our design.…”
Section: Results
confidence: 99%
“…The authors in Ref. [105] designed a system architecture based on a heterogeneous FPGA with DSPs, supporting the SIMD paradigm to efficiently process parallel computation for CNN layers (convolutional and fully connected layers). The proposed architecture required 47% lower computational time than a non-SIMD implementation.…”
Section: HW Acceleration Approaches
confidence: 99%
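The SIMD scheme the excerpts above describe (one instruction applied to many data elements at once, for both convolutional and fully connected layers) can be sketched in software. Below is a minimal, hypothetical illustration using NumPy vector operations as a stand-in for hardware SIMD lanes; the function names and shapes are assumptions for illustration only, not the paper's actual architecture:

```python
import numpy as np

def conv2d_scalar(x, k):
    """Naive scalar 2D convolution (valid padding): one
    multiply-accumulate per instruction, like a non-SIMD engine."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += x[i + di, j + dj] * k[di, dj]
            out[i, j] = acc
    return out

def conv2d_simd(x, k):
    """SIMD-style convolution: each kernel tap is applied to an
    entire shifted window of the input in a single vectorized
    multiply-accumulate, so the loop count drops from H*W*kh*kw
    scalar operations to kh*kw vector operations."""
    H, W = x.shape
    kh, kw = k.shape
    oh, ow = H - kh + 1, W - kw + 1
    out = np.zeros((oh, ow))
    for di in range(kh):
        for dj in range(kw):
            out += k[di, dj] * x[di:di + oh, dj:dj + ow]
    return out
```

Both functions produce identical results; the second organizes the same arithmetic so that one "instruction" (a vectorized multiply-accumulate) touches many data elements, which is the source of the computational-time savings the citing papers report.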