2021
DOI: 10.3390/s21082637
A Heterogeneous Hardware Accelerator for Image Classification in Embedded Systems

Abstract: Convolutional neural networks (CNN) have been extensively employed for image classification due to their high accuracy. However, inference is a computationally-intensive process that often requires hardware acceleration to operate in real time. For mobile devices, the power consumption of graphics processors (GPUs) is frequently prohibitive, and field-programmable gate arrays (FPGA) become a solution to perform inference at high speed. Although previous works have implemented CNN inference on FPGAs, their high…

Cited by 16 publications (6 citation statements)
References 46 publications (66 reference statements)
“…Lyu et al [18] implemented a YOLOv4 model on the FPGA platform to identify citrus flowers; the inference time of this model was approximately 33.3 ms, and the power consumption of the FPGA was 20 W. Pérez et al [19] used an FPGA with a CNN model for image classification and achieved a speed of 24.6 frames per second. Previous studies indicate that the FPGA can accelerate AI computations.…”
Section: Conclusion and Discussion
confidence: 99%
“…The advantage of this design is that the same computing engine can be reused for multiple accelerators without reconfiguration. Many studies [21], [22] have used this engine architecture to achieve excellent performance on low-density FPGAs. For example, Wu et al [21] proposed a hardware accelerator that efficiently supported various convolutional variants to address the utilization degradation these emerging operators cause in general-purpose convolution engines, in which each layer was modularized to improve the implementation efficiency of various lightweight CNN models.…”
Section: Separated Engine Architecture
confidence: 99%
“…For example, Wu et al [21] proposed a hardware accelerator that efficiently supported various convolutional variants to address the utilization degradation these emerging operators cause in general-purpose convolution engines, in which each layer was modularized to improve the implementation efficiency of various lightweight CNN models. Ignacio et al [22] proposed a scalable accelerator architecture using acceleration schemes such as bank-balanced pruning, dynamic quantization, and input- and output-channel parallelism. Finally, an analytical model was proposed for the complex design space, which was used to guide the search for optimal parameter configurations.…”
Section: Separated Engine Architecture
confidence: 99%
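To make the bank-balanced pruning scheme mentioned above concrete, the following is a minimal illustrative sketch (not the cited accelerator's actual implementation): each weight row is split into equal-width banks, and only the largest-magnitude weights in every bank are kept, so every parallel compute lane processes the same number of nonzeros. The function name and parameters are hypothetical.

```python
import numpy as np

def bank_balanced_prune(weights, num_banks, keep_per_bank):
    """Illustrative bank-balanced pruning: split each row into
    `num_banks` equal banks and zero out all but the
    `keep_per_bank` largest-magnitude weights in every bank,
    giving identical sparsity across banks (balanced lanes)."""
    rows, cols = weights.shape
    assert cols % num_banks == 0, "columns must divide evenly into banks"
    bank_w = cols // num_banks
    pruned = np.zeros_like(weights)
    for r in range(rows):
        for b in range(num_banks):
            bank = weights[r, b * bank_w:(b + 1) * bank_w]
            keep = np.argsort(np.abs(bank))[-keep_per_bank:]
            out = np.zeros_like(bank)
            out[keep] = bank[keep]
            pruned[r, b * bank_w:(b + 1) * bank_w] = out
    return pruned

W = np.arange(1.0, 9.0).reshape(1, 8)  # one row of 8 weights
P = bank_balanced_prune(W, num_banks=2, keep_per_bank=1)
# each 4-wide bank keeps its single largest weight:
# P == [[0, 0, 0, 4, 0, 0, 0, 8]]
```

Unlike unstructured pruning, this constraint guarantees that hardware lanes assigned one bank each never idle waiting for a denser bank, which is what makes the scheme attractive for FPGA parallelism.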
“…Many recent techniques have been used to improve the acceleration performance of CNNs on FPGAs [20]–[22]. FPGAs can achieve moderate performance with lower power consumption.…”
Section: Related Work
confidence: 99%