2021
DOI: 10.3390/s21082637
A Heterogeneous Hardware Accelerator for Image Classification in Embedded Systems

Abstract: Convolutional neural networks (CNN) have been extensively employed for image classification due to their high accuracy. However, inference is a computationally-intensive process that often requires hardware acceleration to operate in real time. For mobile devices, the power consumption of graphics processors (GPUs) is frequently prohibitive, and field-programmable gate arrays (FPGA) become a solution to perform inference at high speed. Although previous works have implemented CNN inference on FPGAs, their high…

Cited by 16 publications (6 citation statements)
References 46 publications (66 reference statements)
“…Lyu et al [18] implemented a YOLOv4 model on the FPGA platform to identify citrus flowers; the inference time of this model was approximately 33.3 ms, and the power consumption of the FPGA was 20 W. Pérez et al [19] used an FPGA with a CNN model for image classification and achieved a speed of 24.6 frames per second. Previous studies indicate that the FPGA can accelerate AI computations.…”
Section: Conclusion and Discussion
confidence: 99%
“…The advantage of this design is that the same computing engine can be reused for multiple accelerators without reconfiguration. Many studies [21], [22] have used this engine architecture to achieve excellent performance on low-density FPGAs. For example, Wu et al [21] proposed a hardware accelerator that efficiently supported various convolutional variants to address the utilization degradation these emerging operators cause in general-purpose convolution engines, in which each layer was modularized to improve the implementation efficiency of various lightweight CNN models.…”
Section: Separated Engine Architecture
confidence: 99%
“…For example, Wu et al [21] proposed a hardware accelerator that efficiently supported various convolutional variants to address the utilization degradation these emerging operators cause in general-purpose convolution engines, in which each layer was modularized to improve the implementation efficiency of various lightweight CNN models. Ignacio et al [22] proposed a scalable accelerator architecture using acceleration schemes such as bank-balanced pruning, dynamic quantization, and input- and output-channel parallelism. Finally, an analytical model was proposed for the complex design space, which was used to guide the search for optimal parameter configurations.…”
Section: Separated Engine Architecture
confidence: 99%
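To make the bank-balanced pruning scheme mentioned above concrete, the following is a minimal illustrative sketch (not the cited accelerator's actual implementation): each weight row is split into equal-width banks, and only the largest-magnitude weights in every bank are kept, so every parallel compute lane processes the same number of nonzeros. The function name and parameters are hypothetical.

```python
import numpy as np

def bank_balanced_prune(weights, num_banks, keep_per_bank):
    """Illustrative bank-balanced pruning: split each row into
    `num_banks` equal banks and zero out all but the
    `keep_per_bank` largest-magnitude weights in every bank,
    giving identical sparsity across banks (balanced lanes)."""
    rows, cols = weights.shape
    assert cols % num_banks == 0, "columns must divide evenly into banks"
    bank_w = cols // num_banks
    pruned = np.zeros_like(weights)
    for r in range(rows):
        for b in range(num_banks):
            bank = weights[r, b * bank_w:(b + 1) * bank_w]
            keep = np.argsort(np.abs(bank))[-keep_per_bank:]
            out = np.zeros_like(bank)
            out[keep] = bank[keep]
            pruned[r, b * bank_w:(b + 1) * bank_w] = out
    return pruned

W = np.arange(1.0, 9.0).reshape(1, 8)  # one row of 8 weights
P = bank_balanced_prune(W, num_banks=2, keep_per_bank=1)
# each 4-wide bank keeps its single largest weight:
# P == [[0, 0, 0, 4, 0, 0, 0, 8]]
```

Unlike unstructured pruning, this constraint guarantees that hardware lanes assigned one bank each never idle waiting for a denser bank, which is what makes the scheme attractive for FPGA parallelism.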
“…Many recent techniques have been used to improve the acceleration performance of CNNs on FPGAs [20]–[22]. FPGAs can achieve moderate performance with lower power consumption.…”
Section: Related Work
confidence: 99%