2020
DOI: 10.1109/access.2020.3000009
A Novel FPGA Accelerator Design for Real-Time and Ultra-Low Power Deep Convolutional Neural Networks Compared With Titan X GPU

Abstract: Convolutional neural network (CNN) based deep learning algorithms require high data flow and computational intensity. For real-time industrial applications, they must overcome challenges such as high data bandwidth requirements and power consumption on hardware platforms. In this work, we have analyzed in detail the data dependency in the CNN accelerator and propose specific pipelined operations and a data organization scheme to design a high-throughput CNN accelerator on FPGA. In addition, we have optimized the ke…

Cited by 41 publications (14 citation statements)
References 30 publications
“…The flexibility of the FPGA hardware is used to create a dynamic model generation system based on the dataset using different softcore processors. Further evidence of FPGA-based deep learning acceleration is reported in [11][12][13][14][15]. As stated in [11], the authors used an FPGA to increase the speed of stochastic gradient descent in matrix factorization operations.…”
Section: Introduction (mentioning)
confidence: 92%
“…The FPGA-based solution offered a 15.3× speed-up over the GPU implementation along with a 60× reduction in data dependency. The work reported in [12][13][14][15] achieved greater performance in object detection inference [15] and a reduction in overall MAC operations per layer [14]. Alternatively, the work performed in [12] uses an FPGA to handle, at high speed, the parallel context of echo data from received laser signals by using deep learning.…”
Section: Introduction (mentioning)
confidence: 99%
“…Implementing a CNN accelerator efficiently on FPGA is challenging, as most of the workload is concentrated in the heavy and repetitive convolution layers of the CNN. Most recent works ([28], [29], [30]) revolve around optimizing the CNN loop (refer to Equation 1). Such techniques study the effect of unrolling at different loop levels, loop tiling, and loop interchanging.…”
Section: E. Related Work (mentioning)
confidence: 99%
“…LeNet, AlexNet, and VGGNet are the most popular CNNs used in FPGA implementations. However, power consumption is generally compared against processor, GPU, or PC implementations, which is not a fair comparison [16], [18], [19]. Since FPGAs are inherently energy-efficient devices, a fair comparison should be made between FPGA implementations.…”
Section: Introduction (mentioning)
confidence: 99%