2016
DOI: 10.1002/cpe.3850
FPGA‐accelerated deep convolutional neural networks for high throughput and energy efficiency

Abstract: Recent breakthroughs in deep convolutional neural networks (CNNs) have led to great improvements in the accuracy of both vision and auditory systems. Characterized by their deep structures and large numbers of parameters, deep CNNs challenge today's computational capabilities. Hardware specialization in the form of field-programmable gate arrays (FPGAs) offers a promising path toward major leaps in computational performance while achieving high energy efficiency. In this paper, we focus on accelerating d…

Cited by 45 publications (23 citation statements)
References 20 publications
“…Focusing on visual task-oriented proposals, those based on FPGAs stand out in terms of energy efficiency but not in performance [19], [20]. Additionally, some of them rely on other elements such as CPUs or external DRAM memory [21]–[24], the networks used as benchmarks are not always representative [25], [26], or their price limits the application range [27], [28].…”
Section: Hardware Implementation
confidence: 99%
“…However, the solution introduces a large overhead associated with the memory accesses and execution time needed to rearrange the input maps. This overhead was partially eliminated in [15] using an accelerator for matrix multiplication and dedicated units to convert the input maps into a matrix.…”
Section: Related Work
confidence: 99%
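To make the rearrangement step referenced above concrete, here is a minimal NumPy sketch of converting input maps into a matrix (commonly called im2col) so that a 2D convolution becomes a single matrix multiplication. The function names and shapes are illustrative assumptions, assuming a square kernel, stride 1, and no padding; this is not the interface of the accelerator in [15].

```python
import numpy as np

def im2col(x, k):
    """Unroll k-by-k patches of a 2D input map into matrix columns.

    x : (H, W) input feature map
    k : kernel size (assumed square, stride 1, no padding)
    Returns a (k*k, out_h*out_w) matrix of flattened patches.
    """
    H, W = x.shape
    out_h, out_w = H - k + 1, W - k + 1
    cols = np.empty((k * k, out_h * out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Each output position contributes one flattened patch;
            # this copying is the rearrangement overhead in question.
            cols[:, i * out_w + j] = x[i:i + k, j:j + k].ravel()
    return cols

def conv2d_as_matmul(x, w):
    """2D convolution (cross-correlation) via matrix multiplication."""
    k = w.shape[0]
    cols = im2col(x, k)       # memory-access overhead lives here
    out = w.ravel() @ cols    # one dense matrix-vector product
    return out.reshape(x.shape[0] - k + 1, -1)

# Sanity check against a direct sliding-window computation.
x = np.random.rand(6, 6)
w = np.random.rand(3, 3)
ref = np.array([[np.sum(x[i:i+3, j:j+3] * w) for j in range(4)]
                for i in range(4)])
assert np.allclose(conv2d_as_matmul(x, w), ref)
```

Note how `im2col` copies every k-by-k patch: that duplication is exactly the memory-access and execution-time overhead the citing authors describe, which the dedicated conversion units in [15] aim to hide.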
“…Due to the high computational complexity of the convolutional layer, prior work has addressed parallelism of the computation by unrolling the 2D convolution to matrix multiplication [12] or reducing the number of operations using the Fast Fourier Transform [10]. However, parallelization by unrolling encounters a bottleneck due to the limited on-chip memory of FPGAs.…”
Section: Introduction
confidence: 99%
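For contrast with unrolling, the sketch below illustrates the FFT route mentioned alongside [10]: by the convolution theorem, zero-padded transforms turn the sliding-window sum into pointwise products in the frequency domain, replacing the O(n²k²) direct cost with O(n² log n) transforms. This is a generic illustration, not code from the cited work.

```python
import numpy as np

def conv2d_fft(x, w):
    """Full linear 2D convolution via the convolution theorem.

    Zero-padding both operands to (H+k-1, W+k-1) turns the FFT's
    circular convolution into an ordinary linear convolution.
    """
    H, W = x.shape
    k = w.shape[0]
    shape = (H + k - 1, W + k - 1)
    X = np.fft.rfft2(x, shape)
    Wf = np.fft.rfft2(w, shape)
    # Pointwise multiply in the frequency domain, then invert.
    return np.fft.irfft2(X * Wf, shape)

# Cross-check against a direct full convolution (flipped kernel).
x = np.random.rand(8, 8)
w = np.random.rand(3, 3)
xp = np.pad(x, 2)  # pad by k-1 zeros on every side
ref = np.array([[np.sum(xp[i:i+3, j:j+3] * w[::-1, ::-1])
                 for j in range(10)] for i in range(10)])
assert np.allclose(conv2d_fft(x, w), ref)
```

The transform cost pays off best for large kernels or when kernel transforms are reused across many inputs, which is why FFT-based approaches target operation-count reduction rather than the memory bottleneck that unrolling faces.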