2017
DOI: 10.1145/3140659.3080221

Maximizing CNN Accelerator Efficiency Through Resource Partitioning

Abstract: Convolutional neural networks (CNNs) are revolutionizing machine learning, but they present significant computational challenges. Recently, many FPGA-based accelerators have been proposed to improve the performance and efficiency of CNNs. Current approaches construct a single processor that computes the CNN layers one at a time; the processor is optimized to maximize the throughput at which the collection of layers is computed. However, this approach leads to inefficient designs because the same processor stru…
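The inefficiency the abstract alludes to can be pictured with a toy utilization model. The sketch below is a minimal illustration, assuming made-up AlexNet-like channel counts and a simplified tiling model; it is not the paper's actual design-space search or hardware model.

```python
# Minimal sketch (not the paper's optimizer): estimate how well a single
# fixed-shape processing-element (PE) array matches conv layers of varying
# dimensions. Layer shapes below are hypothetical, AlexNet-like values.

import math

# (output_channels, input_channels) per conv layer -- illustrative only
layers = [(96, 3), (256, 48), (384, 256), (384, 192), (256, 192)]

def utilization(pe_rows, pe_cols, layer_list):
    """Fraction of PE-array cycles doing useful work when every layer in
    layer_list runs on the same (pe_rows x pe_cols) array (toy model)."""
    total_useful = total_cycles = 0
    for out_ch, in_ch in layer_list:
        # Each layer is tiled over the array; partial tiles waste PEs.
        tiles = math.ceil(out_ch / pe_rows) * math.ceil(in_ch / pe_cols)
        total_cycles += tiles * pe_rows * pe_cols
        total_useful += out_ch * in_ch
    return total_useful / total_cycles

print(f"single 64x64 processor across all layers: {utilization(64, 64, layers):.2%}")
for out_ch, in_ch in layers:
    u = utilization(64, 64, [(out_ch, in_ch)])
    print(f"  layer ({out_ch} x {in_ch}) alone on that array: {u:.2%}")
# Resource partitioning instead sizes a separate array per layer (or layer
# group), so each array's shape matches its layer and the padding waste shrinks.
```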

Cited by 131 publications (151 citation statements)
References 33 publications
“…The approach permits the optimization of the architecture for each layer but requires other techniques such as fused layers [17] to account for the extra memory required to store intermediate maps and weights. A mid-term solution was proposed by Shen et al. [18]. They noted the inefficiency of using a single module to run all convolutional layers, since some layers under-utilize the processing elements.…”
Section: Related Work (mentioning, confidence: 99%)
“…The work by Gong et al. [19] also proposes a fully pipelined FPGA accelerator for CNNs with 16-bit quantization and a layer-fusion technique. The architecture, implemented on a small ZYNQ7020 FPGA, achieves an acceptable performance of 80 GOPS, but the complexity of the process referred to by Shen et al. [18] reduces the efficiency of the solution for low-density FPGAs.…”
Section: Related Work (mentioning, confidence: 99%)
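For context on the 16-bit quantization mentioned in the excerpt above, the following is a minimal sketch of symmetric fixed-point quantization to int16; the fractional-bit count and rounding scheme are assumptions for illustration, not the exact scheme used by Gong et al. [19].

```python
# Toy symmetric int16 fixed-point quantization of CNN weights.
import numpy as np

def quantize_int16(weights, frac_bits=12):
    """Map float weights to int16 with a fixed number of fractional bits."""
    scale = 1 << frac_bits
    q = np.clip(np.round(weights * scale), -32768, 32767).astype(np.int16)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) / scale

w = np.random.randn(3, 3).astype(np.float32) * 0.5
q, scale = quantize_int16(w)
print("max abs quantization error:", np.max(np.abs(w - dequantize(q, scale))))
```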
“…However, the fixed dimensions of a single computing unit cannot match layers of differing dimensions, which leads to resource inefficiency, especially in Fully Connected (FCN) layers [19]. Some recent works [20][21][22][23][24] focus on a parallel streaming architecture, which partitions a system into several independent tasks and runs them on parallel hardware [25]. In general, the partitioning occurs at the task level and at the data level.…”
Section: Introduction (mentioning, confidence: 99%)
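A rough way to picture the task-level versus data-level distinction drawn in this excerpt: the toy sketch below uses plain Python lists as stand-ins for hardware streams; the layer names, image stream, and shard counts are purely illustrative, not taken from the cited works.

```python
# Toy illustration of two partitioning granularities for a streaming CNN.
layers = ["conv1", "conv2", "conv3", "fc1"]      # the CNN as a task graph
images = [f"img{i}" for i in range(8)]           # a stream of inputs

# Task-level partitioning: each layer becomes its own pipeline stage, so
# different images occupy different stages at the same time.
pipeline_stages = [{"stage": layer, "queue": list(images)} for layer in layers]

# Data-level partitioning: one layer's input is split across parallel
# workers, each processing a slice of the batch / feature map.
num_workers = 4
data_shards = [images[w::num_workers] for w in range(num_workers)]

print("task-level stages:", [s["stage"] for s in pipeline_stages])
print("data-level shards:", data_shards)
```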
“…This is the second mapping approach. Shen et al. [22] and Venieris et al. [23] both present a resource partitioning methodology for mapping CNNs onto FPGAs. It can be regarded as a trade-off between the "one size fits all" and "one to one" approaches.…”
Section: Introduction (mentioning, confidence: 99%)
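The trade-off between "one size fits all" and "one to one" can be sketched as choosing how to split a sequence of layers into a small number of groups, each served by one processor. The enumeration below is a hypothetical illustration using a toy utilization model similar to the earlier sketch; it is not the optimization formulated in [22] or [23].

```python
# Toy middle-ground mapping: partition the conv layers into a few contiguous
# groups, one processor per group, and pick the split with the best
# worst-group utilization (an assumed, illustrative objective).
from itertools import combinations
import math

layer_channels = [(96, 3), (256, 48), (384, 256), (384, 192), (256, 192)]

def group_utilization(group):
    """Utilization if one processor is sized for the largest layer in the group."""
    rows = max(o for o, _ in group)
    cols = max(i for _, i in group)
    useful = sum(o * i for o, i in group)
    cycles = sum(math.ceil(o / rows) * math.ceil(i / cols) * rows * cols
                 for o, i in group)
    return useful / cycles

def best_partition(layer_list, num_procs):
    """Try every split of the layer sequence into num_procs contiguous groups."""
    best = None
    for cuts in combinations(range(1, len(layer_list)), num_procs - 1):
        bounds = [0, *cuts, len(layer_list)]
        groups = [layer_list[a:b] for a, b in zip(bounds, bounds[1:])]
        score = min(group_utilization(g) for g in groups)
        if best is None or score > best[0]:
            best = (score, groups)
    return best

score, groups = best_partition(layer_channels, num_procs=2)
print(f"worst-group utilization {score:.2%} with groups {groups}")
```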
“…As a result, more and more processing power is available to run complex models, such as CNNs, in a reasonable time frame. Furthermore, researchers are working to improve the efficiency of CNN models (Ioannou et al., 2016; Shen et al., 2016; Zhang et al., 2016a).…”
Section: Discussion (mentioning, confidence: 99%)