2023
DOI: 10.1145/3570928

FlexCNN: An End-to-end Framework for Composing CNN Accelerators on FPGA

Abstract: With reduced data reuse and parallelism, recent convolutional neural networks (CNNs) create new challenges for FPGA acceleration. Systolic arrays (SAs) are efficient, scalable architectures for convolutional layers, but without proper optimizations, their efficiency drops dramatically for three reasons: 1) the different dimensions within same-type layers, 2) the different convolution types, especially transposed and dilated convolutions, and 3) the CNN's complex dataflow graph. Furthermore, significant overheads arise w…
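
To make the first of these efficiency losses concrete, the sketch below estimates how much of a fixed-size systolic array sits idle when a layer's dimensions do not tile it evenly. The tile and layer shapes are hypothetical illustrations, not values from the paper:

import math

def sa_utilization(layer_dims, sa_dims):
    # Fraction of the systolic array's PEs doing useful work when a
    # layer of shape layer_dims is tiled onto an array of shape sa_dims.
    util = 1.0
    for work, tile in zip(layer_dims, sa_dims):
        tiles = math.ceil(work / tile)   # tiles needed along this dimension
        util *= work / (tiles * tile)    # padding in the last tile is idle
    return util

# A hypothetical 16x16 array fits 48 output channels exactly (3 full
# tiles), but a 40-channel layer leaves a third of the last tile idle.
print(sa_utilization((48, 64), (16, 16)))  # 1.0
print(sa_utilization((40, 64), (16, 16)))  # ~0.833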

Cited by 11 publications (3 citation statements)
References: 38 publications
“…Another accelerator in a different study [16] operated at a frequency of 1 GHz, consuming 512 multipliers, with a theoretical throughput of 498.6 GOPS, but the actual throughput only accounted for 48.69% of the theoretical value. Similar phenomena have also been reflected in other research [17–20]. In essence, low actual throughput reflects a low utilization efficiency of the multiplier.…”

Section: Introduction (supporting)
Confidence: 83%
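
The percentage in this quote can be reproduced if one assumes each multiplier performs one multiply-accumulate (two operations) per cycle, in which case 498.6 GOPS is the achieved figure against a 1024 GOPS peak; this reading is our assumption, not something stated in the quoted text:

\[ P_{\text{peak}} = 2 \times 512 \times 1\,\text{GHz} = 1024\ \text{GOPS}, \qquad \frac{498.6\ \text{GOPS}}{1024\ \text{GOPS}} \approx 48.69\% \]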
“…Compared to Angel-eye [47], we use similar LUT resources and achieve similar performance, but our DSP usage is significantly reduced and the overall computational resource efficiency is improved by 8.51%. While we may not possess a performance advantage compared to Caffeine [48] and FlexCNN [49], our work uses far fewer resources. In fact, we demonstrate a resource efficiency improvement of 15.16% and 19.80% compared to Caffeine [48] and FlexCNN [49], respectively. Furthermore, given that Xilinx's Vitis AI tool employs 8-bit quantization, the Xilinx B4096 DPU [34,50] exhibits reduced LUT resource consumption.…”

Section: Comparison With Related Work (mentioning)
Confidence: 97%
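
The quoted comparison does not define "computational resource efficiency". A common convention in FPGA accelerator papers, assumed here purely for illustration, is achieved throughput normalized by the dominant compute resource, e.g.

\[ \eta = \frac{P_{\text{achieved}}\ [\text{GOPS}]}{N_{\text{DSP}}} \]

so a 19.80% improvement over FlexCNN would mean this ratio is 1.198 times the corresponding FlexCNN figure.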