2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC) 2018
DOI: 10.1109/dac.2018.8465940

Automated Accelerator Generation and Optimization with Composable, Parallel and Pipeline Architecture

Abstract: CPU-FPGA heterogeneous architectures are attracting ever-increasing attention in an attempt to advance computational capabilities and energy efficiency in today's datacenters. These architectures provide programmers with the ability to reprogram the FPGAs for flexible acceleration of many workloads. Nonetheless, this advantage is often overshadowed by the poor programmability of FPGAs whose programming is conventionally an RTL design practice. Although recent advances in high-level synthesis (HLS) significantly…

Citations: cited by 21 publications (22 citation statements)
References: 24 publications
“…Although several works have been proposed [6,9,41-43] to automatically explore a large set of FPGA designs, they all assume the initial design to be fully cached or cacheable. For the sake of conciseness, in this Section we only focus on approaches exploring directive-insertion optimizations that, as demonstrated by Siracusa et al. [31], are of particular interest in mixed optimization methodologies.…”
Section: Related Work (mentioning)
confidence: 99%
“…General HLS compilers - Beyond generating systolic arrays, there is also a plethora of work targeting implementing general applications on FPGAs [3,6,16,21,34]. However, experimental results show that there still exists a performance gap between such frameworks and dedicated systolic array compilers like SuSy.…”
Section: Related Work (mentioning)
confidence: 99%
“…Figure 1 illustrates our new contributions, highlighted with bold and red, relative to the prior HLS literature. There exists many automated approaches for generating device and host interfaces [20,61,83], exploring parallelization opportunities [24,34,46,83…”
Section: HeteroRefactor Workflow (mentioning)
confidence: 99%
“…The kernels we selected are slightly slower than running on CPU because I6 and I7 in Rosetta are designed to achieve higher energy efficiency but not higher processing throughput compared to CPU [84]. HeteroRefactor aims to reduce resource usage, while prior work [19,24] achieves higher performance than CPU by leveraging more on-chip resources to achieve parallelism. HeteroRefactor could be used jointly with other tools to produce fast and resource-efficient FPGA accelerators.…”
Section: Overhead and Performance (mentioning)
confidence: 99%