Automatically Optimizing the Latency, Area, and Accuracy of C Programs for High-Level Synthesis

Gao, Xitong; Wickerson, John; Constantinides, George A.

doi:10.1145/2847263.2847282

Cited by 13 publications

(8 citation statements)

References 23 publications

“…Automated source-to-source transformations can result in descriptions that might not exactly match the original code. Gao et al [13] and Cong et al [14] have done similar research.…”

Section: Related Work A: QoR Improvements in HLS-Based Design (mentioning)

confidence: 63%

Module-per-Object: A Human-Driven Methodology for C++-Based High-Level Synthesis Design

Silva

Boyer

Langlois

2019

2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)

High-Level Synthesis (HLS) brings FPGAs to audiences previously unfamiliar to hardware design. However, achieving the highest Quality-of-Results (QoR) with HLS is still unattainable for most programmers. This requires detailed knowledge of FPGA architecture and hardware design in order to produce FPGA-friendly codes. Moreover, these codes are normally in conflict with best coding practices, which favor code reuse, modularity, and conciseness. To overcome these limitations, we propose Module-per-Object (MpO), a human-driven HLS design methodology intended for both hardware designers and software developers with limited FPGA expertise. MpO exploits modern C++ to raise the abstraction level while improving QoR, code readability and modularity. To guide HLS designers, we present the five characteristics of MpO classes. Each characteristic exploits the power of HLS-supported modern C++ features to build C++-based hardware modules. These characteristics lead to high-quality software descriptions and efficient hardware generation. We also present a use case of MpO, where we use C++ as the intermediate language for FPGA-targeted code generation from P4, a packet processing domain specific language. The MpO methodology is evaluated using three design experiments: a packet parser, a flow-based traffic manager, and a digital upconverter. Based on experiments, we show that MpO can be comparable to hand-written VHDL code while keeping a high abstraction level, human-readable coding style and modularity. Compared to traditional C-based HLS design, MpO leads to more efficient circuit generation, both in terms of performance and resource utilization. Also, the MpO approach notably improves software quality, augmenting parameterization while eliminating the incidence of code duplication.

“…In general, performance analysis is mainly performed at either IR level [32,36,28,20,18] or source code level [37]. Since most of the existing work performs analysis without explicitly considering back-end design flow [32,36,18,28,20], their analysis cannot reflect the optimization done by the commercial tool. On the other hand, similar to this paper, [37] builds the performance model with the help of the commercial tool, but [37] provides neither the resource model nor automated code transformation, so users still need to manually change the kernel code while considering the FPGA resource limitation.…”

Section: Related Work (mentioning)

confidence: 99%

“…Finally, some frameworks also focus on general-purpose programming languages such as C/C++ [21,35,18]. SOAP3 [18] is a framework that analyzes a kernel at the metasemantic intermediate representation (MIR) graph level and transforms it according to the result of design space exploration. However, SOAP3 adopts regression models for resource estimation, so the model is not general enough to cover nonlinear resource consumption.…”

Section: Related Work (mentioning)

confidence: 99%

“…Unlike most existing models [18,20,28,32,36] that analyze the source program directly, many parameters of our proposed model are obtained from the HLS synthesis reports of a few design points. This feature enables our model to capture most scheduling optimizations performed by the HLS tool.…”

Section: Analytical Model (mentioning)

confidence: 99%

“…Analytical Modeling: Fast performance estimation on FPGAs has become popular in recent years. In general, performance analysis is mainly performed at either IR level [32,36,28,20,18] or source code level [37]. Since most of the existing work performs analysis without explicitly considering back-end design flow [32,36,18,28,20], their analysis cannot reflect the optimization done by the commercial tool.…”

Section: Related Work (mentioning)

confidence: 99%

Automated Accelerator Generation and Optimization with Composable, Parallel and Pipeline Architecture

Cong

Wei

et al. 2018

2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC)

CPU-FPGA heterogeneous architectures are attracting ever-increasing attention in an attempt to advance computational capabilities and energy efficiency in today's datacenters. These architectures provide programmers with the ability to reprogram the FPGAs for flexible acceleration of many workloads. Nonetheless, this advantage is often overshadowed by the poor programmability of FPGAs whose programming is conventionally a RTL design practice. Although recent advances in high-level synthesis (HLS) significantly improve the FPGA programmability, it still leaves programmers facing the challenge of identifying the optimal design configuration in a tremendous design space. This paper aims to address this challenge and pave the path from software programs towards high-quality FPGA accelerators. Specifically, we first propose the composable, parallel and pipeline (CPP) microarchitecture as a template of accelerator designs. Such a well-defined template is able to support efficient accelerator designs for a broad class of computation kernels, and more importantly, drastically reduce the design space. Also, we introduce an analytical model to capture the performance and resource trade-offs among different design configurations of the CPP microarchitecture, which lays the foundation for fast design space exploration. On top of the CPP microarchitecture and its analytical model, we develop the AutoAccel framework to make the entire accelerator generation automated. AutoAccel accepts a software program as an input and performs a series of code transformations based on the result of the analytical-model-based design space exploration to construct the desired CPP microarchitecture. Our experiments show that the AutoAccel-generated accelerators outperform their corresponding software implementations by an average of 72x for a broad class of computation kernels.

A Survey on Performance Optimization of High-Level Synthesis Tools

Huang

Wang

et al. 2020

J. Comput. Sci. Technol.
