Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays 2018
DOI: 10.1145/3174243.3174248

Combined Spatial and Temporal Blocking for High-Performance Stencil Computation on FPGAs Using OpenCL

Abstract: Recent developments in High Level Synthesis tools have attracted software programmers to accelerate their high-performance computing applications on FPGAs. Even though it has been shown that FPGAs can compete with GPUs in terms of performance for stencil computation, most previous work achieves this by avoiding spatial blocking and restricting input dimensions relative to FPGA on-chip memory. In this work we create a stencil accelerator using Intel FPGA SDK for OpenCL that achieves high performance without havi…

citations
Cited by 71 publications
(74 citation statements)
references
References 18 publications
“…
• Loop collapsing to reduce area overhead of storing variable and buffer states in multiply-nested loops
• Padding relative to the degree of temporal parallelism to reduce unaligned accesses caused by overlapped blocking that result in memory bandwidth waste
Complete details of our implementation and the performance model we use for parameter tuning are discussed in [8].…”
Section: A Base Implementation For First-order Stencils (mentioning)
confidence: 99%
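
As a rough illustration of the loop collapsing named in the statement above, the sketch below replaces a doubly-nested traversal with a single loop whose indices are updated by hand, so only one loop's state has to be tracked. It is written in Python purely for readability; the cited work targets OpenCL kernels on FPGAs, and the function names `nested_order` and `collapsed_order` are hypothetical, not taken from the paper or the citing article.

```python
def nested_order(dim_y, dim_x):
    """Reference traversal: a conventional doubly-nested loop."""
    order = []
    for y in range(dim_y):
        for x in range(dim_x):
            order.append((y, x))
    return order


def collapsed_order(dim_y, dim_x):
    """Same traversal expressed as one loop with manually updated indices."""
    order = []
    y, x = 0, 0
    for _ in range(dim_y * dim_x):  # single loop with a single exit condition
        order.append((y, x))
        x += 1
        if x == dim_x:  # wrap the inner index and advance the outer one
            x = 0
            y += 1
    return order
```

Both functions visit the grid cells in the same row-major order; the collapsed form simply trades the nested loop structure for explicit index arithmetic.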
“…To extend our base implementation from [8] for high-order stencil computation, multiple modifications were required:…”
Section: B Extension For High-order Stencils (mentioning)
confidence: 99%
“…Recently, a stencil accelerator using Intel OpenCL was presented in the work of Zohouri et al., where the authors highlight two important concerns about the compiler. First, due to the Partial Reconfiguration on Arria 10 FPGAs, the fitting and routing quality for OpenCL is reduced on Arria 10.…”
Section: Related Work (mentioning)
confidence: 99%
“…One way is to focus on application and domain‐specific accelerators: Neural network, Bayesian learning, bioinformatics, stencil computing, energy‐efficient accelerators for graph analytics algorithms, and irregular applications mapping. Another way is to focus on Domain‐Specific Language (DSL) which aims at representing parallelism in stream‐based applications, like SPar based on C++…”
Section: Related Work (mentioning)
confidence: 99%