2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2018
DOI: 10.1109/ipdpsw.2018.00027

High-Performance High-Order Stencil Computation on FPGAs Using OpenCL

Abstract: In this paper we evaluate the performance of FPGAs for high-order stencil computation using High-Level Synthesis. We show that despite the higher computation intensity and on-chip memory requirement of such stencils compared to first-order ones, our design technique with combined spatial and temporal blocking remains effective. This allows us to reach similar, or even higher, compute performance compared to first-order stencils. We use an OpenCL-based design that, apart from parameterizing performance knobs, al…
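To make the "higher computation intensity and on-chip memory requirement" of high-order stencils concrete, here is a minimal Python sketch. It assumes a 2D star-shaped stencil of radius `rad` (radius 1 being the first-order case), one shared coefficient per distance, a fixed boundary, and row-by-row streaming over blocks of `bsize` columns. The function names and the boundary choice are hypothetical illustrations, not the paper's OpenCL kernel.

```python
def star_points(rad, dims=2):
    """Grid points read per output cell by a star stencil of radius rad."""
    return 2 * dims * rad + 1


def flops_per_cell(rad, dims=2):
    """One multiply per point plus one add for every point after the first."""
    return 2 * star_points(rad, dims) - 1


def shift_register_cells(rad, bsize):
    """Cells a 2D streaming design must hold on chip to see rows y-rad..y+rad
    when a block of bsize columns is streamed row by row (no vectorization)."""
    return 2 * rad * bsize + 1


def star_step_2d(grid, coeffs):
    """One time-step of a 2D star stencil (hypothetical helper).

    coeffs[0] weights the centre cell; coeffs[k] (k = 1..rad) is shared by the
    four neighbours at distance k. Cells within rad of the border are copied
    unchanged (an assumed fixed boundary condition).
    """
    rad = len(coeffs) - 1
    ny, nx = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for y in range(rad, ny - rad):
        for x in range(rad, nx - rad):
            acc = coeffs[0] * grid[y][x]
            for k in range(1, rad + 1):
                acc += coeffs[k] * (grid[y][x - k] + grid[y][x + k]
                                    + grid[y - k][x] + grid[y + k][x])
            out[y][x] = acc
    return out


# Radius 1 versus radius 4 in 2D: points read and flops per cell.
print(star_points(1), flops_per_cell(1))  # 5 9
print(star_points(4), flops_per_cell(4))  # 17 33
```

Both the arithmetic per cell and the on-chip buffer grow linearly with the radius, which is why block sizes and fused time-steps must be retuned for high-order stencils.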

Cited by 17 publications (17 citation statements); references 20 publications.
“…Similarly, 1.5D spatial blocking can be used for 2D stencils. This blocking technique has been widely employed on different devices [17,23,40,41], with not just two, but also more combined time-steps. In 2.5D spatial blocking, the 2D tiles are streamed over one dimension and data of each tile is effectively reused for updating the next tiles.…”
Section: Temporal Blocking (mentioning; confidence: 99%)
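The 1.5D blocking described in this statement, combined with temporal blocking, can be sketched in Python under stated assumptions. The x dimension is cut into blocks of `bsize` columns that overlap by `halo = rad * steps` on each side. Each block is processed over the full y range for `steps` fused time-steps, and only its inner `bsize - 2 * halo` columns are written back. The names `blocked_15d`, `bsize` and `steps` are hypothetical. On an FPGA the y loop would be a streamed pipeline with shift registers rather than a copied strip, which this sketch does not model.

```python
def star_step_2d(grid, coeffs):
    """One time-step of a 2D star stencil; border cells within rad are kept fixed."""
    rad = len(coeffs) - 1
    ny, nx = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for y in range(rad, ny - rad):
        for x in range(rad, nx - rad):
            acc = coeffs[0] * grid[y][x]
            for k in range(1, rad + 1):
                acc += coeffs[k] * (grid[y][x - k] + grid[y][x + k]
                                    + grid[y - k][x] + grid[y + k][x])
            out[y][x] = acc
    return out


def blocked_15d(grid, coeffs, steps, bsize):
    """Apply `steps` time-steps using overlapped 1.5D blocking along x.

    Each block reads bsize columns (including a halo of rad * steps on each
    side), advances them `steps` times locally, and keeps only the inner
    columns, whose dependency cones never reach the block's stale edges.
    """
    rad = len(coeffs) - 1
    halo = rad * steps
    comp = bsize - 2 * halo  # columns of valid output per block
    if comp <= 0:
        raise ValueError("bsize too small for this radius and number of steps")
    ny, nx = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for x0 in range(0, nx, comp):
        lo = max(0, x0 - halo)          # clip the halo at the global boundary
        hi = min(nx, x0 + comp + halo)
        strip = [row[lo:hi] for row in grid]  # whole y range of this block
        for _ in range(steps):                # temporal blocking: fused steps
            strip = star_step_2d(strip, coeffs)
        for y in range(ny):
            for x in range(x0, min(x0 + comp, nx)):
                out[y][x] = strip[y][x - lo]
    return out
```

The redundant halo work grows with `rad * steps`, so a high-order stencil needs wider blocks than a first-order one to fuse the same number of time-steps efficiently.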
“…LIFT [11] is a functional data-parallel programming language that allows expressing stencil loops as a set of reusable parallel primitives and optimizing them. Recently, multiple implementations of N.5D blocking on FPGAs have also been proposed with FPGA-specific optimizations [4,5,40,41]. FPGAs tend to achieve better scaling with temporal blocking compared to GPUs due to higher flexibility of employing their on-chip memory which allows larger spatial block sizes compared to GPUs.…”
Section: Related Work (mentioning; confidence: 99%)
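The link between on-chip memory, spatial block size and temporal scaling can be made concrete with a small calculation. In overlapped blocking, each block recomputes a halo of `rad * steps` cells on every blocked side, so only part of each block produces valid output. This minimal Python sketch computes that fraction exactly. The function name is hypothetical, and the sketch ignores every other cost, such as memory bandwidth, clock frequency and vector width.

```python
from fractions import Fraction


def useful_fraction(block_dims, rad, steps):
    """Fraction of the cells in one overlapped spatial block that produce valid output.

    block_dims lists the block size along each blocked dimension (one entry for
    1.5D blocking, two for 2.5D); the streamed dimension is not blocked and does
    not appear. Each blocked side loses a halo of rad * steps cells.
    """
    halo = rad * steps
    frac = Fraction(1)
    for b in block_dims:
        valid = b - 2 * halo
        if valid <= 0:
            return Fraction(0)  # block too small: nothing valid is produced
        frac *= Fraction(valid, b)
    return frac


# Radius-4 stencil, 2.5D blocking, 8 fused time-steps (halo = 32 cells per side):
for bsize in (128, 256, 512):
    print(bsize, float(useful_fraction((bsize, bsize), rad=4, steps=8)))
# 128 0.25
# 256 0.5625
# 512 0.765625
```

Under these assumptions, a device that can afford larger blocks keeps more of its computation useful as the number of fused time-steps increases.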
“…A halo size of zero results in simple sequential reading/writing with no overlapping. The second and third classes implement 1.5D and 2.5D overlapped spatial blocking, respectively, that are widely used in 2D and 3D stencil computation [5,6,7,8,9,10]. For the 1.5D class, the x dimension is blocked and memory accesses are streamed row by row until the last index in the y dimension, before moving to the next block (Fig.…”
Section: A Memory Benchmark Suite (mentioning; confidence: 99%)
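The access patterns described in this statement can be sketched as address generators: overlapped 1D blocking, which degenerates to a sequential scan when the halo is zero, and the 1.5D pattern that streams each x block row by row over the full y range. This is a minimal Python illustration under our own assumptions. The function names are hypothetical, halo cells falling outside the array are simply skipped rather than padded, and it is not the benchmark's OpenCL code.

```python
def access_order_1d(n, bsize, halo):
    """Indices read by overlapped 1D blocking: blocks of bsize cells that
    overlap by 2 * halo; halo == 0 degenerates to a plain sequential scan."""
    step = bsize - 2 * halo  # new (non-overlapping) cells per block
    if step <= 0:
        raise ValueError("bsize must exceed 2 * halo")
    for start in range(-halo, n - halo, step):
        for i in range(start, start + bsize):
            if 0 <= i < n:  # assumption: out-of-range halo cells are skipped
                yield i


def access_order_15d(nx, ny, bsize, halo):
    """(y, x) pairs read by 1.5D overlapped blocking: x is blocked, and each
    block is streamed row by row over the whole y range before the next block."""
    step = bsize - 2 * halo
    if step <= 0:
        raise ValueError("bsize must exceed 2 * halo")
    for x0 in range(-halo, nx - halo, step):
        for y in range(ny):
            for x in range(x0, x0 + bsize):
                if 0 <= x < nx:
                    yield (y, x)


print(list(access_order_1d(10, 4, 0)))
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(list(access_order_15d(6, 2, 4, 1))[:6])
# [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
```

With a nonzero halo the generators revisit boundary columns of neighbouring blocks, which is the redundant traffic that overlapped blocking trades for independent blocks.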
“…The 1.5D and 2.5D blocking classes support all the above array configurations except R1W0. All Single Work-item kernels use collapsed loops with the exit condition optimization from [4,5,6] for best timing and hence are constructed as doubly nested loops, with the fully-unrolled innermost loop having a trip count equal to the vector size, and the outer loop having an initiation interval of one. In the NDRange kernels, the workgroups have the same number of dimensions as the input, and memory access coalescing is performed using loop unrolling.…”
Section: Out Of Bound (mentioning; confidence: 99%)
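The doubly nested structure described here, a collapsed outer loop with an initiation interval of one wrapped around a fully unrolled vector loop, can be illustrated with a minimal Python sketch. It only checks that collapsing preserves the traversal order; Python cannot show the timing benefit itself. The function names and the three original loop levels (block, row, column vector) are our assumptions. We also assume the exit-condition optimization amounts to a single compare of a flat counter against a precomputed trip count, which may differ in detail from the cited works.

```python
def nested_order(num_blocks, ny, bsize, vec):
    """Reference: block -> row -> vector of columns, as ordinary nested loops."""
    order = []
    for b in range(num_blocks):
        for y in range(ny):
            for xv in range(0, bsize, vec):
                for v in range(vec):  # would be fully unrolled in hardware
                    order.append((b, y, xv + v))
    return order


def collapsed_order(num_blocks, ny, bsize, vec):
    """Same traversal as one collapsed loop around an unrolled vector loop.

    The nested indices are updated by hand, and the loop's exit test is a single
    compare of the flat counter against a precomputed trip count (assumed reading
    of the exit-condition optimization).
    """
    if bsize % vec:
        raise ValueError("bsize must be a multiple of vec")
    trip = num_blocks * ny * (bsize // vec)  # precomputed outer trip count
    b = y = xv = 0
    order = []
    for _ in range(trip):  # the loop an HLS tool would pipeline with II = 1
        for v in range(vec):  # trip count equals the vector size
            order.append((b, y, xv + v))
        # manual index update replacing the original loop headers
        xv += vec
        if xv == bsize:
            xv = 0
            y += 1
            if y == ny:
                y = 0
                b += 1
    return order


print(collapsed_order(1, 1, 4, 2))
# [(0, 0, 0), (0, 0, 1), (0, 0, 2), (0, 0, 3)]
```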