SC14: International Conference for High Performance Computing, Networking, Storage and Analysis 2014
DOI: 10.1109/sc.2014.21

Scalable Kernel Fusion for Memory-Bound GPU Applications

Cited by 75 publications (61 citation statements)
References 22 publications
“…They provide basic performance models for the number of stencils to fuse into one tile focusing on (possibly unrolled) kernels that process only one stencil repeatedly and do not consider varying tiling and fusion strategies. Finally, Wahib et al [18] take arbitrary stencil graphs from larger scientific applications and present an analytical performance model for choosing an optimal execution strategy. Even though closely related, they limit themselves to kernel fusion using computation on-the-fly only considering shared memory and apply their work on NVIDIA GPUs only.…”
Section: Related Work (mentioning)
confidence: 99%
“…most of these schemes consider the optimization of a single stencil in isolation. Many applications, however, require nested stencils [18] that are applied in succession. The data dependencies of these nestings can form complex directed acyclic stencil graphs where multiple stencils need to be optimized in tandem to achieve highest performance.…”
Section: Introduction (mentioning)
confidence: 99%
“…The problem is resolved by the OEG generating heuristic that adds precedency for the kernel invoked first by the host. Another optimization is to add redundant instances for arrays having several kernels writing into them to relax dependencies (elaboration on this optimization in previous work [28]). After this stage in the transformation, a report is given to the programmer regarding changes that were done to the original code to optimize the two graphs.…”
Section: DDG and OEG (mentioning)
confidence: 99%
“…We build on previous work [28], which took the following steps. First, we formulated the kernel fusion problem as a combinatorial optimization problem.…”
Section: Introduction (mentioning)
confidence: 99%