Proceedings of the 26th ACM International Conference on Supercomputing 2012
DOI: 10.1145/2304576.2304619

High-performance code generation for stencil computations on GPU architectures

Abstract: Stencil computations arise in many scientific computing domains, and often represent time-critical portions of applications. There is significant interest in offloading these computations to high-performance devices such as GPU accelerators, but these architectures offer challenges for developers and compilers alike. Stencil computations in particular require careful attention to off-chip memory access and the balancing of work among compute units in GPU devices. In this paper, we present a code generation sche…

Cited by 207 publications (158 citation statements)
References 16 publications (38 reference statements)
“…Even though mappings alternative to row- or column-major can produce a more favorable memory access stream for some applications without CMS [27,55,32,67], the baseline still cannot outperform CMS even with complex data layout transformations. CMS will also benefit communication-avoiding optimizations which tradeoff reduced memory traffic for redundant computation [51,28,18]. CMS can increase performance without the extra local storage, cache space, or computations needed for redundant communication, while better alleviating network congestion and reducing memory power.…”
Section: Discussion (mentioning)
confidence: 99%
“…However, processors may block and wait for others to become ready. Because barrier calls are typical in computation loops [28], synchronous reads introduce no additional waiting and can replace barrier calls.…”
Section: Read Operations (mentioning)
confidence: 99%