Proceedings of the 19th Annual International Conference on Supercomputing 2005
DOI: 10.1145/1088149.1088197

Cache oblivious stencil computations

Abstract: We present a cache oblivious algorithm for stencil computations, which arise for example in finite-difference methods. Our algorithm applies to arbitrary stencils in n-dimensional spaces. On an "ideal cache" of size Z, our algorithm saves a factor of Θ(Z^{1/n}) cache misses compared to a naive algorithm, and it exploits temporal locality optimally throughout the entire memory hierarchy.
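
The abstract does not spell out the algorithm itself. One common way to obtain a cache-oblivious stencil traversal is to recursively cut the space-time region into trapezoids, so that each small trapezoid is finished while its data is still in cache. The Python sketch below illustrates that idea under stated assumptions: a 1D three-point averaging stencil, fixed boundary values, and a dependency slope of 1. The names `naive_stencil`, `trapezoid_stencil`, `walk`, `kernel`, and `slope`, and the stencil weights, are illustrative choices and are not taken from the text above. Python will not show the cache benefit; the sketch only demonstrates that the reordered traversal computes the same result as the naive sweep.

```python
def naive_stencil(u0, steps):
    """Sweep the whole grid once per time step (little temporal reuse)."""
    cur, nxt = list(u0), list(u0)  # both copies keep the fixed boundary values
    for _ in range(steps):
        for x in range(1, len(cur) - 1):
            nxt[x] = 0.25 * cur[x - 1] + 0.5 * cur[x] + 0.25 * cur[x + 1]
        cur, nxt = nxt, cur
    return cur


def trapezoid_stencil(u0, steps, slope=1):
    """Same computation, visited in a recursive trapezoidal order.

    Assumption: `slope` is the stencil's reach per time step (1 for a
    three-point stencil). Nothing here depends on the cache size.
    """
    buf = [list(u0), list(u0)]  # buf[t % 2] holds the values at time t

    def kernel(t, x):
        src, dst = buf[t % 2], buf[(t + 1) % 2]
        dst[x] = 0.25 * src[x - 1] + 0.5 * src[x] + 0.25 * src[x + 1]

    def walk(t0, t1, x0, dx0, x1, dx1):
        # Trapezoid: times t0..t1, left edge starts at x0 and moves by dx0
        # per step, right edge starts at x1 and moves by dx1 per step.
        dt = t1 - t0
        if dt == 1:
            for x in range(x0, x1):
                kernel(t0, x)
        elif dt > 1:
            if 2 * (x1 - x0) + (dx1 - dx0) * dt >= 4 * slope * dt:
                # Wide trapezoid: cut in space along a line of slope -slope.
                # The left piece depends on nothing in the right piece.
                xm = (2 * (x0 + x1) + (2 * slope + dx0 + dx1) * dt) // 4
                walk(t0, t1, x0, dx0, xm, -slope)
                walk(t0, t1, xm, -slope, x1, dx1)
            else:
                # Tall trapezoid: cut in time, lower half first.
                s = dt // 2
                walk(t0, t0 + s, x0, dx0, x1, dx1)
                walk(t0 + s, t1, x0 + dx0 * s, dx0, x1 + dx1 * s, dx1)

    # Interior points are 1 .. len(u0) - 2; the two end points stay fixed.
    walk(0, steps, 1, 0, len(u0) - 1, 0)
    return buf[steps % 2]
```

For example, `trapezoid_stencil(u0, 16)` and `naive_stencil(u0, 16)` should agree for any list `u0`. The recursion never refers to a cache size, which is what makes this kind of traversal "cache oblivious": the same code is expected to adapt to every level of the memory hierarchy.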

Cited by 153 publications (144 citation statements)
References 3 publications (2 reference statements)

“…Secondly, the low computational intensity and reuse ratios. After gathering the set of data points, just one central point is computed and only the accessed data points in the sweep direction might be reused for the computation of the next central point [4].…”
Section: Boosting Numerical Codes (mentioning)
confidence: 99%
“…Periodic domains have also been tiled using rhombus shaped tiles [5,8,10]. These tiles overlap at their bases, causing two tiles to compute some duplicate results.…”
Section: Related Work (mentioning)
confidence: 99%
“…These benefits can be enough to overcome the extra time spent recomputing a portion of the iterations. Overlapping tiles can also be used to handle wraparound dependencies that are introduced due to periodic boundaries [5,8]. The presented techniques could feasibly be extended to handle the ring, cylinder, and torus domains.…”
Section: Related Work (mentioning)
confidence: 99%
“…Tiling [19] [29] is often used to partition the stencil loops among multiple processing elements (PEs) for parallel execution, and we refer to a workload partition as a tile in this paper. Similar tiling techniques also help localize computation to optimize cache hit rate for an individual processor [13]. Tiling across multiple PEs introduces a problem because stencils along the boundary of a tile must obtain values that were computed remotely on other PEs, as shown in Figure 1(a).…”
Section: Introduction (mentioning)
confidence: 99%