2017
DOI: 10.1145/3155290

Multidimensional Intratile Parallelization for Memory-Starved Stencil Computations

Abstract: Optimizing the performance of stencil algorithms has been the subject of intense research over the last two decades. Since many stencil schemes have low arithmetic intensity, most optimizations focus on increasing the temporal data access locality, thus reducing the data traffic through the main memory interface with the ultimate goal of decoupling from this bottleneck. There are, however, only few approaches that explicitly leverage the shared cache feature of modern multicore chips. If every thread works on …

Cited by 29 publications (36 citation statements)
References 46 publications
“…For example, seismic [63], stencil [64], [65], electromagnetic [66], molecular dynamics [67], Fast Multipole Methods [68], tensors [39], deep learning [69], [70], databases [49], [71], [72], big data [73], systems and graph engines [74], and many more.…”
Section: State-of-the-art Shared-memory Optimizations (mentioning)
confidence: 99%
“…Such stencil codes can, in the context of iterative solvers, significantly benefit from careful tuning such as diamond tiling. However, most tunings require invariant stencils in order to perform [37]. Our work targets problems where stencil entries are not constant.…”
Section: Discussion (mentioning)
confidence: 99%
“…The THIIM stencil requires many bytes per grid cell, which makes it challenging to fit sufficiently large tiles in the cache memory. We have introduced a more advanced cache block sharing technique in [22], where we propose multi-dimensional intratile parallelization to achieve a further reduction in the tile size requirements and maintain architecture-friendly memory access patterns.…”
Section: A Background (mentioning)
confidence: 99%
“…Our experiments require a system that allows full control over the tunable parameters of a temporally blocked stencil algorithm. The open source system provided by Malas et al [17], [22], called Girih, provides these options. Girih uses wavefront-diamond tiling with multi-dimensional intratile parallelization to construct a Multi-threaded Wavefront Diamond blocking (MWD) approach.…”
Section: A Background (mentioning)
confidence: 99%