2015 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), 2015
DOI: 10.1109/cgo.2015.7054196

Locality aware concurrent start for stencil applications

Cited by 9 publications (4 citation statements). References: 17 publications.
“…When a thread takes a task, atomics are used to avoid race conditions. This work was extended in [Shrestha et al 2015]. They combined their introduced jagged tiling approach with the diamond tiling extension of the PLUTO framework to allow concurrent start at the inter- and intra-tile levels.…”
Section: Related Work Utilizing Cache Block Sharing (mentioning)
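The statement above notes that threads claim tasks using atomic operations to avoid races. A minimal sketch of that idea follows. It assumes a single shared counter from which each thread fetches and increments the next task index, and it ignores any dependencies between tasks. Python has no native atomic integers, so the hypothetical `AtomicCounter` emulates fetch-and-add with a lock. The names `AtomicCounter` and `run_tasks` are illustrative; this is not the scheduler from the cited work.

```python
import threading


class AtomicCounter:
    """Hypothetical helper: emulates an atomic fetch-and-add with a lock,
    since Python has no native atomic integers."""

    def __init__(self, start=0):
        self._value = start
        self._lock = threading.Lock()

    def fetch_add(self, delta=1):
        # Return the old value and increment it as one indivisible step.
        with self._lock:
            old = self._value
            self._value += delta
            return old


def run_tasks(tasks, num_threads=4):
    """Hypothetical dynamic scheduler: each thread claims the next unclaimed
    task index from the shared counter, so no task is taken twice.
    Dependencies between tasks are not modeled."""
    counter = AtomicCounter()
    results = [None] * len(tasks)
    runs = [0] * len(tasks)  # how many times each task was executed

    def worker():
        while True:
            i = counter.fetch_add()
            if i >= len(tasks):
                return  # every task has already been claimed
            results[i] = tasks[i]()
            runs[i] += 1  # only the thread that claimed index i touches it

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, runs
```

For example, with `tasks = [lambda k=k: k * k for k in range(100)]`, `run_tasks(tasks)` returns the 100 squares in index order and a `runs` list containing only ones. No two threads ever receive the same index, regardless of how the threads interleave.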
“…In contrast, our MWD approach allows the thread group to share one large diamond tile, providing more in-cache data reuse. Figure 5 of their paper [Shrestha et al 2015] shows an example of their two-level tiling. The diamond tile is split into nine sub-tile updates for fine-grained parallelization.…”
Section: Related Work Utilizing Cache Block Sharing (mentioning)
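The statement above describes splitting one diamond tile into nine sub-tile updates for fine-grained parallelism. The sketch below rests on an assumption made purely for illustration and not taken from the cited Figure 5. It treats the tile, in rotated coordinates, as a 3×3 grid in which sub-tile `(a, b)` depends on `(a - 1, b)` and `(a, b - 1)`. Under that assumption, sub-tiles with equal `a + b` are independent. The hypothetical helpers `subtile_wavefronts` and `check_schedule` group the sub-tiles into such wavefronts and verify the resulting order.

```python
def subtile_wavefronts(n):
    """Group the n x n sub-tiles of one tile into wavefronts.

    Assumed dependency: sub-tile (a, b) needs (a - 1, b) and (a, b - 1).
    Sub-tiles with the same a + b are independent and may run concurrently.
    """
    fronts = []
    for level in range(2 * n - 1):
        fronts.append([(a, level - a) for a in range(n) if 0 <= level - a < n])
    return fronts


def check_schedule(n, fronts):
    """Return True if every sub-tile is scheduled strictly after the
    sub-tiles it depends on and all n * n sub-tiles are covered."""
    step = {}
    for s, front in enumerate(fronts):
        for cell in front:
            step[cell] = s
    for (a, b), s in step.items():
        for dep in ((a - 1, b), (a, b - 1)):
            if dep in step and step[dep] >= s:
                return False
    return len(step) == n * n
```

With `n = 3` this yields five wavefronts of widths 1, 2, 3, 2 and 1. Under the assumed dependencies, therefore, at most three of the nine sub-tiles can run at the same time.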
“…Bandishti et al [4] and Bondhugula et al [6] proposed a general formalism for diamond tiling in the polyhedral model by introducing a rescheduling step in the Pluto compiler. There has been a great amount of work [11,13,22,25,33,34] reported on the evaluation of diamond tiling. It was also generalized to handle iterated stencils defined over periodic data domains with index set splitting [5] and the Lattice-Boltzmann method [26].…”
Section: Related Work (mentioning)
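The diamond tiling referenced above can be illustrated on a 1-D, 3-point Jacobi stencil. This is a minimal sketch and not Pluto's generated code. It assumes fixed boundary values, keeps the whole space-time array for clarity, and uses the rotated coordinates `u = t + i` and `v = t - i` with a hypothetical tile width `width`. Every dependency of point `(t, i)` has a smaller-or-equal `u` and `v`, so tiles can run in wavefronts of equal `U + V`. The names `jacobi_naive`, `diamond_tiles` and `jacobi_diamond` are illustrative. Tiles within a wavefront run one after another here, although they could run concurrently.

```python
from collections import defaultdict


def jacobi_naive(init, steps):
    """Reference 1-D 3-point Jacobi sweep; keeps every time level."""
    n = len(init)
    a = [list(init)] + [[None] * n for _ in range(steps)]
    for t in range(1, steps + 1):
        a[t][0], a[t][n - 1] = init[0], init[n - 1]  # fixed boundaries
        for i in range(1, n - 1):
            a[t][i] = (a[t - 1][i - 1] + a[t - 1][i] + a[t - 1][i + 1]) / 3.0
    return a[steps]


def diamond_tiles(n, steps, width):
    """Assign each interior point (t, i) to tile (U, V) = (u // width, v // width),
    where u = t + i and v = t - i. Dependencies never increase u or v, so a tile
    only depends on tiles with smaller-or-equal U and V."""
    tiles = defaultdict(list)
    for t in range(1, steps + 1):
        for i in range(1, n - 1):
            tiles[((t + i) // width, (t - i) // width)].append((t, i))
    return tiles


def jacobi_diamond(init, steps, width):
    """Same computation as jacobi_naive, executed tile by tile in wavefront order."""
    n = len(init)
    a = [list(init)] + [[None] * n for _ in range(steps)]
    for t in range(1, steps + 1):
        a[t][0], a[t][n - 1] = init[0], init[n - 1]
    tiles = diamond_tiles(n, steps, width)
    # Tiles with equal U + V are mutually independent (one wavefront).
    order = sorted(tiles, key=lambda uv: uv[0] + uv[1])
    for key in order:
        for t, i in sorted(tiles[key]):  # time-major order inside a tile
            a[t][i] = (a[t - 1][i - 1] + a[t - 1][i] + a[t - 1][i + 1]) / 3.0
    return a[steps], tiles, order
```

The first wavefront consists of several tiles lying side by side at the bottom of the space-time domain, all with `t < width / 2`. Those tiles can begin immediately without waiting on one another, which is the concurrent-start property. A reading of a not-yet-computed value would surface as a `TypeError` on `None`, so matching `jacobi_naive` also confirms that the tile order respects every dependency.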
“…On the other hand, cache block sharing technologies (introduced by Wellein et al [21]) achieve better performance by utilizing the shared hardware caches of modern CPUs. Recently, Shrestha et al [30] introduced cache block sharing techniques within the PLUTO framework to perform source-to-source transformation of the stencil codes. To the extent of our knowledge, all proposed cache block sharing temporal blocking techniques compromise tile size for intra-tile concurrency, which we show to be sub-optimal in this work.…”
Section: Related Work (mentioning)