Proceedings of the 12th International Conference on Supercomputing 1998
DOI: 10.1145/277830.277917

Eliminating conflict misses for high performance architectures

Abstract: Many cache misses in scientific programs are due to conflicts caused by limited set associativity. Two data-layout transformations, inter- and intra-variable padding, can eliminate many conflict misses at compile time. We present GROUP-PAD, an inter-variable padding heuristic to preserve group reuse in stencil computations frequently found in scientific computations. We show padding can also improve performance in parallel programs. Our optimizations have been implemented and tested on a collection of kernels and …
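
As a rough illustration of inter-variable padding, the sketch below counts how often corresponding elements of two arrays fall into the same cache set, before and after a small pad is placed between them. It is not the GROUP-PAD heuristic from the paper. The cache geometry (16 KB, direct-mapped, 32-byte lines), the base addresses, and the helper names `cache_set` and `count_conflicts` are all assumptions chosen for the example.

```python
# Illustrative model only: a hypothetical direct-mapped cache,
# not the paper's GROUP-PAD algorithm.

CACHE_BYTES = 16 * 1024   # assumed cache size
LINE_BYTES = 32           # assumed line size
ELEM_BYTES = 8            # double-precision elements


def cache_set(addr, cache_bytes=CACHE_BYTES, line_bytes=LINE_BYTES):
    """Return the set index a byte address maps to in a direct-mapped cache."""
    num_sets = cache_bytes // line_bytes
    return (addr // line_bytes) % num_sets


def count_conflicts(base_a, base_b, n_elems):
    """Count indices i for which A[i] and B[i] map to the same cache set."""
    return sum(
        cache_set(base_a + i * ELEM_BYTES) == cache_set(base_b + i * ELEM_BYTES)
        for i in range(n_elems)
    )


n = 2048                          # 2048 doubles = 16 KB, exactly the cache size
base_a = 0
base_b_unpadded = base_a + n * ELEM_BYTES               # B placed right after A
base_b_padded = base_a + n * ELEM_BYTES + LINE_BYTES    # one cache line of padding

unpadded = count_conflicts(base_a, base_b_unpadded, n)
padded = count_conflicts(base_a, base_b_padded, n)
print(unpadded, padded)
```

In this model, a loop that reads `A[i]` and `B[i]` together evicts one array's line with the other's on every iteration when the arrays are unpadded, while a single line of padding moves them to different sets.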

Cited by 40 publications (32 citation statements)
References 22 publications
“…Instead, most optimizations have focused on exploiting temporal and spatial reuse within individual loop nests [21,33]. Tiling is usually not needed, since most locality can be obtained through loop permutation, though in some cases array padding may be necessary to preserve group reuse [25].…”
Section: Tiling For Stencil Codes (mentioning)
confidence: 99%
“…Instability comes from the so-called pathological array sizes, when array dimensions are near powers of two, since cache interference is a particular risk at that point. Array padding [8], [13], [16] is a compiler optimization that increases the array sizes and changes initial locations to avoid pathological cases. It introduces space overhead but effectively stabilizes program performance.…”
Section: Related Work (mentioning)
confidence: 99%
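
The pathological-size effect described in the statement above can be sketched with a small model of intra-variable padding: walking down one column of a row-major 2-D array whose row length is a power of two touches very few cache sets, while a slightly padded row length spreads the accesses out. The cache geometry, array shape, pad size, and the helper name `distinct_sets_in_column` are assumptions for illustration and do not come from the cited papers.

```python
# Illustrative model only: hypothetical 16 KB direct-mapped cache, 32-byte lines.

def distinct_sets_in_column(leading_dim, rows=512, elem_bytes=8,
                            cache_bytes=16 * 1024, line_bytes=32):
    """Count distinct cache sets touched by a column walk of a row-major array.

    leading_dim is the allocated row length in elements (including any padding).
    """
    num_sets = cache_bytes // line_bytes
    stride = leading_dim * elem_bytes          # bytes between consecutive rows
    return len({(i * stride // line_bytes) % num_sets for i in range(rows)})


power_of_two = distinct_sets_in_column(1024)   # row length is a power of two
padded = distinct_sets_in_column(1024 + 4)     # four extra elements per row
print(power_of_two, padded)
```

The padded layout costs four unused elements per row, which is the space overhead the statement mentions, in exchange for using the whole cache during the column walk.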
“…Although the results are accurate, the time needed to obtain them is typically many times greater than the total execution time of the program being simulated. To try to overcome such problems, analytical models of cache behaviour combined with heuristics have also been developed, to guide optimizing compilers [6], [16] and [23], or study the cache performance of particular types of algorithm, especially blocked ones [3], [7], [10], and [22]. Code optimizations, such as tile size selection, selected with the help of predicted miss ratios require a really accurate assessment of program's code behaviour.…”
Section: Related Work (mentioning)
confidence: 99%
“…Wolf et al [34] consider the integrated treatment of fusion and tiling only from the point of view of enhancing locality and do not consider the impact of the amount of required memory; the memory requirement is a key issue for the problems considered in this paper. Loop tiling for enhancing data locality has been studied extensively [27,33,30], and analytic models of the impact of tiling on locality have been developed [7,20,25]. Recently, a data-centric version of tiling called data shackling has been developed [12,13] (together with more recent work by Ahmed et al [1]) which allows a cleaner treatment of locality enhancement in imperfectly nested loops.…”
Section: Related Work (mentioning)
confidence: 99%