Proceedings of the 1999 ACM/IEEE Conference on Supercomputing 1999
DOI: 10.1145/331532.331534
Locality optimizations for multi-level caches

Abstract: Compiler transformations can significantly improve data locality of scientific programs. In this paper, we examine the impact of multi-level caches on data locality optimizations. We find nearly all the benefits can be achieved by simply targeting the L1 (primary) cache. Most locality transformations are unaffected because they improve reuse for all levels of the cache; however, some optimizations can be enhanced. Inter-variable padding can take advantage of modular arithmetic to eliminate conflict misses and …
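The abstract's reference to inter-variable padding can be made concrete with a small sketch. The C fragment below is not from the paper; the array length, cache size, and line size are assumptions chosen so that the two unpadded arrays would map to the same sets of a direct-mapped L1 (assuming the compiler lays the globals out contiguously), and a one-line pad between them breaks the conflict.

/* Hypothetical sketch of inter-variable padding, not the paper's algorithm.
 * If a[] and b[] are laid out back to back and the size of a[] is a multiple
 * of the cache size, a[i] and b[i] map to the same set of a direct-mapped
 * cache and evict each other on every iteration (conflict misses). Inserting
 * a pad of one cache line shifts b[]'s starting address so the two streams
 * fall into different sets.
 */
#define N        4096          /* assumed array length                    */
#define CACHE_SZ (32 * 1024)   /* assumed direct-mapped L1 size in bytes  */
#define LINE_SZ  32            /* assumed cache line size in bytes        */

_Static_assert(N * sizeof(double) % CACHE_SZ == 0,
               "chosen so the unpadded arrays conflict");

double a[N];
double pad[LINE_SZ / sizeof(double)];  /* inter-variable pad: one line     */
double b[N];

void add(double *c)
{
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];    /* with the pad, a[i] and b[i] no longer
                                  share a cache set                        */
}

In practice the pad size would be chosen by the compiler using modular arithmetic on the variables' base addresses, as the abstract describes; the one-line pad here is only the simplest illustrative choice.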

Cited by 32 publications (28 citation statements)
References 37 publications (52 reference statements)
“…We have considered a different architecture, and our conclusions are different. Rivera and Tseng examined loop transformations for multi-level caches, finding that all performance gains can be achieved by simply focusing on the L1 cache [32]. Clearly, as we have considered architectures with a different type of multi-level memory hierarchy, our conclusions are different.…”
Section: Related Work
confidence: 91%
“…Similarly, there has been some work on optimizing data movements from main memory to device memory [35,34,22]. As multi-level processor caches became very common in the mid-nineties, several compiler efforts considered optimizations for them [25,32,30].…”
Section: Introduction
confidence: 99%
“…Chame and Moon [8] developed techniques to minimize the sum of the capacity and cross-interference misses while avoiding self-interference misses. Rivera and Tseng [26] developed padding techniques to reduce interference misses and studied the effect of multi-level caches on data locality optimizations. Hsu and Kremer [16] presented a comprehensive comparative study of tile size selection algorithms.…”
Section: Related Work
confidence: 99%
“…In analytical approaches, a compiler selects tile sizes based on static analysis of loop nests and known characteristics of the memory hierarchy. Although several analytical techniques for tile size selection have been proposed in the literature [8,10,13,16,19,26,27,28], none has been demonstrated to be sufficiently effective for use in practice. As a result, the gap between the performance delivered by the best known tile sizes and those selected by an analytical approach has continued to widen, thereby diminishing the utility of past analytical approaches.…”
Section: Introduction
confidence: 99%
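Several of the citing works above concern tile size selection. As a point of reference, the C sketch below shows the kind of loop tiling those selection schemes parameterize; the matrix size N and tile size T are illustrative assumptions, not values from any cited paper, and N is chosen divisible by T to keep the sketch free of boundary handling.

/* Illustrative loop tiling (blocking) of matrix multiply, the transformation
 * that tile-size-selection work targets. Analytical models derive T from
 * cache parameters, e.g. roughly 3*T*T*sizeof(double) <= L1 capacity so the
 * working set of one tile stays resident; T = 32 below is only a placeholder.
 */
#define N 1024
#define T 32                              /* assumed tile size */

void matmul_tiled(const double A[N][N], const double B[N][N], double C[N][N])
{
    for (int ii = 0; ii < N; ii += T)
        for (int kk = 0; kk < N; kk += T)
            for (int jj = 0; jj < N; jj += T)
                /* work on one T x T tile so reused data stays in cache */
                for (int i = ii; i < ii + T; i++)
                    for (int k = kk; k < kk + T; k++)
                        for (int j = jj; j < jj + T; j++)
                            C[i][j] += A[i][k] * B[k][j];
}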
“…In order to quantify the benefits of adopting nonlinear layouts to reduce cache misses, there exist several different approaches. In [18], Rivera et al. consider all levels of the memory hierarchy, reducing L2 cache misses as well rather than only L1 misses. They report even fewer overall misses; however, the performance improvements are rarely significant.…”
Section: Related Work
confidence: 99%
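For context, a nonlinear layout of the kind discussed in the statement above can be sketched as a block-major index function; the block size B and the helper name blk_index are illustrative assumptions rather than the layout used in [18].

/* Rough sketch of a nonlinear (block-major / tiled) array layout. Elements
 * are stored block by block so that each B x B tile occupies contiguous
 * memory, which keeps a tile's accesses within a small, conflict-free
 * region of the cache.
 */
#include <stddef.h>

#define N 1024
#define B 32                               /* assumed block size */

static inline size_t blk_index(size_t i, size_t j)
{
    size_t bi = i / B, bj = j / B;         /* which block             */
    size_t oi = i % B, oj = j % B;         /* offset inside the block */
    size_t blocks_per_row = N / B;
    return ((bi * blocks_per_row + bj) * B + oi) * B + oj;
}

/* data[blk_index(i, j)] replaces the row-major access data[i * N + j]. */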