9th Annual Workshop on Interaction Between Compilers and Computer Architectures (INTERACT'05)
DOI: 10.1109/interact.2005.1

A Tile Size Selection Analysis for Blocked Array Layouts

Abstract: Efficient use of the memory hierarchy is essential for good performance because of the ever-increasing gap between processor and memory speed. Program transformations such as loop tiling have been shown to be an effective approach to improving locality and cache exploitation, especially for dense matrix scientific computations. In conjunction with tiling, several experimental studies have been conducted on blocked data layouts, a data transformation technique used to boost cache performance. The stability o…
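To make the two techniques named in the abstract concrete, here is a minimal sketch of a tiled matrix multiplication over a blocked array layout. It is an illustration only, not the paper's method: the names `blocked_index` and `matmul_blocked`, the tile parameter `T`, the row-major ordering of tiles, and the requirement that `T` divides `n` are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of element (i, j) in an n x n matrix stored tile by tile.
   Assumed layout (hypothetical, for illustration): T x T tiles are placed
   in row-major order, and elements are row-major inside each tile. */
size_t blocked_index(size_t i, size_t j, size_t n, size_t T)
{
    assert(T > 0 && n % T == 0);   /* assumption: T divides n */
    size_t tiles_per_row = n / T;
    size_t tile = (i / T) * tiles_per_row + (j / T);
    return tile * T * T + (i % T) * T + (j % T);
}

/* C += A * B, with all three n x n matrices in the blocked layout above.
   The three outer loops walk over tiles; the three inner loops stay
   inside one tile of each matrix, so each tile is a contiguous
   T*T-element region of memory. */
void matmul_blocked(const double *A, const double *B, double *C,
                    size_t n, size_t T)
{
    assert(T > 0 && n % T == 0);
    for (size_t ii = 0; ii < n; ii += T)
        for (size_t jj = 0; jj < n; jj += T)
            for (size_t kk = 0; kk < n; kk += T)
                for (size_t i = ii; i < ii + T; i++)
                    for (size_t k = kk; k < kk + T; k++) {
                        double a = A[blocked_index(i, k, n, T)];
                        for (size_t j = jj; j < jj + T; j++)
                            C[blocked_index(i, j, n, T)] +=
                                a * B[blocked_index(k, j, n, T)];
                    }
}
```

The sketch recomputes the index for every access to keep the mapping explicit; a tuned version would hoist the tile base addresses out of the inner loops. Choosing `T` is the tile size selection problem the paper's title refers to, and the sketch does not attempt to solve it.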

Cited by 4 publications (3 citation statements)
References 28 publications
“…Taking into account the miss penalty of each memory level, as well as the penalty of mispredicted branches (as presented in [2]), we derive the total miss cost of Table 2. Figure 4 makes clear that L1 misses dominate cache and, as a result, total performance in the Xeon DP architecture.…”
Section: Total Miss Cost (mentioning)
confidence: 99%
“…As we will comment below, our results agree with this: our iterative tiled algorithm working on SB outperforms the recursive code operating on hypermatrices. Authors have also investigated on tile size selection for non-canonical array layouts [28,22,29] and have come to similar conclusions to the case of canonical storage: blocks should target the level 1 cache.…”
Section: Serial Dense Codes Using Non-canonical Array Layouts (mentioning)
confidence: 84%
“…As we will comment below, our results agree with this: our iterative tiled algorithm working on BDL outperforms the recursive code operating on hypermatrices. Authors have also investigated on tile size selection for nonlinear array layouts [165,145,15] and have come to similar conclusions to the case of canonical storage: blocks should target the level 1 cache.…”
Section: Serial Dense Codes Using Nonlinear Array Layouts (mentioning)
confidence: 84%