Proceedings 1999 IEEE International Conference on Computer Design: VLSI in Computers and Processors (Cat. No.99CB37040)
DOI: 10.1109/iccd.1999.808592
Pursuing the performance potential of dynamic cache line sizes

Abstract: In this paper we examine the application of offline algorithms for determining the optimal sequence of loads and superloads (a load of multiple consecutive cache lines) for direct-mapped caches. We evaluate potential gains in terms of miss rate and bandwidth and find that in many cases optimal superloading can noticeably reduce the miss rate without appreciably increasing bandwidth. Then we examine how this performance potential might be realized. We examine the effectiveness of a dynamic online algorithm and …
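The superload idea in the abstract can be illustrated with a minimal simulator. The sketch below is an illustration only, not the paper's offline algorithm; the line size, cache geometry, and trace are assumptions. It models a direct-mapped cache that, on a miss, fetches either a single line or a superload of several consecutive lines, and reports miss count and memory traffic.

```python
# Minimal sketch (illustrative, NOT the paper's algorithm): a direct-mapped
# cache that fetches either one line or a "superload" of several consecutive
# lines on each miss, tracking misses and bytes moved from memory.

LINE_SIZE = 16   # bytes per cache line (assumed)
NUM_SETS = 64    # direct-mapped: one line per set (assumed)

def simulate(addresses, fetch_lines=1):
    """Return (misses, bytes_fetched) for a direct-mapped cache.

    fetch_lines > 1 models a superload: on a miss, the missing line and
    the following consecutive lines are all installed.
    """
    tags = [None] * NUM_SETS   # tag stored in each set; None = invalid
    misses = 0
    traffic = 0
    for addr in addresses:
        line = addr // LINE_SIZE
        s, tag = line % NUM_SETS, line // NUM_SETS
        if tags[s] != tag:
            misses += 1
            traffic += fetch_lines * LINE_SIZE
            # install the missing line plus its successors (the superload)
            for i in range(fetch_lines):
                nl = line + i
                tags[nl % NUM_SETS] = nl // NUM_SETS
        # hits cost no memory traffic
    return misses, traffic

# A sequential sweep of 4 KB in 4-byte accesses:
trace = list(range(0, 4096, 4))
m1, t1 = simulate(trace, fetch_lines=1)   # single-line fetches
m4, t4 = simulate(trace, fetch_lines=4)   # 4-line superloads
```

On this purely sequential trace, the 4-line superload cuts misses fourfold while moving the same total number of bytes, mirroring the abstract's claim that superloading can reduce miss rate without appreciably increasing bandwidth; irregular traces would, of course, behave less favorably.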

Cited by 19 publications (15 citation statements)
References 22 publications
“…Vleet et al [24] proposed using off-line profiling to determine the fetch size upon a cache miss. However, the lack of dynamism renders these static approaches less effective when faced with a data set that changes rapidly during program execution.…”
Section: Prior Workmentioning
confidence: 99%
“…Recent research [13,17,18,23,24] indicates that there are large spatial variations in cache line usage both within and across programs. In the presence of such drastic variations, a fixed cache line size results in a sub-optimal design point.…”
Section: Introductionmentioning
confidence: 99%
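The spatial variation noted in the quote above can be made concrete with a small measurement sketch (illustrative only; the line size, word size, and traces are assumptions, not taken from the cited works). It records what fraction of each cache line's words a trace actually touches, which is exactly the kind of statistic that motivates variable line sizes.

```python
# Minimal sketch: measure per-line spatial usage of a memory trace.
# LINE_SIZE and WORD are assumed parameters, not from the cited papers.

LINE_SIZE = 64   # bytes per cache line (assumed)
WORD = 4         # bytes per word (assumed)

def line_usage(addresses):
    """Map line number -> fraction of that line's words the trace touched."""
    touched = {}
    for addr in addresses:
        line, word = addr // LINE_SIZE, (addr % LINE_SIZE) // WORD
        touched.setdefault(line, set()).add(word)
    words_per_line = LINE_SIZE // WORD
    return {ln: len(ws) / words_per_line for ln, ws in touched.items()}

# A strided pattern touches only a quarter of each line it brings in,
# while a sequential pattern uses every word -- large spatial variation.
strided = line_usage(range(0, 1024, 16))
sequential = line_usage(range(0, 1024, 4))
```

With a fixed 64-byte line, the strided pattern wastes three quarters of every fetch, while the sequential one wastes nothing; no single line size serves both well, which is the sub-optimality the quoted work points out.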
“…This expectation may be based on profile information [9,25], hardware detection of strided accesses [17] or spatial locality [12,14,25], or compiler annotation of load instructions [23]. Optimal off-line algorithms for fetching a set of noncontiguous words [24] or a variable-sized aligned block [25] on each miss provide bounds on these techniques. Pollution may also be reduced by prefetching into separate buffers [13,23].…”
Section: Related Workmentioning
confidence: 99%
“…Przybylski [18] analyzed cancelling an ongoing demand fetch (after the critical word had returned) on a subsequent miss, but found that performance was reduced, probably because the original block was not written into the cache. Our scheduling technique is independent of the scheme used to generate prefetch addresses; determining the combined benefit of scheduling and more conservative prefetching techniques [9,12,14,17,25] is an area of future research. Our results also show that in a large secondary cache, controlling the replacement priority of prefetched data appears sufficient to limit the displacement of useful referenced data.…”
Section: Related Workmentioning
confidence: 99%
“…Vleet and co-authors [28] propose offline profiling to select fetch size upon a miss. Guided region prefetching [29] uses compiler hints to direct both spatial and non-spatial (e.g., pointer-chasing) prefetches.…”
Section: Related Workmentioning
confidence: 99%