2012
DOI: 10.1145/2133382.2133384
When Prefetching Works, When It Doesn’t, and Why

Abstract: In emerging and future high-end processor systems, tolerating increasing cache miss latency and properly managing memory bandwidth will be critical to achieving high performance. Prefetching, in both hardware and software, is among our most important available techniques for doing so; yet, we claim that prefetching is perhaps also the least well-understood. Thus, the goal of this study is to develop a novel, foundational understanding of both the benefits and limitations of hardware and software prefetching. Ou…

Cited by 109 publications (73 citation statements); references 43 publications.
“…For instance, stride prefetchers use the distance (i.e., the stride of the load) between the current and last memory addresses referenced by a load instruction to fetch the address formed by the last address plus the stride distance. For a complete review of hardware prefetchers, please refer to [Lee et al 2012]. Usually, real-time application designers disable hardware prefetchers to improve predictability.…”
Section: Future Directions
confidence: 99%
“…Similarly, Laurenzano et al [10] proposed a runtime mechanism that finds opportunities to insert non-temporal prefetch instructions in batch applications to conserve LLC space, so that user-facing applications' performance in datacenters remains predictable. Lee et al [11] investigated combining hardware prefetching and software prefetching for single-threaded applications, concluding that caution should be exercised when mixing the two. In contrast to their work, we have shown that hardware prefetching can be combined with software prefetching in a useful way to increase throughput performance in multicores.…”
Section: Related Work
confidence: 99%
“…Data caches are a nice example where prefetching is automatically driven in hardware: a full set of data (a cache line) is loaded when a single data item contained in it is explicitly accessed. A very interesting recent study [32] tries to draw some “guiding lines” on the usage of prefetching to achieve an actual performance gain. However, no timing guarantees are provided, nor is timing predictability among the goals of the most advanced prefetching techniques.…”
Section: Related Work
confidence: 99%