2002
DOI: 10.1145/545214.545235

Abstract: This paper introduces the idea of using a User-Level Memory Thread (ULMT) for correlation prefetching. In this approach, a user thread runs on a general-purpose processor in main memory, either in the memory controller chip or in a DRAM chip. The thread performs correlation prefetching in software, sending the prefetched data into the L2 cache of the main processor. This approach requires minimal hardware beyond the memory processor: the correlation table is a software data structure that resides in main memo…
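
To make the idea of a correlation table kept as a software data structure more concrete, here is a minimal sketch. It is not the paper's implementation: the class name `CorrelationTable`, the parameters `max_rows` and `num_succ`, and the replacement policy are all hypothetical choices made for illustration. Each row maps a miss address to the misses that most recently followed it, and a lookup on a new miss returns those successors as prefetch candidates.

```python
from collections import OrderedDict, deque


class CorrelationTable:
    """Illustrative pair-based correlation table (hypothetical, not the paper's code)."""

    def __init__(self, max_rows=1024, num_succ=2):
        self.max_rows = max_rows      # assumed bound on the number of rows
        self.num_succ = num_succ      # assumed number of successors kept per row
        self.rows = OrderedDict()     # miss address -> successors, most recent first
        self.last_miss = None

    def on_miss(self, addr):
        """Record a miss and return the addresses predicted to miss next."""
        # Learning step: addr is the latest successor of the previous miss.
        if self.last_miss is not None:
            succ = self.rows.get(self.last_miss)
            if succ is None:
                if len(self.rows) >= self.max_rows:
                    self.rows.popitem(last=False)  # evict the row updated least recently
                succ = deque(maxlen=self.num_succ)
                self.rows[self.last_miss] = succ
            else:
                self.rows.move_to_end(self.last_miss)
            if addr in succ:
                succ.remove(addr)
            succ.appendleft(addr)     # oldest successor drops off when the row is full
        self.last_miss = addr
        # Prefetch step: successors previously seen after this address.
        return list(self.rows.get(addr, ()))


table = CorrelationTable()
for a in ["A", "B", "C"]:
    table.on_miss(a)
first = table.on_miss("A")    # "B" followed "A" earlier
second = table.on_miss("B")   # "C" followed "B" earlier
```

In this sketch the table lives in ordinary memory and can be resized or reorganized freely, which is the kind of flexibility a software table allows compared with a fixed hardware structure.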

Citations: cited by 73 publications (107 citation statements)
References: 29 publications
“…Depth prefetching lets the prefetcher run farther ahead of the actual address stream. 6 There are also hybrid methods that use a combination of width and depth. In Figure 2, if w is 2 and d is 2, blocks C and D will be prefetched, followed by B and C (although the second prefetch to C is redundant and will be filtered out).…”
Section: Generalized Correlation Prefetching
mentioning
confidence: 99%
“…[3][4][5][6] Figure 1a illustrates conventional We propose an alternative structure, shown in Figure 1b, for holding prefetch history. In this structure, a fixed-length FIFO table, the global history buffer (GHB) holds cache miss addresses.…”
mentioning
confidence: 99%
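
The fixed-length FIFO of miss addresses described in the statement above can be sketched as a circular buffer. The quoted text only says that the buffer is a fixed-length FIFO holding cache miss addresses; the index dictionary, the `successors` lookup, and all names below are assumptions added for illustration, not details taken from the citing paper.

```python
class GlobalHistoryBuffer:
    """Illustrative fixed-length FIFO of miss addresses (hypothetical sketch)."""

    def __init__(self, size=8):
        self.size = size
        self.entries = [None] * size  # circular storage for miss addresses
        self.head = 0                 # sequence number of the next insertion
        self.index = {}               # assumed helper: address -> latest sequence number

    def record_miss(self, addr):
        slot = self.head % self.size
        old = self.entries[slot]
        # If the entry being overwritten was the last occurrence of its address,
        # drop the stale index entry so lookups never point at reused slots.
        if old is not None and self.index.get(old) == self.head - self.size:
            del self.index[old]
        self.entries[slot] = addr
        self.index[addr] = self.head
        self.head += 1

    def successors(self, addr, depth=2):
        """Return up to `depth` misses that followed the latest occurrence of addr."""
        seq = self.index.get(addr)
        if seq is None:
            return []
        stop = min(seq + 1 + depth, self.head)
        return [self.entries[s % self.size] for s in range(seq + 1, stop)]


ghb = GlobalHistoryBuffer(size=4)
for a in ["A", "B", "C", "D"]:
    ghb.record_miss(a)
after_a = ghb.successors("A", depth=2)   # the two misses that followed "A"
```

Because the buffer is a FIFO, the oldest history is discarded automatically once `size` misses have been recorded, so stale correlations age out without an explicit replacement policy.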
“…Hardware and software prefetching techniques have been studied extensively [10,33,11,25,24,31,4]. Hardware-controlled prefetchers are highly effective for applications with regular data access patterns [4]; they have been integrated into all modern high-performance processors, including Intel Core i3/i5/i7, AMD Opteron and IBM POWER, and many embedded and mobile processors, such as ARM's Cortex-A9 and Cortex-A15.…”
Section: Related Work
mentioning
confidence: 99%
“…Another proposed class of prefetchers utilizes address correlation [3,4,10,11,15,20], which promises wider applicability across a diverse spectrum of workloads because they target generalized memory access patterns. Rather than detecting patterns in data layout, these prefetchers correlate data addresses to predict future misses.…”
Section: Introduction
mentioning
confidence: 99%
“…However, capacity constraints of DBCP's on-chip correlation table limit its coverage. Conversely, the designs of Solihin et al [20] and Wenisch et al [24] record address correlation data in off-chip DRAM. Although these mechanisms have abundant correlation data storage, lookup is performed off chip, drastically increasing prediction latency.…”
Section: Introduction
mentioning
confidence: 99%