Proceedings of the 22nd Annual International Symposium on Computer Architecture (ISCA '95), 1995
DOI: 10.1145/223982.224447

Streamlining data cache access with fast address calculation

Abstract: For many programs, especially integer codes, untolerated load instruction latencies account for a significant portion of total execution time. In this paper, we present the design and evaluation of a fast address generation mechanism capable of eliminating the delays caused by effective address calculation for many loads and stores. Our approach works by predicting early in the pipeline (part of) the effective address of a memory access and using this predicted address to speculatively access the data cache. If…
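
The mechanism speculated on here exploits the fact that an effective address is the sum of a base register and a (usually small) immediate offset, so the cache-index portion of the sum can often be formed without waiting for a full carry-propagating add. Below is a minimal C sketch of that idea; the field widths, the OR-based index approximation, and the verify-and-replay check are illustrative assumptions, not the paper's exact circuit.

```c
#include <stdint.h>
#include <stdbool.h>

#define BLOCK_BITS 5   /* 32-byte cache blocks (illustrative width) */
#define INDEX_BITS 7   /* 128 sets             (illustrative width) */
#define INDEX_MASK ((((uint32_t)1 << INDEX_BITS) - 1) << BLOCK_BITS)

/* Speculative set index: combine base and offset with a carry-free OR.
 * This matches the true sum whenever no carry reaches or crosses the
 * index field, which is common because load/store offsets are small. */
static uint32_t predicted_index(uint32_t base, uint32_t offset)
{
    return ((base | offset) & INDEX_MASK) >> BLOCK_BITS;
}

/* True set index, available one adder delay later. */
static uint32_t actual_index(uint32_t base, uint32_t offset)
{
    return ((base + offset) & INDEX_MASK) >> BLOCK_BITS;
}

/* The cache is probed with predicted_index() while the real adder runs;
 * a mismatch means the speculative probe must be discarded.           */
static bool prediction_ok(uint32_t base, uint32_t offset)
{
    return predicted_index(base, offset) == actual_index(base, offset);
}

int main(void)
{
    uint32_t base = 0x1000, offset = 8;   /* small offset: prediction holds */
    return prediction_ok(base, offset) ? 0 : 1;
}
```

When prediction_ok() returns false, the speculative cache probe would be squashed and the access replayed once the full address is available.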

Cited by 41 publications (37 citation statements). References 12 publications.

“…Of course, since prefetching has no architected side effects, no mechanism is needed for verifying the accuracy of the prediction or for recovering from mispredictions. Another example of a technique that speculates on data addresses is fast address calculation [26,11], which enables early initiation of memory loads by speculatively generating addresses early in the pipeline.…”
Section: Data Speculation
confidence: 99%
“…The equivalent of way-prediction for I-caches is often combined with branch prediction [5,9], but because D-caches do not interact with branch prediction, those techniques cannot be used directly. An alternative to prediction is to obtain the correct way-number of the displaced block using the address, which delays initiating cache access to the displaced block, as is the case for statically probed schemes such as column-associative… We examine two handles that can be used to perform way prediction: the instruction PC and an approximate data address formed by XORing the register value with the instruction offset (proposed in [3], and used in [6]), which may be faster than performing a full add. These two handles represent the two extremes of the trade-off between prediction accuracy and early availability in the pipeline, as shown in Figure 3.…”
Section: Way Prediction
confidence: 99%
“…XOR-based way prediction, used in the PSA paper [6], relies on the idea that while a pipeline stage computes the data address by adding the source register value to the instruction offset, the register value can be XORed with the instruction offset to compute an approximation of the address [3] and access a way-prediction table. This scheme exploits the fact that most memory instructions have small enough offsets that the block address from the XOR approximation is usually the same as, or at least correlates well with, the block address from the actual data address.…”
Section: XOR-based Way Prediction
confidence: 99%
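
These excerpts describe using the XOR of the base register and the instruction offset as a carry-free stand-in for the block address, available early enough to index a way-prediction table before the real add completes. A small C sketch under assumed parameters (block size, table size, and the training policy are hypothetical) follows:

```c
#include <stdint.h>
#include <stdio.h>

#define BLOCK_BITS 5                  /* 32-byte blocks (assumed)          */
#define WP_ENTRIES 1024               /* way-prediction table size (assumed) */

static uint8_t way_pred[WP_ENTRIES];  /* last hit way recorded per entry   */

/* Carry-free approximation of the block address: available before the
 * base + offset add completes, and usually equal to the true block
 * address because small offsets rarely generate carries.              */
static uint32_t xor_block_addr(uint32_t base, uint32_t offset)
{
    return (base ^ offset) >> BLOCK_BITS;
}

/* Early in the pipeline: pick one way to probe instead of all of them. */
static uint8_t predict_way(uint32_t base, uint32_t offset)
{
    return way_pred[xor_block_addr(base, offset) % WP_ENTRIES];
}

/* After the cache lookup resolves: remember which way actually hit.    */
static void train_way(uint32_t base, uint32_t offset, uint8_t hit_way)
{
    way_pred[xor_block_addr(base, offset) % WP_ENTRIES] = hit_way;
}

int main(void)
{
    train_way(0x2000, 16, 3);   /* learn: this access hit in way 3 */
    printf("predicted way: %d\n", (int)predict_way(0x2000, 16));
    return 0;
}
```

Because XOR and addition differ only where carries occur, small offsets usually leave the block-address bits unchanged, which is why the approximation correlates well with the true block address.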
“…They have been mainly used to reduce the influence of conditional branches [11]. In recent work, prediction techniques have also been applied to predict values or addresses in order to speculatively issue dependent operations [2][7][9][10].…”
Section: Introduction
confidence: 99%