Proceedings of the 9th International Conference on Supercomputing (ICS '95), 1995
DOI: 10.1145/224538.224569

Data forwarding in scalable shared-memory multiprocessors

Abstract: Scalable shared-memory multiprocessors are often slowed down by long-latency memory accesses. One way to cope with this problem is to use data forwarding to overlap memory accesses with computation. With data forwarding, when a processor produces a datum, in addition to updating its cache, it sends a copy of the datum to the caches of the processors that the compiler identified as consumers of it. As a result, when the consumer processors access the datum, they find it in their caches. This paper addresses two m…
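To make the mechanism concrete, here is a minimal C sketch of a producer whose stores are forwarded to a compiler-identified consumer set. The store_and_forward() primitive and the bitmask encoding of consumers are assumptions for illustration, not an interface from the paper; the stub performs only the local store, with the hardware forwarding step left as a comment.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical primitive (not from the paper): write 'value' through the
 * local cache, then push a copy of the line to the caches of the
 * processors named in the 'consumers' bitmask. Stubbed here as a plain
 * store; on a forwarding machine the push would happen in hardware. */
static void store_and_forward(volatile int64_t *addr, int64_t value,
                              uint32_t consumers)
{
    *addr = value;       /* update the producer's own cache     */
    (void)consumers;     /* hardware would forward the line here */
}

/* Producer loop: suppose the compiler has identified processors 2 and 5
 * as the consumers of a[], so every store is forwarded to their caches. */
void produce(volatile int64_t *a, int n)
{
    const uint32_t consumers = (1u << 2) | (1u << 5);
    for (int i = 0; i < n; i++)
        store_and_forward(&a[i], (int64_t)i * i, consumers);
}

int main(void)
{
    int64_t a[8] = {0};
    produce(a, 8);
    printf("a[3] = %lld\n", (long long)a[3]);
}
```

On a forwarding machine, the consumer processors would later hit on a[i] in their own caches instead of taking long-latency misses.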

Cited by 41 publications (13 citation statements), published 1996–2017
References 18 publications (14 reference statements)
“…A smart buffer compiler exploits the fact that the input data on consecutive calls to a given co-processor frequently share items with previous calls; these items do not need to be copied. Similar techniques for propagating values in shared memory multiprocessors, such as data forwarding [15], can be used. CUBA allows data to be hosted by the co-processor local storage and uses a hybrid write-through/write-back L2 cache policy.…”
Section: Data Transfer (mentioning)
confidence: 99%
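The residency test implied by the excerpt above can be sketched in a few lines of C. The slot table and the copy_to_coproc() primitive are hypothetical, not CUBA's actual interface; the point is only that an item shared with the previous call is detected and the copy skipped.

```c
#include <stdbool.h>
#include <stddef.h>

#define SLOTS 8

/* Illustrative bookkeeping (names are ours): remember which host items
 * already sit in the co-processor's local storage, so a consecutive call
 * copies only the items it does not share with the previous call. */
static const void *resident[SLOTS];

static bool is_resident(const void *item)
{
    for (size_t i = 0; i < SLOTS; i++)
        if (resident[i] == item)
            return true;
    return false;
}

void stage_argument(size_t slot, const void *item)
{
    if (is_resident(item))
        return;                  /* shared with a previous call: no copy */
    /* copy_to_coproc(slot, item);  hypothetical transfer primitive */
    resident[slot] = item;
}

int main(void)
{
    int x = 1, y = 2;
    stage_argument(0, &x);       /* first call: x must be copied */
    stage_argument(1, &y);       /* first call: y must be copied */
    stage_argument(0, &x);       /* next call shares x: no copy  */
}
```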
“…The decoupling of correctness and performance provides an opportunity to reduce the number of cache misses by predictively pushing data between system components. This predictive transfer of data can be triggered by a coherence protocol predictor [1,21,35], by software (e.g., the KSR1's "poststore" [37] and DASH's "deliver" [24]), or by allowing the memory to push data into processor caches. Since Token Coherence allows data and tokens to be transferred between system components without affecting correctness, these schemes are easily implemented correctly as part of a performance protocol.…”
Section: Other Performance Protocol Opportunities (mentioning)
confidence: 99%
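One simple instance of the coherence-protocol-predictor trigger mentioned in this excerpt is a last-consumer predictor: remember who last read each line, and push the line there after the next write completes. A toy C sketch, with every name and structure assumed rather than taken from the Token Coherence work:

```c
#define LINES 1024

static int last_reader[LINES];          /* -1 = no prediction yet */

void init_predictor(void)
{
    for (int i = 0; i < LINES; i++)
        last_reader[i] = -1;
}

void record_read_miss(int line, int reader)  /* called on a remote read */
{
    last_reader[line] = reader;
}

void on_write_complete(int line, int writer)
{
    int target = last_reader[line];
    if (target >= 0 && target != writer) {
        /* push_line(line, target);  hypothetical transfer primitive.
         * Under Token Coherence a mispredicted push cannot violate
         * correctness; it only costs bandwidth. */
    }
}

int main(void)
{
    init_predictor();
    record_read_miss(7, /*reader=*/3);   /* CPU 3 misses on line 7      */
    on_write_complete(7, /*writer=*/0);  /* CPU 0's write would push    */
                                         /* line 7 toward CPU 3's cache */
}
```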
“…In distributed DDM applications, remote memory accesses are introduced, resulting from producer and consumer DThreads running on different nodes. The distributed FREDDO implementation provides implicit data forwarding [36] to the node where the consumer DThread is scheduled to run. In particular, a consumer DThread can be scheduled for execution only when all of its input data are available in the main memory.…”
(mentioning)
confidence: 99%
“…In particular, a consumer DThread can be scheduled for execution only when all of its input data are available in the main memory. This helps to reduce memory latencies [36]. FREDDO is publicly available for download in [42]. Distributed FREDDO provides implicit data forwarding through a distributed shared memory (DSM) implementation [54] with a shared global address space (GAS).…”
(mentioning)
confidence: 99%
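The scheduling rule in these excerpts, that a consumer DThread fires only when all of its inputs have arrived, is essentially a dataflow ready count. A minimal C11 sketch under that reading (the dthread type and the function names are ours, not FREDDO's API):

```c
#include <stdatomic.h>
#include <stdio.h>

/* A consumer becomes runnable only after every producer has delivered
 * its input, i.e. when the pending-input count reaches zero. */
typedef struct {
    atomic_int pending_inputs;   /* producers that have not delivered */
    void (*body)(void);          /* work to run once all inputs arrive */
} dthread;

static void input_delivered(dthread *t)
{
    if (atomic_fetch_sub(&t->pending_inputs, 1) == 1)
        t->body();               /* last input arrived: fire the thread */
}

static void consumer_body(void) { puts("consumer DThread runs"); }

int main(void)
{
    dthread c;
    c.body = consumer_body;
    atomic_init(&c.pending_inputs, 2);
    input_delivered(&c);         /* producer 1 delivered its data        */
    input_delivered(&c);         /* producer 2 delivered: consumer fires */
}
```

In distributed FREDDO the delivery step would also imply the data has been forwarded into the consumer node's memory; here it is only counted.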