Parallel workstations, each comprising a 10-100 processor shared memory machine, promise cost-effective general-purpose multiprocessing. This thesis explores the coupling of such small- to medium-scale shared memory multiprocessors through software over a local area network to synthesize larger shared memory systems. Multiprocessors built in this fashion are called Distributed Scalable Shared memory Multiprocessors (DSSMPs).

The challenge of building DSSMPs lies in seamlessly extending the hardware-supported shared memory of each parallel workstation to span a cluster of parallel workstations using software only. Such a shared memory system is called Multigrain Shared Memory because it naturally supports two grains of sharing: fine-grain cache-line sharing within each parallel workstation, and coarse-grain page sharing across parallel workstations. Applications that can leverage the efficient fine-grain support for shared memory provided by each parallel workstation have the potential for high performance.

This thesis makes three contributions in the context of Multigrain Shared Memory. First, it provides the design of a multigrain shared memory system, called MGS, and demonstrates its feasibility and correctness via an implementation on a 32-processor Alewife machine. Second, this thesis undertakes an in-depth application study that quantifies the extent to which shared memory applications can leverage the efficient shared memory mechanisms provided by DSSMPs. The thesis begins by looking at the performance of unmodified shared memory programs, and then investigates application transformations that improve performance. Finally, this thesis presents an approach called Synchronization Analysis for analyzing the performance of multigrain shared memory systems. The thesis develops a performance model based on Synchronization Analysis, and uses the model to study DSSMPs with up to 512 processors.
The experiments and analysis demonstrate that scalable DSSMPs can be constructed from small-scale workstation nodes to achieve competitive performance with large-scale all-hardware shared memory systems. For instance, the model predicts that a 256-processor DSSMP built from 16-processor parallel workstation nodes achieves equivalent performance to a 128-processor all-hardware multiprocessor on a communication-intensive workload.
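The intuition behind such results can be sketched with a toy average-latency calculation. The sketch below is purely illustrative and is not the thesis's Synchronization Analysis model: the cycle counts and the `remote_fraction` parameter are invented assumptions, chosen only to show why keeping most sharing at the fine (intra-node, hardware) grain lets a software-coupled system stay close to all-hardware performance.

```python
# Toy two-grain cost model (illustrative only; all numbers are assumptions).
# Intra-node accesses are served by hardware cache-line shared memory;
# inter-node accesses are served by software page-grain shared memory,
# which is orders of magnitude slower.

def avg_access_cycles(remote_fraction,
                      intra_node_cycles=50,       # assumed hardware cost
                      inter_node_cycles=10_000):  # assumed software/page cost
    """Average shared-memory access cost when `remote_fraction` of
    accesses cross a parallel-workstation boundary."""
    return ((1 - remote_fraction) * intra_node_cycles
            + remote_fraction * inter_node_cycles)

# If sharing is localized so only 1% of accesses leave a node,
# the average cost stays within a few multiples of the hardware cost:
print(avg_access_cycles(0.01))  # 149.5 cycles
print(avg_access_cycles(0.10))  # 1045.0 cycles -- locality matters
```

The point of the sketch is the sensitivity to `remote_fraction`: applications (or the transformations studied in the thesis) that keep most sharing within a node see near-hardware average latency, while communication patterns that frequently cross node boundaries are dominated by the software page-sharing cost.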