x86-TSO

Sewell, Peter; Sarkar, Susmit; Owens, Scott; Nardelli, Francesco Zappa; Myreen, Magnus O.

doi:10.1145/1785414.1785443

Cited by 345 publications

(53 citation statements)

References 23 publications

“…We implemented all of the lock algorithms and benchmarks in C and C++ compiled with GCC 4.7.1 at optimization level -O3 in 32-bit mode. As required, we inserted memory fences to support the memory model on x86 [Sewell et al 2010] and SPARC where a store and load in program order can be reordered by the architecture. While not shown in our pseudo-code, padding and alignment were added in the data structures to avoid false sharing.…”

Section: Empirical Evaluation (mentioning)

confidence: 99%
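
The store-to-load reordering that the statement above guards against with fences can be illustrated with a toy model. This is a minimal sketch, not the formal x86-TSO definition: it assumes one FIFO store buffer per thread, leaves out locked instructions, and uses hypothetical names (`explore`, `SB`, `SB_FENCED`). It enumerates every interleaving of the classic store-buffering test, with and without an `mfence` between the store and the load.

```python
def _put(pairs, key, value):
    """Return a sorted tuple of (key, value) pairs with `key` set to `value`."""
    d = dict(pairs)
    d[key] = value
    return tuple(sorted(d.items()))


def _swap(tup, i, item):
    """Return a copy of tuple `tup` with position `i` replaced by `item`."""
    return tup[:i] + (item,) + tup[i + 1:]


def explore(programs):
    """Collect all final register states under a toy store-buffer model.

    Instructions: ("store", addr, value), ("load", addr, reg), ("mfence",).
    Unwritten memory locations read as 0.
    """
    outcomes, seen = set(), set()

    def step(pcs, bufs, mem, regs):
        state = (pcs, bufs, mem, regs)
        if state in seen:
            return
        seen.add(state)
        finished = all(pc == len(p) for pc, p in zip(pcs, programs))
        if finished and not any(bufs):
            outcomes.add(regs)
            return
        for t, prog in enumerate(programs):
            # The hardware may drain the oldest buffered store at any time.
            if bufs[t]:
                addr, val = bufs[t][0]
                step(pcs, _swap(bufs, t, bufs[t][1:]), _put(mem, addr, val), regs)
            if pcs[t] == len(prog):
                continue
            ins = prog[pcs[t]]
            nxt = _swap(pcs, t, pcs[t] + 1)
            if ins[0] == "store":
                # Stores go into the thread's own buffer, not straight to memory.
                step(nxt, _swap(bufs, t, bufs[t] + ((ins[1], ins[2]),)), mem, regs)
            elif ins[0] == "load":
                # Loads see the newest buffered store to the address, else memory.
                val = dict(mem).get(ins[1], 0)
                for addr, buffered in bufs[t]:
                    if addr == ins[1]:
                        val = buffered
                step(nxt, bufs, mem, _put(regs, ins[2], val))
            elif ins[0] == "mfence" and not bufs[t]:
                # A fence can only complete once the thread's buffer is empty.
                step(nxt, bufs, mem, regs)

    n = len(programs)
    step((0,) * n, ((),) * n, (), ())
    return outcomes


# Store-buffering test: each thread writes one location, then reads the other.
SB = [
    [("store", "x", 1), ("load", "y", "r0")],
    [("store", "y", 1), ("load", "x", "r1")],
]
SB_FENCED = [
    [("store", "x", 1), ("mfence",), ("load", "y", "r0")],
    [("store", "y", 1), ("mfence",), ("load", "x", "r1")],
]

relaxed = explore(SB)
fenced = explore(SB_FENCED)
both_zero = (("r0", 0), ("r1", 0))
print("r0=r1=0 without fences:", both_zero in relaxed)
print("r0=r1=0 with mfence:   ", both_zero in fenced)
```

Under these assumptions, the outcome in which both loads read 0 is reachable only in the unfenced version, which is why a fence is needed wherever an algorithm relies on a store being visible before a later load.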

Lock Cohorting

Dice

Marathe

Shavit

2015

ACM Trans. Parallel Comput.

Multicore machines are quickly shifting to NUMA and CC-NUMA architectures, making scalable NUMA-aware locking algorithms, ones that take into account the machine's nonuniform memory and caching hierarchy, ever more important. This article presents lock cohorting, a general new technique for designing NUMA-aware locks that is as simple as it is powerful. Lock cohorting allows one to transform any spin-lock algorithm, with minimal nonintrusive changes, into a scalable NUMA-aware spin-lock. Our new cohorting technique allows us to easily create NUMA-aware versions of the TATAS-Backoff, CLH, MCS, and ticket locks, to name a few. Moreover, it allows us to derive a CLH-based cohort abortable lock, the first NUMA-aware queue lock to support abortability. We empirically compared the performance of cohort locks with prior NUMA-aware and classic NUMA-oblivious locks on a synthetic micro-benchmark, a real world key-value store application memcached, as well as the libc memory allocator. Our results demonstrate that cohort locks perform as well or better than known locks when the load is low and significantly out-perform them as the load increases.

“…The total-store-ordering semantics [20] provides that all instructions are executed in program order (or, more precisely, cannot be observed to be executed out of program order), each write is visible either globally or only to its own thread, and writes become globally visible in program order. Consequently, visibility- and execution-order edges compile away to nothing, and pushes can be implemented by mfences.…”

Section: Methods (mentioning)

confidence: 99%
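
The two visibility properties named in the statement above (a write is visible either only to its own thread or globally, and writes become global in program order) can be sketched with a per-thread FIFO store buffer. This is an illustrative toy under that assumption, not the formal model; the class `ToyTSO` and its method names are hypothetical.

```python
from collections import deque


class ToyTSO:
    """Toy machine: shared memory plus one FIFO store buffer per thread."""

    def __init__(self, n_threads):
        self.memory = {}
        self.buffers = [deque() for _ in range(n_threads)]

    def store(self, tid, addr, value):
        # A store is first visible only to the thread that issued it.
        self.buffers[tid].append((addr, value))

    def load(self, tid, addr):
        # A thread reads its own newest buffered write, else shared memory.
        for buffered_addr, value in reversed(self.buffers[tid]):
            if buffered_addr == addr:
                return value
        return self.memory.get(addr, 0)

    def flush_one(self, tid):
        # The oldest buffered write becomes globally visible (FIFO order).
        addr, value = self.buffers[tid].popleft()
        self.memory[addr] = value

    def mfence(self, tid):
        # Modeled as draining the issuing thread's whole buffer.
        while self.buffers[tid]:
            self.flush_one(tid)


m = ToyTSO(2)
m.store(0, "data", 42)   # thread 0 writes the payload first
m.store(0, "flag", 1)    # then publishes the flag

own_view = m.load(0, "data")                         # forwarded from the buffer
other_view = (m.load(1, "flag"), m.load(1, "data"))  # nothing is global yet

m.flush_one(0)           # "data" leaves the buffer before "flag"
mid_view = (m.load(1, "flag"), m.load(1, "data"))

m.mfence(0)              # drain the rest
final_view = (m.load(1, "flag"), m.load(1, "data"))

print(own_view, other_view, mid_view, final_view)
```

In this sketch thread 1 can never see `flag == 1` together with a stale `data`, because the buffer drains in program order.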

“…We take as our reference points the (broadly similar) Power and ARM architectures, and the x86 architecture, because they enjoy rigorous, usable specifications [19,5,20]. We focus on the former because in all cases relevant to this paper, the complexities of Power and ARM subsume those of x86.…”

Section: The Rmc Memory Model (mentioning)

confidence: 99%

“…Like Sarkar, et al's model for Power [19], and Sewell, et al's model for x86 [20], our calculus is based on an operational semantics, not an axiomatic semantics. Nevertheless, one aspect of our system does give it some axiomatic flavor.…”

Section: The Rmc Memory Model (mentioning)

confidence: 99%

“…Recent work has addressed this difficulty for some important architectures [20,19,4,3,5], but for portable, imperative programming languages the memory models are either insufficiently expressive or quite complex:…”

Section: Introduction (mentioning)

confidence: 99%

A Calculus for Relaxed Memory

Crary

Sullivan

2015

Proceedings of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages

We propose a new approach to programming multi-core, relaxed-memory architectures in imperative, portable programming languages. Our memory model is based on explicit, programmer-specified requirements for order of execution and the visibility of writes. The compiler then realizes those requirements in the most efficient manner it can. This is in contrast to existing memory models, which (if they allow programmer control over synchronization at all) are based on inferring the execution and visibility consequences of synchronization operations or annotations in the code. We formalize our memory model in a core calculus called RMC. Outside of the programmer's specified requirements, RMC is designed to be strictly more relaxed than existing architectures. It employs an aggressively nondeterministic semantics for expressions, in which actions can be executed in nearly any order, and a store semantics that generalizes Sarkar, et al.'s and Alglave, et al.'s models of the Power architecture. We establish several results for RMC, including sequential consistency for two programming disciplines, and an appropriate notion of type safety. All our results are formalized in Coq.