IN AN IDEAL WORLD, applications are expected to scale automatically when executed on increasingly larger systems. In practice, however, not only does this scaling fail to occur, but it is common to see performance actually worsen on those larger-scale systems. While performance and scalability can be ambiguous terms, they become less so when problems present themselves at the lower end of the software stack, simply because the number of factors to consider when evaluating a performance problem decreases. As such, concurrent multithreaded programs such as operating-system kernels, hypervisors, and database engines can pay a high price for misusing hardware resources, and this translates into performance issues for applications executing higher up in the stack. One clear example is the design and implementation of synchronization primitives (locks) for shared-memory systems. Locks are a way of allowing multiple threads to execute concurrently, providing safe and correct execution contexts through mutual exclusion. To achieve serialization, locks typically require hardware support through the use of atomic operations such as compare-and-swap (CAS), fetch-and-add, and atomic arithmetic instructions. While details vary across cache-coherent architectures, atomic operations broadcast changes across the memory bus, updating the value of the shared variable for every core and forcing cache-line invalidations and, therefore, more cache-line misses. Software engineers often abuse these primitives, leading to significant performance degradation caused by poor lock granularity or high latency. Both the correctness and the performance of locks depend on the underlying hardware architecture. That is why scalability and hardware implications are so important in the design of locking algorithms.
Unfortunately, these are rare considerations in real-world software. With the advent of increasingly larger multi- and many-core NUMA (nonuniform memory access) systems, the performance penalties of poor locking implementations become painfully evident. These penalties apply to the primitive's actual implementation, as well as to its usage, the latter of which many developers directly control by designing locking schemes for data serialization. After decades of research this is a well-known fact, and it has never been truer than today. Despite recent technologies such as lock elision and transactional memory, however, concurrency, parallel programming, and synchronization remain challenging topics for practitioners.10 Furthermore, because a transactional memory system such as Trans-