Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles 2013
DOI: 10.1145/2517349.2522714

Everything you always wanted to know about synchronization but were afraid to ask

Abstract: This paper presents the most exhaustive study of synchronization to date. We span multiple layers, from hardware cache-coherence protocols up to high-level concurrent software. We do so on different types of architectures, from single-socket (uniform and non-uniform) to multi-socket (directory- and broadcast-based) many-cores. We draw a set of observations that, roughly speaking, imply that scalability of synchronization is mainly a property of the hardware.

Cited by 197 publications (137 citation statements)
References 28 publications
“…Even our optimized NO WAIT implementation does not scale as well as SILO due to contention caused by atomic instructions used in the read-write lock implementation (Figure 2). Designing scalable, NUMA-aware read-write locks is a topic of intense research in the concurrent programming community [6,10,16]. Using such locks to further minimize the impact of physical synchronization in both pessimistic and optimistic protocols is a promising direction of future research.…”
Section: Implications Of Our Analysis
Citation type: mentioning (confidence: 99%)
“…For each data point, the two threads execute in lock step as shown in Figure 5 (similar measurements have been used in existing systems research [18,30,40,73]). Thread y brings the data in a modified state in its local caches and then thread x measures the latency of its own access to the shared data using the timestamp counter of the core [4].…”
Section: Context-to-context Latencies
Citation type: mentioning (confidence: 99%)
“…This tendency makes the task of developers very challenging, for they need to fine-tune software to the underlying hardware in order to achieve performance (e.g., [12,18,19,30,37]). Furthermore, optimizing for specific multi-core topologies hinders software portability.…”
Section: Introduction
Citation type: mentioning (confidence: 99%)
“…Unlike ms-lb with MCS locks, all other designs have two single points of contention (cache lines), namely the head and the tail of the queue. The Opteron is an 8-socket machine, so increasing the number of threads also increases non-uniformity, resulting in more expensive cache-coherence traffic [7]. Still, on both platforms, ms-lb is slower than the rest on fewer than 6-7 threads.…”
Section: Optik In Queues
Citation type: mentioning (confidence: 99%)