Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles 2013
DOI: 10.1145/2517349.2522714

Everything you always wanted to know about synchronization but were afraid to ask

Abstract: This paper presents the most exhaustive study of synchronization to date. We span multiple layers, from hardware cache-coherence protocols up to high-level concurrent software. We do so on different types of architectures, from single-socket (uniform and non-uniform) to multi-socket (directory- and broadcast-based) many-cores. We draw a set of observations that, roughly speaking, imply that scalability of synchronization is mainly a property of the hardware.

Cited by 197 publications (137 citation statements)
References 28 publications
“…Even our optimized NO WAIT implementation does not scale as well as SILO due to contention caused by atomic instructions used in the read-write lock implementation (Figure 2). Designing scalable, NUMA-aware read-write locks is a topic of intense research in the concurrent programming community [6,10,16]. Using such locks to further minimize the impact of physical synchronization in both pessimistic and optimistic protocols is a promising direction of future research.…”
Section: Implications Of Our Analysis
Citation type: mentioning (confidence: 99%)
“…For each data point, the two threads execute in lock step as shown in Figure 5 (similar measurements have been used in existing systems research [18,30,40,73]). Thread y brings the data in a modified state in its local caches and then thread x measures the latency of its own access to the shared data using the timestamp counter of the core [4].…”
Section: Context-to-context Latencies
Citation type: mentioning (confidence: 99%)
“…This tendency makes the task of developers very challenging, for they need to fine-tune software to the underlying hardware in order to achieve performance (e.g., [12,18,19,30,37]). Furthermore, optimizing for specific multi-core topologies hinders software portability.…”
Section: Introduction
Citation type: mentioning (confidence: 99%)
“…Unlike ms-lb with MCS locks, all other designs have two single points of contention (cache lines), namely the head and the tail of the queue. The Opteron is an 8-socket machine, so increasing the number of threads also increases non-uniformity, resulting in more expensive cache-coherence traffic [7]. Still, on both platforms, ms-lb is slower than the rest on fewer than 6-7 threads.…”
Section: Optik In Queues
Citation type: mentioning (confidence: 99%)