Optimal Strategies for Spinning and Blocking

Boguslavsky, Leonid B.; Harzallah, K.; Kreinen, A.; Sevcik, Kenneth C.; Vainshtein, A.

doi:10.1006/jpdc.1994.1056

Cited by 21 publications (21 citation statements)

References 10 publications (2 reference statements)


“…Every store on a shared or owned cache line incurs a broadcast invalidation to all nodes. This happens because the cache directory is incomplete (it does not keep track of the sharers) and does not in any way detect whether sharing is limited within the node. Therefore, even if all sharers reside on the same node, a store needs to pay the overhead of a broadcast, thus increasing the cost from around 83 to 244 cycles.…”

Section: Remote Accesses (mentioning)

confidence: 99%

Everything you always wanted to know about synchronization but were afraid to ask

David, Guerraoui, Trigonakis, 2013

Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles

This paper presents the most exhaustive study of synchronization to date. We span multiple layers, from hardware cache-coherence protocols up to high-level concurrent software. We do so on different types of architectures, from single-socket (uniform and non-uniform) to multi-socket (directory- and broadcast-based) many-cores. We draw a set of observations that, roughly speaking, imply that scalability of synchronization is mainly a property of the hardware.


“…The mutual exclusion methodology therefore, needs to be properly chosen. Though blocking ensures robustness, there is a cost to it and results in very bad outliers [24]. In a multi-core platform, where a CPU can be dedicated to a process to avoid context switching overheads, non-blocking methods such as atomic spinning for a flag is recommended as there is no idle time as soon as the shared resource becomes available [25].…”

Section: Maximizing Parallelization (mentioning)

confidence: 99%

Lean and Efficient Business Tier for Performance Scaling

Natarajan, Sarda, Srivastava, 2015

IJCA

Large mission-critical real-time legacy OLTP systems supporting service sectors like banking, telecom, and financial services are monolithic in nature and thereby not flexible enough to enable business transformation, which is the need of the hour due to emerging dynamic changes to the business ecosystem. Broadly speaking, these large mission-critical applications can be classified into three stages of activities: pre-processing, core business processing, and post-processing activities. This paper focuses on making the business processing activities tier lean and efficient. By leveraging recent advances in technologies, a methodology is described by which the OLTP applications can be successfully transformed into agile systems. The methodology enables a view of the Business Tier in five dimensions. The first dimension is identifying the business's core critical path and how to make it lean. The second dimension is how to enhance concurrency of the activities in the critical path. The third dimension is to improve the parallelism in execution of concurrency. The fourth is to separate I/O operations off the critical path. The fifth is how to minimize contention for shared resources to ensure higher efficiencies. The paper also presents the results of the experiments carried out by applying the above recommendations, and the performance improvements, to decide the optimal setup for an environment for a given workload. Addressing the five dimensions of the Business Tier, this paper demonstrates the transformation that can be achieved, which will enable the business to be agile and respond to market ecosystem demands in a very efficient and effective manner.


“…If too few threads sleep too little the system will remain overloaded; if delays are too long or too common processor utilization will plummet. Prior work has suggested heuristics for tuning the amount of spinning in the system as the thread and processor counts vary [3].…”

Section: Hybrid Spinning/blocking Approaches (mentioning)

confidence: 99%

“…In between the two extremes, a truly adaptive spinning approach such as the one presented by Boguslavsky et al. [3] would optimize the trade-off between spinning and blocking to minimize the performance loss, while also applying partial pipelining so context switches occur at least partly off the critical path. In the best case performance drops off gradually from peak until pipelining is effective; in the worst case performance drops off rapidly at first (as happens currently with the spinlocks), then recovers somewhat as pipelining becomes effective.…”

Section: Ideal Spin-then-block Mutex (mentioning)

confidence: 99%

“…For example, mutex locks in certain operating systems make use of adaptive spinning to avoid the cost of context switches when the wait for a lock is short, leading to performance competitive with spinlocks while avoiding most of the weaknesses of spinning. Other work suggests heuristics for optimizing the duration of spinning based on machine size and workload [3]. Recent research [6] has also addressed partially the negative interaction between spinning and thread preemptions due to OS activity.…”

Section: Introduction (mentioning)

confidence: 99%


A new look at the roles of spinning and blocking

Johnson, Athanassoulis, Stoica, et al., 2009

Proceedings of the Fifth International Workshop on Data Management on New Hardware

Database engines face growing scalability challenges as core counts exponentially increase each processor generation, and the efficiency of synchronization primitives used to protect internal data structures is a crucial factor in overall database performance. The trade-offs between different implementation approaches for these primitives shift significantly with increasing degrees of available hardware parallelism. Blocking synchronization, which has long been the favored approach in database systems, becomes increasingly unattractive as growing core counts expose its bottlenecks. Spinning implementations improve peak system throughput by a factor of 2x or more for 64 hardware contexts, but suffer from poor performance under load.

In this paper we analyze the shifting trade-off between spinning and blocking synchronization, and observe that the trade-off can be simplified by isolating the load control aspects of contention management and treating the two problems separately: spinning-based contention management and blocking-based load control. We then present a proof of concept implementation that, for high concurrency, matches or exceeds the performance of both user-level spinlocks and the pthread mutex under a wide range of load factors.

INTRODUCTION

Recent shifts in computer architecture have resulted in systems containing multiple cores per chip, with core counts projected to double every two years for the foreseeable future. While multicore architectures make available an unprecedented degree of hardware parallelism, they also pose new challenges for database engine design. Increasing the number of concurrent threads puts pressure on internal database engine components and exposes new bottlenecks in the system [7]. Database systems have long depended on blocking synchronization primitives (supplied by the operating system) to manage contention because they are both effective and offer predictable performance over a wide range of system load factors. However, we find that the trade-offs between spinning and blocking primitives shift significantly with increasing degrees of available hardware parallelism. In particular, blocking primitives become unattractive because they result in low system utilization for the high core counts which now appear in commodity servers. By fully utilizing the machine, spinlocks improve performance by 2x or more, but suffer from unstable performance under load.

The strengths and weaknesses of spinning and blocking as contention management strategies are well known, and several approaches have been proposed to address the weaknesses or balance their trade-offs. For example, mutex locks in certain operating systems make use of adaptive spinning to avoid the cost of context switches when the wait for a lock is short, leading to performance competitive with spinlocks while avoiding most of the weaknesses of spinning. Other work suggests heuristics for optimizing the duration of spinning based on machine size and workload [3]. Recent research [6] has also addressed partially t...
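The spin-then-block idea that the citation statements above refer to can be illustrated with a minimal sketch. It is not the strategy of Boguslavsky et al. or the implementation of Johnson et al.; it only shows the control flow of a hybrid lock that polls for a bounded time and then falls back to a blocking acquire. The class name `SpinThenBlockLock`, the `spin_budget_s` parameter, and its default value are all hypothetical, chosen for illustration. In CPython the global interpreter lock means spinning brings no real performance benefit, so the sketch should be read for its structure rather than its timing.

```python
import threading
import time


class SpinThenBlockLock:
    """Hypothetical hybrid lock: poll briefly, then block in the OS-level lock."""

    def __init__(self, spin_budget_s=0.0005):
        # spin_budget_s is an illustrative tuning knob, not a value taken
        # from any of the cited papers.
        self._lock = threading.Lock()
        self.spin_budget_s = spin_budget_s
        self.spin_acquires = 0   # acquisitions that succeeded while polling
        self.block_acquires = 0  # acquisitions that fell back to blocking

    def acquire(self):
        deadline = time.perf_counter() + self.spin_budget_s
        while True:
            # Spin phase: a non-blocking attempt never puts the thread to sleep.
            if self._lock.acquire(blocking=False):
                self.spin_acquires += 1  # safe: we hold the lock here
                return
            if time.perf_counter() >= deadline:
                break
        # Block phase: give up the processor until the lock is released.
        self._lock.acquire()
        self.block_acquires += 1

    def release(self):
        self._lock.release()

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.release()
        return False


def demo(workers=4, iterations=1000):
    """Increment a shared counter under the hybrid lock from several threads."""
    lock = SpinThenBlockLock()
    total = 0

    def work():
        nonlocal total
        for _ in range(iterations):
            with lock:
                total += 1

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total, lock
```

Calling `demo()` returns the final counter and the lock, whose `spin_acquires` and `block_acquires` fields show how often each phase succeeded. An adaptive variant would adjust `spin_budget_s` at run time from observed wait times or load; how best to do that is exactly the question the cited work addresses, and this sketch leaves it open.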
