Database engines face growing scalability challenges as core counts increase exponentially with each processor generation, and the efficiency of the synchronization primitives used to protect internal data structures is a crucial factor in overall database performance. The trade-offs between different implementation approaches for these primitives shift significantly with increasing degrees of available hardware parallelism. Blocking synchronization, which has long been the favored approach in database systems, becomes increasingly unattractive as growing core counts expose its bottlenecks. Spinning implementations improve peak system throughput by a factor of 2x or more for 64 hardware contexts, but suffer from poor performance under load. In this paper we analyze the shifting trade-off between spinning and blocking synchronization, and observe that the trade-off can be simplified by isolating the load control aspects of contention management and treating the two problems separately: spinning-based contention management and blocking-based load control. We then present a proof-of-concept implementation that, for high concurrency, matches or exceeds the performance of both user-level spinlocks and the pthread mutex under a wide range of load factors.
INTRODUCTION
Recent shifts in computer architecture have resulted in systems containing multiple cores per chip, with core counts projected to double every two years for the foreseeable future. While multicore architectures make available an unprecedented degree of hardware parallelism, they also pose new challenges for database engine design. Increasing the number of concurrent threads puts pressure on internal database engine components and exposes new bottlenecks in the system [7]. Database systems have long depended on blocking synchronization primitives (supplied by the operating system) to manage contention because they are effective and offer predictable performance over a wide range of system load factors. However, we find that the trade-offs between spinning and blocking primitives shift significantly with increasing degrees of available hardware parallelism. In particular, blocking primitives become unattractive because they result in low system utilization at the high core counts that now appear in commodity servers. By fully utilizing the machine, spinlocks improve performance by 2x or more, but suffer from unstable performance under load.
The strengths and weaknesses of spinning and blocking as contention management strategies are well known, and several approaches have been proposed to address the weaknesses or balance the trade-offs. For example, mutex locks in certain operating systems use adaptive spinning to avoid the cost of context switches when the wait for a lock is short, leading to performance competitive with spinlocks while avoiding most of the weaknesses of spinning. Other work suggests heuristics for optimizing the duration of spinning based on machine size and workload [3]. Recent research [6] has also addressed partially t...