We present algorithms for shrinking and expanding a hash table while allowing concurrent, wait-free, linearly scalable lookups. These resize algorithms allow Read-Copy Update (RCU) hash tables to maintain constant-time performance as the number of entries grows, and reclaim memory as the number of entries decreases, without delaying or disrupting readers. We call the resulting data structure a relativistic hash table. Benchmarks of relativistic hash tables in the Linux kernel show that lookup scalability during resize improves 125x over reader-writer locking, and 56% over Linux's current state of the art. Relativistic hash lookups experience no performance degradation during a resize. Applying this algorithm to memcached removes a scalability limit for get requests, allowing memcached to scale linearly and service up to 46% more requests per second. Relativistic hash tables demonstrate the promise of a new concurrent programming methodology known as relativistic programming. Relativistic programming makes novel use of existing RCU synchronization primitives, namely the wait-for-readers operation that waits for unfinished readers to complete. This operation, conventionally used to handle reclamation, here allows ordering of updates without read-side synchronization or memory barriers.
Read-copy update (RCU) is a synchronization mechanism in the Linux™ kernel that provides significant improvements in multiprocessor scalability by eliminating the writer-delay problem of readers-writer locking. RCU implementations to date, however, have had the side effect of expanding non-preemptible regions of code, thereby degrading real-time response. We present here a variant of RCU that allows preemption of read-side critical sections and thus is better suited for real-time applications. We summarize priority-inversion issues with locking, present an overview of the RCU mechanism, discuss our counter-based adaptation of RCU for real-time use, describe an additional adaptation of RCU that permits general blocking in read-side critical sections, and present performance results. We also discuss an approach for replacing the readers-writer synchronization with RCU in existing implementations. INTRODUCTION In this paper we focus on environments in which real-time applications are running on shared-memory multiprocessor systems with the Linux operating system. Such environments require both real-time response and multiprocessor scalability. Real-time response means that the hardware and the operating system perform within real-time constraints; that is, the response times to certain events are subject to operational deadlines. Multiprocessor scalability means that the system can process growing amounts of work when the level of multiprocessing is proportionally increased. Tech-
High-performance programs and systems require concurrency to take full advantage of available hardware. However, the available concurrent programming models force a difficult choice, between simple models such as mutual exclusion that produce little to no concurrency, or complex models such as Read-Copy Update that can scale to all available resources. Simple concurrent programming models enforce atomicity and causality, and this enforcement limits concurrency. Scalable concurrent programming models expose the weakly ordered hardware memory model, requiring careful and explicit enforcement of causality to preserve correctness, as demonstrated in this dissertation through the manual construction of a scalable hash-table item-move algorithm. Recent research on relativistic programming aims to standardize the programming model of Read-Copy Update, but thus far these efforts have lacked a generalized memory ordering model, requiring data-structure-specific reasoning to preserve causality. To demonstrate the relativistic causal ordering model, I walk through the straightforward construction of a novel concurrent hash-table resize algorithm, including the translation of this algorithm from the relativistic model to a hardware memory model, and show through benchmarks that the resulting algorithm scales far better than those based on mutual exclusion.
The advent of multi-core and multi-threaded processor architectures highlights the need to address the well-known shortcomings of the ubiquitous lock-based synchronization mechanisms. To this end, transactional memory has been viewed by many as a promising alternative to locking. This paper therefore presents a constructive critique of locking and transactional memory: their strengths, weaknesses, and opportunities for improvement.