Lock Coarsening: Eliminating Lock Overhead in Automatically Parallelized Object-Based Programs

Diniz, Pedro C.; Rinard, Martin

doi:10.1006/jpdc.1998.1441

Cited by 35 publications

(26 citation statements)

References 15 publications

(26 reference statements)

“…A related problem is automatically optimizing programs with explicit locking by combining multiple locks into one [8]. A key part of this class of work is constructing a mapping from program objects to the locks that protect them, which is similar to, but more specialized than, lock placements.…”

Section: Discussion and Related Work (mentioning)

confidence: 99%

Concurrent data representation synthesis

Hawkins

Aiken

Fisher

et al. 2012

Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation

We describe an approach for synthesizing data representations for concurrent programs. Our compiler takes as input a program written using concurrent relations and synthesizes a representation of the relations as sets of cooperating data structures as well as the placement and acquisition of locks to synchronize concurrent access to those data structures. The resulting code is correct by construction: individual relational operations are implemented correctly and the aggregate set of operations is serializable and deadlock free. The relational specification also permits a high-level optimizer to choose the best performing of many possible legal data representations and locking strategies, which we demonstrate with an experiment autotuning a graph benchmark.

“…The compiler includes an automatic lock coarsening algorithm and an automatic replication algorithm [Diniz and Rinard 1998; Rinard and Diniz 1999]. Flags determine the lock coarsening and replication policies that the generated code uses.…”

Section: Methods (mentioning)

confidence: 99%

“…In this case, the primary source of overhead is the synchronization overhead associated with executing the lock acquire and release operations. We have attacked this source of overhead by using lock coarsening to increase the granularity at which the computation locks objects [Diniz and Rinard 1998]. We have developed two kinds of lock coarsening: computation lock coarsening and data lock coarsening.…”

Section: Lock Coarsening and Synchronization Overhead (mentioning)

confidence: 99%
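
The effect of increasing lock granularity, as described in the statement above, can be illustrated with a minimal sketch. It is not the compiler's algorithm from the cited paper: the `Counter` class, its `acquires` instrumentation field, and the two `update_*` functions are hypothetical names introduced only for this illustration. The fine-grained version acquires and releases the lock once per update; the coarsened version holds the lock across the whole loop.

```python
import threading


class Counter:
    """Hypothetical shared object protected by its own lock (illustration only)."""

    def __init__(self):
        self.lock = threading.Lock()
        self.value = 0
        self.acquires = 0  # instrumentation: number of lock acquisitions

    def add_locked(self, n):
        # Fine-grained: every update pays for one acquire/release pair.
        with self.lock:
            self.acquires += 1
            self.value += n

    def add_unlocked(self, n):
        # Precondition: the caller already holds self.lock.
        self.value += n


def update_fine(counter, items):
    # One acquire/release pair per item.
    for n in items:
        counter.add_locked(n)


def update_coarse(counter, items):
    # Coarsened: a single acquire/release pair around the whole loop.
    with counter.lock:
        counter.acquires += 1
        for n in items:
            counter.add_unlocked(n)


fine, coarse = Counter(), Counter()
update_fine(fine, range(1000))
update_coarse(coarse, range(1000))
```

Both versions compute the same value, but the coarsened one acquires the lock once instead of once per update. The trade-off is a longer critical section, which can turn the lock into a bottleneck when several threads contend for it.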

Eliminating synchronization bottlenecks using adaptive replication

Rinard

Diniz

2003

ACM Trans. Program. Lang. Syst.

Self Cite

This article presents a new technique, adaptive replication, for automatically eliminating synchronization bottlenecks in multithreaded programs that perform atomic operations on objects. Synchronization bottlenecks occur when multiple threads attempt to concurrently update the same object. It is often possible to eliminate synchronization bottlenecks by replicating objects. Each thread can then update its own local replica without synchronization and without interacting with other threads. When the computation needs to access the original object, it combines the replicas to produce the correct values in the original object. One potential problem is that eagerly replicating all objects may lead to performance degradation and excessive memory consumption. Adaptive replication eliminates unnecessary replication by dynamically detecting contention at each object to find and replicate only those objects that would otherwise cause synchronization bottlenecks. We have implemented adaptive replication in the context of a parallelizing compiler for a subset of C++. Given an unannotated sequential program written in C++, the compiler automatically extracts the concurrency, determines when it is legal to apply adaptive replication, and generates parallel code that uses adaptive replication to efficiently eliminate synchronization bottlenecks. In addition to automatic parallelization and adaptive replication, our compiler also implements a lock coarsening transformation that increases the granularity at which the computation locks objects. The advantage is a reduction in the frequency with which the computation acquires and releases locks; the potential disadvantage is the introduction of new synchronization bottlenecks caused by increases in the sizes of the critical sections. Because the adaptive replication transformation takes place at lock acquisition sites, there is a synergistic interaction between lock coarsening and adaptive replication. Lock coarsening drives down the overhead of using adaptive replication, and adaptive replication eliminates synchronization bottlenecks associated with the overaggressive use of lock coarsening. Our experimental results show that, for our set of benchmark programs, the combination of lock coarsening and adaptive replication can eliminate synchronization bottlenecks and significantly ...

This research was supported in part by NSF grant CCR-9702297. Authors' address: M. C. Rinard, MIT Laboratory for Computer Science, 545 Technology Square, NE43-620A, Cambridge, MA 02139; email: rinard@lcs.mit.edu; P. C. Diniz, USC/ISI, 4676 Admiralty Way, Suite 1001, Marina del Rey, CA 90202; email: pedro@isi.edu.
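
The replication idea in the abstract (each thread updates a private replica, and the replicas are combined when the original object is needed) can be sketched as follows. This is a minimal illustration of plain, eager replication under stated assumptions, not the article's implementation: the function name `replicated_sum` and the chunking of the input are hypothetical, and the dynamic contention detection that makes the technique adaptive is not modeled.

```python
import threading


def replicated_sum(chunks):
    """Sum all chunks using one private replica per thread (illustration only)."""
    replicas = [0] * len(chunks)  # one slot per worker thread

    def worker(idx, chunk):
        local = 0
        for n in chunk:
            local += n         # no lock needed: only this thread touches `local`
        replicas[idx] = local  # each thread writes only its own slot

    threads = [
        threading.Thread(target=worker, args=(i, c))
        for i, c in enumerate(chunks)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Combine step: fold the replicas into the value of the original object.
    return sum(replicas)


chunks = [range(0, 250), range(250, 500), range(500, 750), range(750, 1000)]
total = replicated_sum(chunks)
```

Because no thread shares its accumulator, the update loop runs without synchronization; the only coordination is the join before the combine step. The cost is one replica per thread, which is the memory overhead the abstract says adaptive replication avoids for uncontended objects.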

“…These techniques complement ours. Our local removal algorithm is also different from lock coarsening [7], which optimizes the necessary synchronizations, e.g. those arising from acquiring and releasing a lock multiple times in succession.…”

Section: Related Work (mentioning)

confidence: 99%

Lock Removal for Concurrent Trace Programs

Kahlon

Wang

2012

Computer Aided Verification

Abstract. We propose a trace-based concurrent program analysis to soundly remove redundant synchronizations such as locks while preserving the behaviors of the concurrent computation. Our new method is computationally efficient in that it involves only thread-local computation and therefore avoids interleaving explosion, which is known as the main hurdle for scalable concurrency analysis. Our method builds on the partial-order theory and a unified analysis framework; therefore, it is more generally applicable than existing methods based on simple syntactic rules and ad hoc heuristics. We have implemented and evaluated the proposed method in the context of runtime verification of multithreaded Java and C programs. Our experimental results show that lock removal can significantly speed up symbolic predictive analysis for detecting concurrency bugs. Besides runtime verification, our new method will also be useful in applications such as debugging, performance optimization, program understanding, and maintenance.
