We present an on-the-fly mechanism that detects access conflicts in executions of multi-threaded Java programs. Access conflicts are a conservative approximation of data races. The checker tracks access information at the level of objects ( object races ) rather than at the level of individual variables. This viewpoint allows the checker to exploit specific properties of object-oriented programs for optimization by restricting dynamic checks to those objects that are identified by escape analysis as potentially shared. The checker has been implemented in collaboration with an "ahead-of-time"Java compiler. The combination fo static program analysis (escape-analysis) and inline instrumentation during code generation allows us to reduce the runtime overhead of detecting access conflicts. This overhead amounts to about 16-129% in time and less than 25% in space for typical benchmark applications and compares favorably to previously published on-the-fly mechanism that incurred an overhead of about a factor of 2-80 in time and up to a factor of 2 in space.
Tiling has proven to be an effective mechanism to develop high performance implementations of algorithms. Tiling can be used to organize computations so that communication costs in parallel programs are reduced and locality in sequential codes or sequential components of parallel programs is enhanced.In this paper, a data type -Hierarchically Tiled Arrays or HTAs -that facilitates the direct manipulation of tiles is introduced. HTA operations are overloaded array operations. We argue that the implementation of HTAs in sequential OO languages transforms these languages into powerful tools for the development of high-performance parallel codes and codes with high degree of locality. To support this claim, we discuss our experiences with the implementation of HTAs for MATLAB and C++ and the rewriting of the NAS benchmarks and a few other programs into HTA-based parallel form.
A memory model for a concurrent imperative programming language specifies which writes to shared variables may be seen by reads performed by other threads. We present a simple mathematical framework for relaxed memory models for programming languages. To instantiate this framework for a specific language, the designer must choose the notion of atomic steps supported by the language (e.g. 32-bit reads and writes) and specify how a composite step may be broken into a sequence of atomic steps (the decomposition rule). This rule determines which sequence of intermediate writes (if any) are visible to concurrent reads by other threads. Different choices of the rule lead to models which permit a read to return any value if there is a concurrent write (race), or models which satisfy a "No Thin Air Read" property. The former is suitable for languages such as C++ (programs with races have undefined behavior), and the latter for Java. Other intermediate models are possible, useful and interesting.We establish that all models in the framework satisfy the Fundamental Property of relaxed memory models: programs whose sequentially consistent (SC) executions have no races must have have only SC executions. We show how to define synchronization constructs (such as volatiles of various kinds) in the framework, and discuss the causality test cases from the Java Memory Model.
Implicit Parallelism with Ordered Transactions (IPOT) is an extension of sequential or explicitly parallel programming models to support speculative parallelization. The key idea is to specify opportunities for parallelization in a sequential program using annotations similar to transactions. Unlike explicit parallelism, IPOT annotations do not require the absence of data dependence, since the parallelization relies on runtime support for speculative execution. IPOT as a parallel programming model is determinate, i.e., program semantics are independent of the thread scheduling. For optimization, non-determinism can be introduced selectively.We describe the programming model of IPOT and an online tool that recommends boundaries of ordered transactions by observing a sequential execution. On three example HPC workloads we demonstrate that our method is effective in identifying opportunities for fine-grain parallelization. Using the automated task recommendation tool, we were able to perform the parallelization of each program within a few hours.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.