Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation 2012
DOI: 10.1145/2254064.2254124
Logical inference techniques for loop parallelization

Abstract: This paper presents a fully automatic approach to loop parallelization that integrates the use of static and run-time analysis and thus overcomes many known difficulties such as nonlinear and indirect array indexing and complex control flow. Our hybrid analysis framework validates the parallelization transformation by verifying the independence of the loop's memory references. To this end it represents array references using the USR (uniform set representation) language and expresses the independence condition…
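To illustrate the hybrid idea the abstract describes — deferring an independence proof to run time when static analysis cannot decide it — here is a minimal sketch. It is not the paper's USR machinery; the function names and the duplicate-scan predicate are illustrative stand-ins for a generated run-time check on an indirectly indexed loop:

```python
def independent_writes(idx):
    # For a loop writing a[idx[i]], iterations are pairwise independent
    # iff the subscript trace idx contains no duplicates. This is the
    # kind of condition a hybrid framework evaluates at run time when
    # compile-time analysis cannot resolve the indirection.
    return len(set(idx)) == len(idx)

def run_loop(a, idx, f):
    if independent_writes(idx):
        # Independence proven at run time: iterations may execute in any
        # order. Reversed order stands in for a parallel schedule here.
        order = reversed(range(len(idx)))
    else:
        # Check failed: fall back to the original sequential order.
        order = range(len(idx))
    for i in order:
        a[idx[i]] = f(i)
    return a
```

When the check succeeds, both schedules produce the same result, which is exactly what independence of the loop's memory references guarantees.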

Cited by 25 publications (8 citation statements)
References 27 publications
“…Other compile-time and run-time techniques use as much compile-time information as possible to generate efficient runtime checks to determine if a loop is fully parallelizable [48], [49]. Loops that only admit wavefront parallelization will be determined not parallelizable by these approaches, but the inspector overhead is significantly reduced.…”
Section: A. Compiler-based Approaches
confidence: 99%
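The point of this citation statement — that compile-time information shrinks the run-time check — can be sketched concretely. The two inspectors below are hypothetical illustrations, not code from either paper: a general inspector must scan the whole subscript trace, while a compile-time proof that subscripts are affine collapses the check to an O(1) predicate:

```python
def inspector_general(idx):
    # General inspector: O(n) scan of the observed subscript trace
    # for duplicate writes.
    return len(set(idx)) == len(idx)

def inspector_affine(stride):
    # If compile-time analysis proves every subscript has the form
    # base + i * stride, pairwise distinctness reduces to a single
    # O(1) predicate evaluated before the loop runs.
    return stride != 0
```

The inspector overhead reduction the citing authors mention is precisely this gap: a per-element scan versus one scalar comparison.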
“…Other work summarizes array accesses at program level, either via systems of affine inequations or as a program in a language of abstract sets, and summaries are paired with predicates that are evaluated at runtime to minimize overheads [10,16]. However, providing support for arbitrary predicates and summaries requires, in the imperative context, many helper intermediate representations and even an entire optimization infrastructure for these new languages, which has been informally characterized as "heroic effort".…”
Section: Related Work
confidence: 99%
“…Finally, our technique is an instance of hybrid analysis, which denotes a class of transformations that extract statically data-sensitive invariants from the program and aggressively specialize the program based on the result of the runtime evaluation of those invariants. Such analyses, reviewed in Section 4, include optimization of the common-execution path in JIT compilation [1], inspector-executor and dependence analysis of array subscripts in automatic parallelization [10, 16-19]. A significant problem however is that, in the imperative context, supporting anything but the simplest, O(1) predicate quickly requires "heroic" efforts.…”
Section: Introduction
confidence: 99%
“…The emergence of commodity multi-core, cache-coherent systems in the mid-2000s has fostered the study (i) of software-transactional memories [13] (STM) as a way to provide a clean, progress-guaranteed semantics for atomic operations, (ii) of a variety of algorithms and transformations [32,37] that were aimed at enhancing the locality of reference in both space and time, and (iii) of a range of analyses from entirely dynamic [11,29,34] to entirely static for automatic parallelization [20,30,33]. While these techniques are important and ideas can be reused, such solutions do not naturally extend to commodity (massively parallel) manycore architectures, such as GPGPUs, because they (i) either rely on a fast and coherent cache infrastructure, (ii) exhibit memory overhead proportional to the number of cores, or (iii) do not extend beyond one-loop parallelization and do not guarantee that all available parallelism is detected.…”
Section: Introduction
confidence: 99%