Vincent Loechner scite author profile

Many compiler optimization techniques depend on the ability to calculate the number of elements that satisfy certain conditions. If these conditions can be represented by linear constraints, then such problems are equivalent to counting the number of integer points in (possibly) parametric polytopes. It is well known that the enumerator of such a set can be represented by an explicit function consisting of a set of quasi-polynomials, each associated with a chamber in the parameter space. Previously, interpolation was used to obtain these quasi-polynomials, but this technique has several disadvantages. Its worst-case computation time for a single quasi-polynomial is exponential in the input size, even for fixed dimensions. The worst-case size of such a quasi-polynomial (measured in bits needed to represent the quasi-polynomial) is also exponential in the input size. Under certain conditions this technique even fails to produce a solution. Our main contribution is a novel method for calculating the required quasi-polynomials analytically. It extends an existing method, based on Barvinok's decomposition, for counting the number of integer points in a non-parametric polytope. Our technique always produces a solution and computes polynomially-sized enumerators in polynomial time (for fixed dimensions).
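To make the notion of a quasi-polynomial enumerator concrete, here is a minimal sketch on a toy example chosen for illustration (it does not come from the paper and does not use Barvinok's decomposition). The parametric polytope is P(n) = { (x, y) : x >= 0, y >= 0, 2x + y <= n }. For n >= 0 its number of integer points is n²/4 + n + c(n), where the constant term c(n) is 1 for even n and 3/4 for odd n. The periodic coefficient is what makes this a quasi-polynomial rather than a polynomial. The function names are hypothetical.

```python
def count_brute_force(n: int) -> int:
    """Count integer points (x, y) with x >= 0, y >= 0, 2x + y <= n by enumeration."""
    count = 0
    for x in range(0, n // 2 + 1):
        for y in range(0, n - 2 * x + 1):
            count += 1
    return count


def count_quasi_polynomial(n: int) -> int:
    """Closed-form enumerator, valid on the chamber n >= 0.

    n^2/4 + n + c(n), with periodic constant c(n) = 1 (n even) or 3/4 (n odd),
    written with a common denominator of 4 to stay in integer arithmetic.
    """
    periodic_term = 4 if n % 2 == 0 else 3
    return (n * n + 4 * n + periodic_term) // 4


if __name__ == "__main__":
    for n in range(6):
        print(n, count_brute_force(n), count_quasi_polynomial(n))
```

The brute-force count takes time proportional to the number of points, while the closed form is evaluated in constant time for any n in the chamber. That difference is why an explicit enumerator is useful to a compiler.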

Dynamic and Speculative Polyhedral Parallelization Using Compiler-Generated Skeletons

Jimborean

Clauss

Dollinger

et al. 2013

Int J Parallel Prog

VMAD: An Advanced Dynamic Program Analysis and Instrumentation Framework

Jimborean

Mastrangelo

Loechner

et al. 2012

VMAD (Virtual Machine for Advanced Dynamic analysis) is a platform for advanced profiling and analysis of programs, consisting of a static component and a runtime system. The runtime system is organized as a set of decoupled modules, each dedicated to specific instrumenting or optimizing operations and dynamically loaded when required. The program binary files handled by VMAD are processed beforehand at compile time to include all necessary data, instrumentation instructions and callbacks to the runtime system. For this purpose, the LLVM compiler has been extended to automatically generate multiple versions of the code, each of them tailored for the targeted instrumentation or optimization strategies. The compiler chooses the most suitable intermediate representation for each version, depending on the information to be acquired and on the optimizations to be applied. The control flow graph is adapted to include the new versions and to transfer control to and from the runtime system, which is in charge of orchestrating the execution flow. The strength of our system resides in its extensibility, as one can add support for various new profiling or optimization strategies, independently of the existing modules. VMAD's potential is illustrated by presenting several analysis and optimization applications dedicated to loop nests: instrumentation by sampling, dynamic dependence analysis, and adaptive version selection.

Untitled

Loechner

Wilde

1997

Untitled

Clauss

Loechner

1998

Adaptive Runtime Selection for GPU

Dollinger

Loechner

2013

It is often hard to predict the performance of a statically generated code. Hardware availability, hardware specification and problem size may change from one execution context to another. The main contribution of this work is an entirely automatic method aiming to predict execution times of semantically equivalent versions of affine loop nests on GPUs; then, to run the best performing one on GPU or CPU. To make accurate predictions, our framework relies on three consecutive stages: a static code generation, an offline profiling and an online prediction. Different versions are statically generated by PPCG, a source-to-source polyhedral compiler, able to generate CUDA code from static control loops written in C. The code versions differ by their block sizes, tiling and parallel schedule. The profiling code carries out the required measurements on the target machine: throughput between host and device memory, and execution time of the kernels with various parameters. At runtime, we rely on those results to calculate a predicted execution time on GPU. This is followed by a "fastest wins" algorithm, that runs instances of the target code concurrently on CPU and GPU; the first completed kills the other one. We validate this proposal on the polyhedral benchmark suite, showing that the predictions are accurate and that the runtime selection is effective on two different architectures.
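The online prediction stage can be sketched as follows. This is only an illustration under a simplifying assumption that is not stated in the abstract: the predicted GPU time of a version is modelled as the host/device transfer time (bytes moved divided by the profiled throughput) plus the profiled kernel time. The class, the function names and the numbers are hypothetical, and the concurrent "fastest wins" CPU/GPU race is not modelled here.

```python
from dataclasses import dataclass


@dataclass
class VersionProfile:
    """Offline profiling result for one statically generated code version (hypothetical)."""
    name: str
    kernel_seconds: float  # profiled kernel execution time for the current problem size


def predict_gpu_seconds(profile: VersionProfile,
                        bytes_moved: int,
                        throughput_bytes_per_s: float) -> float:
    """Assumed cost model: transfer time plus kernel time."""
    return bytes_moved / throughput_bytes_per_s + profile.kernel_seconds


def select_version(profiles: list[VersionProfile],
                   bytes_moved: int,
                   throughput_bytes_per_s: float) -> VersionProfile:
    """Pick the version with the smallest predicted GPU execution time."""
    return min(
        profiles,
        key=lambda p: predict_gpu_seconds(p, bytes_moved, throughput_bytes_per_s),
    )


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    candidates = [
        VersionProfile("block16_tile32", kernel_seconds=0.040),
        VersionProfile("block32_tile32", kernel_seconds=0.025),
        VersionProfile("block32_tile64", kernel_seconds=0.031),
    ]
    best = select_version(candidates, bytes_moved=64_000_000,
                          throughput_bytes_per_s=6.0e9)
    print(best.name)
```

Because the transfer term is the same for every version in this simplified model, the choice reduces to the smallest profiled kernel time. A more faithful model would let both terms depend on the version and on the runtime parameters.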

Adapting the polyhedral model as a framework for efficient speculative parallelization

Jimborean

Clauss

Pradelle

et al. 2012

In this paper, we present a Thread-Level Speculation (TLS) framework whose main feature is the ability to speculatively parallelize a sequential loop nest in various ways, by re-scheduling its iterations. The transformation to be applied is selected at runtime with the goal of minimizing the number of rollbacks and maximizing performance. We perform code transformations by applying the polyhedral model that we adapted for speculative and runtime code parallelization. For this purpose, we designed a parallel code pattern which is patched by our runtime system according to the profiling information collected on some execution samples. Adaptability is ensured by considering chunks of code of various sizes that are launched successively, each of which is parallelized in a different manner, or run sequentially, depending on the currently observed memory access behavior. We show on several benchmarks that our framework yields good performance on codes which could not be handled efficiently by previously proposed TLS systems.

Optimization of Triangular and Banded Matrix Operations Using 2d-Packed Layouts

Baroudi

Seghir

Loechner

2017

ACM Trans. Archit. Code Optim.

Over the past few years, multicore systems have become increasingly powerful and thereby very useful in high-performance computing. However, many applications, such as some linear algebra algorithms, still cannot take full advantage of these systems. This is mainly due to the shortage of optimization techniques dealing with irregular control structures. In particular, the well-known polyhedral model fails to optimize loop nests whose bounds and/or array references are not affine functions. This is more likely to occur when handling sparse matrices in their packed formats. In this article, we propose using 2d-packed layouts and simple affine transformations to enable optimization of triangular and banded matrix operations. The benefit of our proposal is shown through an experimental study over a set of linear algebra benchmarks.
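As an illustration of what a 2D packing with affine index functions can look like, the sketch below folds the lower triangle of an N x N matrix (N even) into an (N/2) x (N+1) rectangle with no wasted cells. This particular fold is an assumption chosen for the example and is not claimed to be the layout proposed in the article. The point is that both branches of the index mapping are affine in (i, j), unlike the classic 1D packed index i(i+1)/2 + j, which is quadratic. The function names are hypothetical.

```python
def packed_index(n: int, i: int, j: int) -> tuple[int, int]:
    """Map lower-triangle coordinates (i, j), with j <= i, to rectangle coordinates.

    Row i of the upper half (i < n/2) holds i + 1 elements and is stored as is.
    Row i of the lower half is folded into the free space of row k = n - 1 - i.
    Both branches are affine in (i, j).
    """
    if i < n // 2:
        return i, j
    k = n - 1 - i
    return k, k + 1 + j


def pack_lower_triangle(a: list[list[float]]) -> list[list[float]]:
    """Pack the lower triangle (diagonal included) of a square matrix with even size."""
    n = len(a)
    if n % 2 != 0:
        raise ValueError("this sketch assumes an even matrix size")
    packed = [[None] * (n + 1) for _ in range(n // 2)]
    for i in range(n):
        for j in range(i + 1):
            r, c = packed_index(n, i, j)
            packed[r][c] = a[i][j]
    return packed


if __name__ == "__main__":
    n = 4
    # Encode each entry as 10*i + j so its origin is visible after packing.
    a = [[10 * i + j for j in range(n)] for i in range(n)]
    for row in pack_lower_triangle(a):
        print(row)
```

The rectangle holds exactly N(N+1)/2 values, the same as the triangle, and accesses to it keep (piecewise) affine subscripts.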
