Abstract: Optimizing compilers apply numerous interdependent optimizations, leading to the notoriously difficult phase-ordering problem: that of deciding which transformations to apply and in which order. Fortunately, new infrastructures such as the polyhedral compilation framework host a variety of transformations, facilitating the efficient exploration and configuration of multiple transformation sequences. Many powerful optimizations, however, remain external to the polyhedral framework, including vectorization. The l…
“…In this work we automatically apply post-transformations to expose parallelism at the innermost loop level, if possible. Previous work on SIMD vectorization for affine programs has proposed effective solutions to expose inner-loop-level parallelism [12,37], and we seamlessly reuse those techniques to enable effective loop pipelining on the FPGA. This is achieved by using additional constraints during and after the tiling hyperplane computation, to preserve one level of inner parallelism.…”
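The shape this transformation aims for can be sketched with a small, hypothetical kernel (the kernel, tile size, and names are illustrative, not taken from the cited framework): after tiling, one parallel loop is kept innermost, so its iterations carry no dependence and can be pipelined at initiation interval 1 on an FPGA, or vectorized.

```c
#include <assert.h>

#define N 64
#define T 8  /* tile size; value is illustrative */

/* Tiled matrix-vector product y = A^T x. The reduction dimension i is
 * tiled, while the parallel dimension j is kept innermost: consecutive
 * j iterations touch distinct y[j] and are independent, so the inner
 * loop can be pipelined (or vectorized) without synchronization. */
void mv_tiled(const int A[N][N], const int x[N], int y[N]) {
    for (int j = 0; j < N; j++)
        y[j] = 0;
    for (int ii = 0; ii < N; ii += T)          /* tile over reduction dim */
        for (int i = ii; i < ii + T; i++)
            for (int j = 0; j < N; j++)        /* preserved parallel inner loop */
                y[j] += A[i][j] * x[i];
}
```

In an HLS flow one would typically mark the inner loop for pipelining; the point here is only that the schedule leaves a dependence-free loop at the innermost level.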
Section: Loop Pipelining and Task Parallelism
Many applications, such as medical imaging, generate intensive data traffic between the FPGA and off-chip memory. Significant improvements in execution time can be achieved with effective utilization of on-chip (scratchpad) memories, combined with careful software-based data reuse and communication scheduling techniques. We present a fully automated C-to-FPGA framework to address this problem. Our framework effectively implements data reuse through aggressive loop transformation-based program restructuring. In addition, our proposed framework automatically implements critical performance optimizations such as task-level parallelization, loop pipelining, and data prefetching. We leverage the power and expressiveness of the polyhedral compilation model to develop a multi-objective optimization system for off-chip communication management. Our technique can satisfy hardware resource constraints (scratchpad size) while aggressively exploiting data reuse. Our approach can also be used to reduce the on-chip buffer size subject to bandwidth constraints. We also implement a fast design space exploration technique for effective optimization of program performance using the Xilinx high-level synthesis tool.
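The data-reuse idea described above can be illustrated with a minimal sketch (the stencil, buffer name, and sizes are hypothetical, not the framework's actual interface): a tile of the input plus its halo is copied once into a small local buffer standing in for an on-chip scratchpad, and each element is then read several times from that buffer instead of being re-fetched from off-chip memory.

```c
#include <string.h>

#define N 64
#define T 16  /* tile size; bounded by scratchpad capacity in practice */

/* Software-managed data reuse for a 3-point stencil: each tile of T
 * outputs needs T + 2 inputs (tile plus halo). The memcpy models a
 * single burst prefetch into on-chip memory; each buffered element is
 * then read up to three times locally rather than three times off-chip. */
void stencil_reuse(const int in[N + 2], int out[N]) {
    int buf[T + 2];  /* models an on-chip scratchpad buffer */
    for (int t = 0; t < N; t += T) {
        memcpy(buf, &in[t], (T + 2) * sizeof(int));  /* burst prefetch */
        for (int i = 0; i < T; i++)
            out[t + i] = buf[i] + buf[i + 1] + buf[i + 2];
    }
}
```

Shrinking T trades on-chip buffer size against the number of off-chip bursts, which is the kind of size-versus-bandwidth trade-off the abstract describes.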
“…Several previous studies have shown how tiling, parallelization, vectorization, or data locality enhancement can be efficiently addressed in an affine transformation framework [21], [34], [14], [24], [36]. Any loop transformation can be expressed in the polyhedral representation, and composing arbitrarily complex sequences of loop transformations is seamlessly handled by the framework.…”
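The claim that arbitrary loop transformations compose seamlessly follows from the model itself: in the polyhedral representation a transformation is an affine map on iteration vectors, so composing transformations is just composing matrices. A minimal sketch (2-deep nest, integer schedules only; names are illustrative):

```c
/* A polyhedral schedule for a 2-deep loop nest, restricted here to a
 * linear map: theta(i, j) = M * (i, j). Loop interchange is the
 * permutation matrix [[0,1],[1,0]]; composing two transformations is
 * multiplying their matrices, so interchange twice yields the identity. */
static void apply_schedule(const int M[2][2], const int it[2], int out[2]) {
    for (int r = 0; r < 2; r++)
        out[r] = M[r][0] * it[0] + M[r][1] * it[1];
}

static void compose(const int A[2][2], const int B[2][2], int C[2][2]) {
    for (int r = 0; r < 2; r++)
        for (int c = 0; c < 2; c++)
            C[r][c] = A[r][0] * B[0][c] + A[r][1] * B[1][c];
}
```

Real polyhedral tools work with affine maps over parametric integer sets rather than bare matrices, but the composition principle is the same.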
Section: Optimization Space
“…Our approach to vectorization leverages recent analytical modeling results by Trifunovic et al. [36]. We take advantage of the polyhedral representation to restructure imperfectly nested programs and expose vectorizable inner loops.…”
Section: SIMD-level Parallelization
“…The most important part of the transformation to enable vectorization comes from the selection of which parallel loop is moved to the innermost position. The cost model selects a synchronization-free loop that minimizes the memory stride of the data accessed by two contiguous iterations of the loop [36]. Note that this interchange may not always lead to the optimal vectorization, and is simply useless on a machine that does not support SIMD instructions.…”
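The stride criterion in this snippet can be made concrete with a small sketch (the helper and its inputs are hypothetical, not the cited cost model's actual interface): for a row-major access `A[i][j]` on an array with NC columns, advancing `i` by one iteration moves the address by NC elements while advancing `j` moves it by 1, so among the parallel candidate loops the one with the smallest stride is moved innermost to obtain contiguous SIMD loads.

```c
/* Illustrative stride-based selection: stride[k] is the per-iteration
 * memory stride (in elements) of a representative access along the
 * k-th parallel candidate loop. The minimal-stride loop is chosen as
 * the innermost loop, since stride-1 accesses map to contiguous
 * (unit-stride) vector loads and stores. */
static int pick_innermost(const int stride[], int nloops) {
    int best = 0;
    for (int k = 1; k < nloops; k++)
        if (stride[k] < stride[best])
            best = k;
    return best;
}
```

For `A[i][j]` on a 64-column array, the candidates `{i, j}` have strides `{64, 1}`, so `j` is moved innermost, which is exactly the classic interchange that turns a column-wise traversal into a vectorizable row-wise one.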
Abstract: Significant advances in compiler optimization have been made in recent years, enabling many transformations such as tiling, fusion, parallelization and vectorization on imperfectly nested loops. Nevertheless, the problem of finding the best combination of loop transformations remains a major challenge. Polyhedral models for compiler optimization have demonstrated strong potential for enhancing program performance, in particular for compute-intensive applications. However, existing static cost models for polyhedral transformations have significant limitations, and iterative compilation has become a very promising alternative to these models for finding the most effective transformations. Since the number of polyhedral optimization alternatives can be enormous, it is often impractical to iterate over a significant fraction of the entire space of polyhedrally transformed variants. Recent research has focused on iterating over this search space either with manually constructed heuristics or with automatic but very expensive search algorithms (e.g., genetic algorithms) that can eventually find good points in the polyhedral space. In this paper, we propose the use of machine learning to address the problem of selecting the best polyhedral optimizations. We show that these models can quickly find high-performance program variants in the polyhedral space, without resorting to extensive empirical search. We introduce models that take as input a characterization of a program based on its dynamic behavior, and predict the performance of aggressive high-level polyhedral transformations that include tiling, parallelization and vectorization. We allow for a minimal empirical search on the target machine, discovering on average 83% of the search-space-optimal combinations in at most 5 runs. Our end-to-end framework is validated using numerous benchmarks on two multi-core platforms.
“…[15,26,28,30,43]. These works usually focus either on the back end, that is, the actual SIMD code generation from a parallel loop [15,28,30], or only on the high-level loop transformation angle [12,26,38,40]. To the best of our knowledge, our work is the first to address both problems simultaneously by setting a well-defined interface between a powerful polyhedral high-level transformation engine and a specialized SIMD code generator.…”
Data locality and parallelism are critical optimization objectives for performance on modern multi-core machines. Both coarse-grain parallelism (e.g., multi-core) and fine-grain parallelism (e.g., vector SIMD) must be effectively exploited, but despite decades of progress at both ends, current compiler optimization schemes that attempt to address data locality and both kinds of parallelism often fail at one of the three objectives. We address this problem by proposing a 3-step framework, which aims for integrated data locality, multi-core parallelism and SIMD execution of programs. We define the concept of vectorizable codelets, with properties tailored to achieve effective SIMD code generation for the codelets. We leverage the power of a modern high-level transformation framework to restructure a program to expose good ISA-independent vectorizable codelets, exploiting multi-dimensional data reuse. Then, we generate ISA-specific customized code for the codelets, using a collection of lower-level SIMD-focused optimizations. We demonstrate our approach on a collection of numerical kernels that we automatically tile, parallelize and vectorize, exhibiting significant performance improvements over existing compilers.