Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques 2010
DOI: 10.1145/1854273.1854317
A model for fusion and code motion in an automatic parallelizing compiler

Abstract: Loop fusion has been studied extensively, but in a manner isolated from other transformations. This was mainly due to the lack of a powerful intermediate representation for application of compositions of high-level transformations. Fusion presents strong interactions with parallelism and locality. Currently, there exist no models to determine good fusion structures integrated with all components of an auto-parallelizing compiler. This is also one of the reasons why all the benefits of optimization and automatic…

Cited by 43 publications (21 citation statements). References 20 publications.
“…Finally, the proof for Theorem 5 also implies a loop-carried input for a particular variable access is created on a loop only when all the accesses to a variable inside the loop-nest are read accesses. Therefore, a flow path from a loop-invariant source vertex to a loop-carried input never exists, ensuring characteristic (6).…”
Section: Synthesizing a Dataflow Diagram
confidence: 99%
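
The statement above hinges on the distinction between loop-invariant inputs and loop-carried values. The following minimal C sketch of that distinction is an illustrative example, not code from the cited work; the names scale and acc are ours.

    #define N 100

    void running_sum(const double a[N], double out[N])
    {
        double scale = 2.0;   /* only ever read inside the loop: a loop-invariant input */
        double acc   = 0.0;   /* read and written every iteration: a loop-carried value */

        for (int i = 0; i < N; ++i) {
            acc    = acc + a[i] * scale;   /* the value of acc flows from iteration i to i+1 */
            out[i] = acc;
        }
    }

In the terminology of the statement, scale would correspond to a loop-invariant source, while acc is the kind of value that must be carried from one iteration to the next.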
“…There are many compilers, both proprietary and open-source, which now use the polyhedral compiler framework [12,15,6,16]. Research in this area, however, has predominantly focused on imperative languages such as C, C++, and Fortran.…”
Section: Introduction
confidence: 99%
“…The work in the first category focuses on improving data locality and data reuse by code transformation, especially loop transformation [2][3][4][5][6][7][8][9][10][11]. By changing the access order of array references in the loop nests, the co-located references become temporally "closer", which means a smaller data reuse buffer.…”
Section: Introduction
confidence: 99%
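
The reuse-distance effect described in this statement can be made concrete with a small C sketch (an assumed example, not taken from the cited paper): fusing the producer and consumer loops brings both accesses to the intermediate value into the same iteration, so a single scalar suffices where the unfused version needs the whole tmp array as a reuse buffer.

    #define N 1024

    void unfused(const double a[N], double b[N], double tmp[N])
    {
        for (int i = 0; i < N; ++i)       /* producer loop */
            tmp[i] = a[i] * 2.0;
        for (int i = 0; i < N; ++i)       /* consumer loop: tmp[i] is reused only N iterations later */
            b[i] = tmp[i] + 1.0;
    }

    void fused(const double a[N], double b[N])
    {
        for (int i = 0; i < N; ++i) {     /* after fusion the reuse happens in the same iteration */
            double t = a[i] * 2.0;        /* one scalar replaces the tmp[] reuse buffer */
            b[i] = t + 1.0;
        }
    }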
“…To improve data locality, iteration distances between dependent array instances are formulated in the objective function. Besides affine transformation, loop fusion/distribution, code motion and tiling for imperfectly nested loops have also been studied in recent work [8][9][10][11]. However, these models [2][3][4][5][6][7][8][9][10][11] use simple platform-independent objective functions, which cannot accurately model the impact of memory hierarchy allocation in hardware synthesis.…”
Section: Introduction
confidence: 99%
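
As a hedged illustration of the kind of loop transformation this statement refers to, consider tiling applied to a simple matrix-vector product. This is our sketch over a perfect nest (so simpler than the imperfectly nested cases mentioned above), and the tile size T is an arbitrary illustrative choice, not a value from the cited model.

    #define N 1024
    #define T 64    /* tile size: an illustrative choice, not from the cited model */

    void matvec_tiled(const double A[N][N], const double b[N], double y[N])
    {
        for (int i = 0; i < N; ++i)
            y[i] = 0.0;

        for (int jj = 0; jj < N; jj += T)           /* tile the column (j) loop */
            for (int i = 0; i < N; ++i)
                for (int j = jj; j < jj + T; ++j)   /* b[jj..jj+T-1] is reused across all rows i */
                    y[i] += A[i][j] * b[j];
    }

Tiling shortens the distance between successive reuses of b[j], so a tile-sized slice of b can stay in fast memory while the whole column range is traversed.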
“…Automatic parallelization and locality optimization of affine loop nests have been addressed for shared-memory multiprocessors and GPUs with good success [4], [7], [8], [16], [29], [30]. However, many large-scale simulation applications must be executed in a distributed-memory environment, using irregular or sparse computations where the control-flow and array-access patterns are data-dependent.…”
Section: Introduction
confidence: 99%
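
To illustrate what data-dependent control flow and array accesses look like in practice, consider a standard sparse matrix-vector product over the CSR format (a textbook kernel used here purely as an example, not code from the cited works). The inner trip count and the indirect accesses x[col[k]] depend on run-time data, so they are not affine functions of the loop counters and fall outside the affine framework discussed above.

    void spmv_csr(int n, const int rowptr[], const int col[],
                  const double val[], const double x[], double y[])
    {
        for (int i = 0; i < n; ++i) {
            double sum = 0.0;
            for (int k = rowptr[i]; k < rowptr[i + 1]; ++k)   /* trip count known only at run time */
                sum += val[k] * x[col[k]];                    /* indirect, non-affine access */
            y[i] = sum;
        }
    }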