2013
DOI: 10.1145/2499370.2491960
Fast condensation of the program dependence graph

Abstract: Aggressive compiler optimizations are formulated around the Program Dependence Graph (PDG). Many techniques, including loop fission and parallelization, are concerned primarily with dependence cycles in the PDG. The Directed Acyclic Graph of Strongly Connected Components (DAGSCC) represents these cycles directly. The naïve method to construct the DAGSCC first computes the full PDG. This approach limits adoption of aggressive optimizations because the number of analysis queries grows quadratically with program size.
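To make the abstract's terminology concrete, the sketch below builds the DAGSCC the naïve way: it first materializes every dependence edge of the PDG, then collapses strongly connected components with Tarjan's algorithm. This is an illustrative sketch under stated assumptions, not the paper's algorithm; the adjacency-list representation and all names are hypothetical, and the paper's contribution is precisely avoiding this full-PDG construction.

```cpp
// Naive DAGSCC construction: build the full dependence graph, then
// condense strongly connected components (SCCs) with Tarjan's algorithm.
// Illustrative sketch only; node/edge names are hypothetical.
#include <algorithm>
#include <cstdio>
#include <vector>

struct Graph {
    int n;                              // number of PDG nodes (instructions)
    std::vector<std::vector<int>> adj;  // dependence edges src -> dst
    explicit Graph(int n) : n(n), adj(n) {}
    void addEdge(int u, int v) { adj[u].push_back(v); }
};

// Recursive Tarjan keeps the sketch short; an iterative version would be
// safer for very large programs.
struct Tarjan {
    const Graph &g;
    std::vector<int> index, low, sccOf, stack;
    std::vector<bool> onStack;
    int counter = 0, numSCCs = 0;

    explicit Tarjan(const Graph &g)
        : g(g), index(g.n, -1), low(g.n, 0), sccOf(g.n, -1), onStack(g.n, false) {
        for (int v = 0; v < g.n; ++v)
            if (index[v] < 0) dfs(v);
    }

    void dfs(int v) {
        index[v] = low[v] = counter++;
        stack.push_back(v);
        onStack[v] = true;
        for (int w : g.adj[v]) {
            if (index[w] < 0) { dfs(w); low[v] = std::min(low[v], low[w]); }
            else if (onStack[w]) low[v] = std::min(low[v], index[w]);
        }
        if (low[v] == index[v]) {       // v is the root of an SCC
            int w;
            do {
                w = stack.back(); stack.pop_back();
                onStack[w] = false;
                sccOf[w] = numSCCs;
            } while (w != v);
            ++numSCCs;
        }
    }
};

// Condensation: one DAGSCC node per SCC; keep only inter-SCC edges
// (duplicate edges are not deduplicated, for brevity).
Graph condense(const Graph &g, const Tarjan &t) {
    Graph dag(t.numSCCs);
    for (int u = 0; u < g.n; ++u)
        for (int v : g.adj[u])
            if (t.sccOf[u] != t.sccOf[v])
                dag.addEdge(t.sccOf[u], t.sccOf[v]);
    return dag;
}

int main() {
    Graph pdg(4);
    pdg.addEdge(0, 1); pdg.addEdge(1, 0);  // dependence cycle {0,1}
    pdg.addEdge(1, 2); pdg.addEdge(2, 3);  // acyclic tail
    Tarjan t(pdg);
    Graph dag = condense(pdg, t);
    std::printf("%d PDG nodes -> %d DAGSCC nodes\n", pdg.n, dag.n);
}
```

The quadratic cost the abstract mentions lives in the step this sketch takes for granted: populating the edge list may require a dependence-analysis query for every pair of instructions before condensation can even begin.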

Cited by 2 publications (2 citation statements), both published in 2016. References 45 publications (32 reference statements).
“…dependence analysis [10,15,18,19,23]. Previous work has shown empirically that an improved dependence analysis can enhance the performance of automatic parallelization [27].…”
Section: Limits of Dependence Analysis
confidence: 99%
“…One key reason for this shortfall is that most existing production compilers built on traditional wisdom often limit themselves to optimizing only small scopes in the entire program [Rauchwerger and Padua 1999; Ding and Kennedy 2004; Vandierendonck et al. 2010] or kernels [Bondhugula et al. 2008; Pouchet et al. 2008]. Even in some recent work, such as Johnson et al. [2013], the authors only analyze individual hot loops in target applications to mark those loops as parallel or determine the validity of loop distribution. However, our experiments with several scientific applications from the SPEC benchmark suite reveal that there are many opportunities for improvement in memory (and parallel) performance of those benchmarks through global program optimizations (or transformations) such as applying loop fusion across a sequence of such hot loops.…”
Section: Introduction
confidence: 99%
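For readers unfamiliar with the transformation this excerpt references, here is a minimal loop-fusion illustration; the arrays, bounds, and values are hypothetical, not taken from the cited work. Fusing two adjacent loops that traverse the same data turns two passes over memory into one, improving locality:

```cpp
#include <cstdio>
#include <vector>

int main() {
    const int n = 1 << 20;
    std::vector<double> a(n, 1.0), b(n), c(n);

    // Unfused: two passes over the arrays; a[] and b[] are re-fetched
    // from memory on the second traversal.
    for (int i = 0; i < n; ++i) b[i] = 2.0 * a[i];
    for (int i = 0; i < n; ++i) c[i] = a[i] + b[i];

    // Fused: one pass; legal here because iteration i of the second loop
    // reads only a[i] and b[i], both already produced at iteration i of
    // the first loop -- no dependence cycle prevents merging the bodies.
    for (int i = 0; i < n; ++i) {
        b[i] = 2.0 * a[i];
        c[i] = a[i] + b[i];
    }
    std::printf("c[0] = %f\n", c[0]);
}
```

Deciding whether such a fusion is legal is exactly a dependence-cycle question, which is why the cited work cares about computing the PDG's condensation efficiently.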