Abstract: Recent approaches to automatic parallelization have taken advantage of the low-latency on-chip interconnect provided in modern multicore processors, demonstrating significant speedups, even for complex workloads. Although these techniques can already extract significant thread-level parallelism from application loops, we are interested in quantifying and exploiting any additional performance that remains on the table. This paper confirms the existence of significant extra thread-level parallelism within loops p…
“…Instead, all SCCs are treated equally and merged in the graph to coarsen the granularity of potential parallel regions by applying typed fusion. HELIX [8,30] is a speculatively parallelizing compiler, which would benefit from iterator recognition. While HELIX applies parallelizing loop transformations, it relies on normalizable loops (equivalent to while loops), but it does not attempt to separate loop iterator code.…”
Iterators prescribe the traversal of data structures and determine loop termination, and many loop analyses and transformations require their exact identification. While recognition of iterators is a straightforward task for affine loops, the situation is different for loops iterating over dynamic data structures or involving control-flow-dependent computations to determine the next data element. In this paper we propose a compiler analysis for recognizing loop iterator code for a wide class of loops. We initially develop a static analysis, which is then enhanced with profiling information to support speculative code optimizations. We have prototyped our analysis in the LLVM framework and demonstrate its capabilities using the SPEC CPU2006 benchmarks. Our approach is applicable to all loops and we show that we can recognize iterators in, on average, 88.1% of over 75,000 loops using static analysis alone, and up to 94.9% using additional profiling information. Existing techniques perform substantially worse, especially for C and C++ applications, and cover only 35-44% of the loops. Our analysis enables advanced loop optimizations such as decoupled software pipelining, commutativity analysis and source code rejuvenation for real-world applications, which escape analysis and transformation if loop iterators are not recognized accurately.
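The distinction the abstract draws between iterator code and payload code can be made concrete with a small sketch. The example below is purely illustrative (the `Node` and `sum_list` names are hypothetical, not from the paper): in a pointer-chasing loop over a linked list, the iterator is the code that initializes the traversal, advances to the next element, and decides termination, as opposed to the computation performed on each element.

```python
# Hypothetical pointer-chasing loop illustrating iterator vs. payload code.
class Node:
    def __init__(self, value, nxt=None):
        self.value = value
        self.next = nxt

def sum_list(head):
    total = 0
    node = head                # iterator: traversal initialization
    while node is not None:    # iterator: termination test
        total += node.value    # payload: computation on the current element
        node = node.next       # iterator: advance to the next element
    return total
```

Unlike an affine loop, where the iterator is a simple induction variable, here the next element is loaded from memory, which is why separating the iterator requires a dedicated analysis.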
“…HELIX parallelizes a loop by distributing its iterations between cores [23,24,42]. Each iteration is sliced into several sequential and parallel segments.…”
Section: Transformations Built Upon NOELLE
Modern and emerging architectures demand increasingly complex compiler analyses and transformations. As the emphasis on compiler infrastructure moves beyond support for peephole optimizations and the extraction of instruction-level parallelism, it should support custom tools designed to meet these demands with higher-level, analysis-powered abstractions of wider program scope. This paper introduces NOELLE, a robust open-source domain-independent compilation layer built upon LLVM providing this support. NOELLE is modular and demand-driven, making it easy to extend and adaptable to custom-tool-specific needs without unduly wasting compile time and memory. This paper shows the power of NOELLE by presenting a diverse set of ten custom tools built upon it, with a 33.2% to 99.2% reduction in code size (LoC) compared to their counterparts without NOELLE.
“…On the other hand, recent work has shown that dependence analysis, even when informed with perfect profiling information, is inherently unable to identify any further latent parallelism [25].…”
Automatic parallelization has largely failed to keep its promise of extracting parallelism from sequential legacy code to maximize performance on multi-core systems outside the numerical domain. In this paper, we develop a novel dynamic commutativity analysis (DCA) for identifying parallelizable loops. Using commutativity instead of dependence tests, DCA avoids many of the overly strict data dependence constraints limiting existing parallelizing compilers. DCA extends the scope of automatic parallelization to uniformly include both regular array-based and irregular pointer-based codes. We have prototyped our novel parallelism detection analysis and evaluated it extensively against five state-of-the-art dependence-based techniques in two experimental settings. First, when applied to the NAS benchmarks, which contain almost 1400 loops, DCA is able to identify as many parallel loops (over 1200) as the profile-guided dependence techniques and almost twice as many as all the static techniques combined. We then apply DCA to complex pointer-based loops, where it can successfully detect parallelism, while existing techniques fail to identify any. When combined with existing parallel code generation techniques, this results in an average speedup of 3.6× (and up to 55×) across the NAS benchmarks on a 72-core host, and up to 36.9× for the pointer-based loops, demonstrating the effectiveness of DCA in identifying profitable parallelism across a wide range of loops.
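The core intuition behind testing commutativity rather than dependence can be sketched in a few lines. This is a minimal illustrative check, not the paper's implementation (the `commute` helper and the example iterations are hypothetical): two loop iterations are considered commutative if executing them in either order leaves the program state the same, even when both write the same memory location, which a dependence test would conservatively forbid.

```python
import copy

def commute(f, g, state):
    """Dynamically test whether iterations f and g commute on the given
    state: run f;g and g;f on independent deep copies and compare results."""
    s1 = copy.deepcopy(state)
    f(s1)
    g(s1)
    s2 = copy.deepcopy(state)
    g(s2)
    f(s2)
    return s1 == s2

# Two iterations inserting into a set commute even though both mutate the
# same object, so a commutativity-based analysis can run them in parallel.
it1 = lambda s: s.add(1)
it2 = lambda s: s.add(2)
# commute(it1, it2, set()) -> True
```

By contrast, the same two iterations appending to a list do not commute, since the final element order differs, so an order-sensitive loop would correctly be kept sequential.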