Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation 2012
DOI: 10.1145/2254064.2254124
Logical inference techniques for loop parallelization

Abstract: This paper presents a fully automatic approach to loop parallelization that integrates the use of static and run-time analysis and thus overcomes many known difficulties such as nonlinear and indirect array indexing and complex control flow. Our hybrid analysis framework validates the parallelization transformation by verifying the independence of the loop's memory references. To this end it represents array references using the USR (uniform set representation) language and expresses the independence condition…
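To illustrate the hybrid idea the abstract describes — deferring an independence proof to run time when static analysis cannot decide it — here is a minimal sketch. It is not the paper's USR machinery; the function names and the duplicate-scan predicate are illustrative stand-ins for a generated run-time check on an indirectly indexed loop:

```python
def independent_writes(idx):
    # For a loop writing a[idx[i]], iterations are pairwise independent
    # iff the subscript trace idx contains no duplicates. This is the
    # kind of condition a hybrid framework evaluates at run time when
    # compile-time analysis cannot resolve the indirection.
    return len(set(idx)) == len(idx)

def run_loop(a, idx, f):
    if independent_writes(idx):
        # Independence proven at run time: iterations may execute in any
        # order. Reversed order stands in for a parallel schedule here.
        order = reversed(range(len(idx)))
    else:
        # Check failed: fall back to the original sequential order.
        order = range(len(idx))
    for i in order:
        a[idx[i]] = f(i)
    return a
```

When the check succeeds, both schedules produce the same result, which is exactly what independence of the loop's memory references guarantees.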

Cited by 25 publications (8 citation statements)
References 27 publications
“…Other compile-time and run-time techniques use as much compile-time information as possible to generate efficient runtime checks to determine if a loop is fully parallelizable [48], [49]. Loops that only admit wavefront parallelization will be determined not parallelizable by these approaches, but the inspector overhead is significantly reduced.…”
Section: A. Compiler-based Approaches
confidence: 99%
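The point of this citation statement — that compile-time information shrinks the run-time check — can be sketched concretely. The two inspectors below are hypothetical illustrations, not code from either paper: a general inspector must scan the whole subscript trace, while a compile-time proof that subscripts are affine collapses the check to an O(1) predicate:

```python
def inspector_general(idx):
    # General inspector: O(n) scan of the observed subscript trace
    # for duplicate writes.
    return len(set(idx)) == len(idx)

def inspector_affine(stride):
    # If compile-time analysis proves every subscript has the form
    # base + i * stride, pairwise distinctness reduces to a single
    # O(1) predicate evaluated before the loop runs.
    return stride != 0
```

The inspector overhead reduction the citing authors mention is precisely this gap: a per-element scan versus one scalar comparison.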
“…Other work summarizes array accesses at program level, either via systems of affine inequations or as a program in a language of abstract sets, and summaries are paired with predicates that are evaluated at runtime to minimize overheads [10,16]. However, providing support for arbitrary predicates and summaries requires, in the imperative context, many helper intermediate representations and even an entire optimization infrastructure for these new languages, which has been informally characterized as "heroic effort".…”
Section: Related Work
confidence: 99%
“…Finally, our technique is an instance of hybrid analysis, which denotes a class of transformations that extract statically data-sensitive invariants from the program and aggressively specialize the program based on the result of the runtime evaluation of those invariants. Such analyses, reviewed in Section 4, include optimization of the common-execution path in JIT compilation [1], inspector-executor and dependence analysis of array subscripts in automatic parallelization [10, 16-19]. A significant problem however is that, in the imperative context, supporting anything but the simplest, O(1) predicate quickly requires "heroic" efforts.…”
Section: Introduction
confidence: 99%
“…The emergence of commodity multi-core, cache-coherent systems in the mid-2000s has fostered the study (i) of software-transactional memories [13] (STM) as a way to provide a clean, progress-guaranteed semantics for atomic operations, (ii) of a variety of algorithms and transformations [32,37] that were aimed at enhancing the locality of reference in both space and time, and (iii) of a range of analyses from entirely dynamic [11,29,34] to entirely static for automatic parallelization [20,30,33]. While these techniques are important and ideas can be reused, such solutions do not naturally extend to commodity (massively parallel) manycore architectures, such as GPGPUs, because they (i) either rely on a fast and coherent cache infrastructure, (ii) exhibit memory overhead proportional to the number of cores, or (iii) do not extend beyond one-loop parallelization and do not guarantee that all available parallelism is detected.…”
Section: Introduction
confidence: 99%