Proceedings of the Third Symposium on the Frontiers of Massively Parallel Computation (1990)
DOI: 10.1109/fmpc.1990.89489
Data parallel computers and the FORALL statement

Cited by 3 publications (4 citation statements)
References 5 publications
“…The precondition of the for loop, I [1], is shown to be consistent with the precondition and statements 1–4:…”
Section: 1 Initialisation
confidence: 61%
“…Data parallel computation is attractive because (i) many scientific applications can be conveniently described and efficiently implemented in the framework; and (ii) it is conceptually simpler than many alternative models (for example, CSP [11] and BSP [15]). Its significance is reflected by the adoption of the array assignment construct in FORTRAN 90 [6] and the FORALL statement [1] in HPF [9]. The goals of this paper are to (i) provide an axiomatic definition of data parallel assignment and (ii) illustrate how the resulting formal rules may be used in correctness proofs.…”
Section: Introduction
confidence: 99%
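The semantics the quotation alludes to can be sketched briefly: in a FORALL statement, every right-hand side is evaluated before any element is assigned, so the result does not depend on iteration order. The following is a minimal illustration of that copy-in/copy-out model in Python; the helper name `forall_assign` and the example index range are illustrative, not taken from the cited papers.

```python
# Minimal sketch (assumed model) of FORALL semantics: evaluate all
# right-hand sides against the ORIGINAL array, then assign.

def forall_assign(a, lo, hi, rhs):
    """Assign a[i] = rhs(a, i) for lo <= i < hi with FORALL semantics:
    every rhs is computed before any element of a is overwritten."""
    new_vals = {i: rhs(a, i) for i in range(lo, hi)}  # evaluate all RHS first
    for i, v in new_vals.items():                     # then store results
        a[i] = v
    return a

# Analogue of  FORALL (i = 2:4) a(i) = a(i-1) + a(i+1)  (0-based here).
a = [1, 2, 3, 4, 5]
forall_assign(a, 1, 4, lambda a, i: a[i - 1] + a[i + 1])
# Each RHS sees the original array, so a becomes [1, 4, 6, 8, 5];
# a sequential for loop over the same body would instead propagate
# updated values and produce a different result.
```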
“…For each pixel of the original image (see loop nest in lines 8-9), the program computes a convolution sumX of the 3 × 3 matrix GX and the intensity of the pixel and its eight neighbors (lines 19-24). A similar convolution sumY with the 3 × 3 matrix GY is also computed (lines 25-30).…”
Section: Sobel Edge Filter
confidence: 99%
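The convolution the quotation describes can be sketched as follows. This is a hedged illustration of the horizontal Sobel gradient sumX, not the cited program itself; the matrix GX is the standard Sobel kernel, and the tiny test image is invented.

```python
# Sketch of the horizontal Sobel convolution: sumX is the weighted sum
# of a pixel and its eight neighbors under the 3x3 kernel GX.
GX = [[-1, 0, 1],
      [-2, 0, 2],
      [-1, 0, 1]]

def sobel_x(img, y, x):
    """Convolve GX with the 3x3 neighborhood centered at (y, x)."""
    return sum(GX[dy + 1][dx + 1] * img[y + dy][x + dx]
               for dy in (-1, 0, 1)
               for dx in (-1, 0, 1))

# A vertical intensity edge yields a strong horizontal gradient:
img = [[0, 0, 255],
       [0, 0, 255],
       [0, 0, 255]]
# sobel_x(img, 1, 1) -> 1020
```

The inner pair of loops over `dy`/`dx` is the 3 × 3 neighborhood scan; the cited program performs the analogous sumY pass with the transposed kernel GY.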
“…Our approach is based on the existence of parallelizing transformations designed for each type of diKernel. The procedure is as follows: (1) scalar reduction diKernels are executed as parallel reduction operations (using the reduction OpenMP clause); (2) regular assignment and regular reduction diKernels are converted into forall parallel loops [19]; (3) irregular assignment and irregular reduction diKernels are transformed via an array expansion technique [20,21]; (4) in general, recurrence diKernels cannot be transformed in parallel code, but there exist parallelizing transformations for particular cases [22] (examples will be shown in Section 4). Thus, the critical path is the longest path that only contains diKernel-level flow dependences and parallelizable diKernels.…”
Section: Automatic Partitioning Driven By the KIR
confidence: 99%
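Transformations (1) and (2) from the quotation can be sketched in Python rather than OpenMP; the function names are illustrative assumptions. A scalar reduction diKernel becomes an explicit reduction (the role the OpenMP `reduction` clause plays in the cited work), and a regular assignment diKernel becomes a forall-style loop whose iterations write disjoint elements and may therefore run in any order.

```python
# Hedged sketch of diKernel transformations (1) and (2); Python stands in
# for the OpenMP constructs named in the cited paper.
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from operator import add

def scalar_reduction(xs):
    # (1) scalar reduction diKernel -> reduction operation
    #     (OpenMP expresses this with the `reduction` clause).
    return reduce(add, xs, 0)

def regular_assignment(src):
    # (2) regular assignment diKernel -> forall parallel loop:
    #     each iteration writes only its own out[i], so the
    #     iterations are independent and order-insensitive.
    out = [0] * len(src)
    with ThreadPoolExecutor() as pool:
        list(pool.map(lambda i: out.__setitem__(i, 2 * src[i]),
                      range(len(src))))
    return out
```

Transformations (3) and (4) from the quotation (array expansion for irregular diKernels, and case-by-case handling of recurrences) are omitted here, since they depend on dependence information not shown in the excerpt.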