Optimizing Array Accesses in High Productivity Languages

Joyner, Mackale; Budimlić, Zoran; Sarkar, Vivek

doi:10.1007/978-3-540-75444-2_43

Cited by 3 publications

(3 citation statements)

References 15 publications

“…One consequence of the point-wise for loop in the X10 version is that (by default) it leads to an allocation of a new point object in every iteration for the index and for all subscript expressions, thereby significantly degrading performance. We previously addressed this problem (for sequential execution only) with a point inlining optimization [13]. However, after applying this transformation, we still experience up to 2 orders of magnitude in performance degradation when comparing Java Grande benchmarks with X10's general high-level arrays against the same benchmarks with lower-level Java arrays.…”

Section: X10 Arrays (mentioning)

confidence: 99%
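The per-iteration point allocation described in the statement above can be illustrated with a small Java sketch. The `Point` and `HighLevelArray` classes here are hypothetical stand-ins, not the X10 runtime's classes, and the "inlined" loop only approximates what a point inlining transformation might produce.

```java
public class PointInliningSketch {
    /** Hypothetical stand-in for an X10 point: an immutable pair of indices. */
    static final class Point {
        final int i;
        final int j;
        Point(int i, int j) { this.i = i; this.j = j; }
    }

    /** Hypothetical stand-in for a high-level rank-2 array over flat storage. */
    static final class HighLevelArray {
        final int rows;
        final int cols;
        final double[] data;

        HighLevelArray(int rows, int cols) {
            this.rows = rows;
            this.cols = cols;
            this.data = new double[rows * cols];
        }

        double get(Point p) { return data[p.i * cols + p.j]; }
        void set(Point p, double v) { data[p.i * cols + p.j] = v; }
    }

    /** Point-wise traversal: allocates one Point object per iteration. */
    static double sumWithPoints(HighLevelArray a) {
        double sum = 0.0;
        for (int i = 0; i < a.rows; i++) {
            for (int j = 0; j < a.cols; j++) {
                Point p = new Point(i, j);
                sum += a.get(p);
            }
        }
        return sum;
    }

    /** Same traversal with the point's fields replaced by plain ints. */
    static double sumInlined(HighLevelArray a) {
        double sum = 0.0;
        for (int i = 0; i < a.rows; i++) {
            for (int j = 0; j < a.cols; j++) {
                sum += a.data[i * a.cols + j];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        HighLevelArray a = new HighLevelArray(3, 4);
        for (int i = 0; i < 3; i++) {
            for (int j = 0; j < 4; j++) {
                a.set(new Point(i, j), i * 10 + j);
            }
        }
        // Both traversals visit elements in the same order.
        System.out.println(sumWithPoints(a) == sumInlined(a));
    }
}
```

Both methods compute the same sum; the difference is only in whether an object is created for each index.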

“…We also describe an array transformation strategy (Section 3.4), that uses the results from our rank analysis algorithm to convert general X10 arrays into a lower-level, more efficient Java arrays. These two techniques, combined with object inlining of points [5,12,13] result in performance improvements of up to two orders of magnitude. In Section 4, we validate our techniques on a set of parallel Java Grande benchmarks [11].…”

Section: Introduction (mentioning)

confidence: 99%

Array optimizations for parallel implementations of high productivity languages

Joyner, Budimlić, Sarkar et al. 2008

2008 IEEE International Symposium on Parallel and Distributed Processing

Self Cite

DARPA's HPCS program has set a goal of bringing high productivity to high-performance computing. This has resulted in the creation of three new high-level languages, namely Chapel, Fortress and X10, that have successfully addressed one aspect of productivity: programmability. Unfortunately, the current state of the art in implementation of these high-level language concepts results in significant performance overheads. Our research addresses this issue by concentrating on the second aspect of productivity: performance. This paper presents an interprocedural rank analysis algorithm that is capable of automatically inferring ranks of the arrays in X10, a language that allows rank-independent specification of loop and array computations using regions and points. Further, it uses the rank analysis information to enable storage transformations on arrays; the storage transformation evaluated in this paper converts high-level multidimensional X10 arrays into lower-level multidimensional Java arrays, when legal to do so. We also describe a compiler-to-runtime communication strategy that determines when array bounds checks can be eliminated in high-level X10 loops, and conveys that information to the run-time system, further improving performance. We use a 64-way AIX Power5+ SMP machine to evaluate our optimizations on a set of parallel computational benchmarks and show that they optimize X10 programs with high-level loops using regions, points and rank-free computation to deliver performance that rivals the performance of lower-level, hand-tuned code with explicit loops and array accesses, and up to two orders of magnitude faster than unoptimized, high-level X10 programs. The experimental results also show that our optimizations help the scalability of X10 programs as well, demonstrating that relative performance improvements
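As a rough illustration of the storage transformation this abstract mentions, the Java sketch below contrasts a rank-independent array with the plain `double[][]` it could be converted to once its rank is known to be 2. `GeneralArray` and `toJava2D` are hypothetical names invented for this example; the sketch performs the conversion by hand at run time and does not reproduce the paper's compile-time rank analysis.

```java
public class RankSpecializationSketch {
    /** Hypothetical rank-independent array: the rank is only known at run time. */
    static final class GeneralArray {
        final int[] dims;
        final double[] data;

        GeneralArray(int... dims) {
            this.dims = dims.clone();
            int size = 1;
            for (int d : dims) {
                size *= d;
            }
            this.data = new double[size];
        }

        int rank() { return dims.length; }

        /** Row-major offset; checks rank and every subscript on each access. */
        private int offset(int[] idx) {
            if (idx.length != dims.length) {
                throw new IllegalArgumentException("rank mismatch");
            }
            int off = 0;
            for (int d = 0; d < dims.length; d++) {
                if (idx[d] < 0 || idx[d] >= dims[d]) {
                    throw new ArrayIndexOutOfBoundsException(idx[d]);
                }
                off = off * dims[d] + idx[d];
            }
            return off;
        }

        double get(int... idx) { return data[offset(idx)]; }
        void set(double v, int... idx) { data[offset(idx)] = v; }
    }

    /** Once the rank is known to be 2, the data fits an ordinary Java 2D array. */
    static double[][] toJava2D(GeneralArray a) {
        if (a.rank() != 2) {
            throw new IllegalArgumentException("expected rank 2");
        }
        double[][] out = new double[a.dims[0]][a.dims[1]];
        for (int i = 0; i < a.dims[0]; i++) {
            for (int j = 0; j < a.dims[1]; j++) {
                out[i][j] = a.get(i, j);
            }
        }
        return out;
    }
}
```

After conversion, an access such as `m[i][j]` avoids the varargs array and the per-access rank check that `get(int...)` incurs in this sketch.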

“…The first two challenges have been addressed in the past [17,18]. In this paper, we address the optimizing the overhead of array bounds checks by developing a novel regionbased interprocedural array bounds analysis to automatically identify redundant checks.…”

Section: Introduction (mentioning)

confidence: 99%

Subregion Analysis and Bounds Check Elimination for High Level Arrays

Joyner, Budimlić, Sarkar 2011

Lecture Notes in Computer Science

Self Cite

For decades, the design and implementation of arrays in programming languages has reflected a natural tension between productivity and performance. Recently introduced HPCS languages (Chapel, Fortress and X10) advocate the use of high-level arrays for improved productivity. For example, high-level arrays in the X10 language support rank-independent specification of multidimensional loop and array computations using regions and points. Three aspects of X10 high-level arrays are important for productivity but pose significant performance challenges: high-level accesses are performed through point objects rather than integer indices, variables containing references to arrays are rank-independent, and all subscripts in a high-level array access must be checked for bounds violations. The first two challenges have been addressed in past work. In this paper, we address the third challenge of optimizing the overhead of array bounds checks by developing a novel region-based interprocedural array bounds analysis to automatically identify redundant checks. Elimination of redundant checks reduces the runtime overhead of bounds checks, and also enables further optimization by removing constraints that arise from precise exception semantics. We have implemented an array bounds check elimination algorithm that inserts special annotations that are recognized by a modified JVM. We also introduce array views, a high-level construct that improves productivity by allowing the programmer to access the underlying array through multiple views. We describe a technique for optimizing away the overhead of many common cases of array views in X10. Our experiments show that eliminating bounds checks using the results of the analysis described in this paper improves the performance of our benchmarks by up to 22% over JIT compilation.
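The idea of identifying redundant bounds checks from region information can be sketched in Java as follows. This is a minimal illustration under stated assumptions: `Region`, `RegionArray`, and `getUnchecked` are hypothetical names, the containment test is done at run time rather than by an interprocedural analysis, and the JVM still applies its own check to the backing `double[]`.

```java
public class BoundsCheckSketch {
    /** Hypothetical one-dimensional region [lo, hi], inclusive. */
    static final class Region {
        final int lo;
        final int hi;
        Region(int lo, int hi) { this.lo = lo; this.hi = hi; }

        boolean contains(int i) { return lo <= i && i <= hi; }

        /** An empty region is contained in every region. */
        boolean contains(Region r) {
            return r.lo > r.hi || (lo <= r.lo && r.hi <= hi);
        }
    }

    /** Hypothetical array indexed over a region rather than from zero. */
    static final class RegionArray {
        final Region region;
        final double[] data;

        RegionArray(Region region) {
            this.region = region;
            this.data = new double[Math.max(0, region.hi - region.lo + 1)];
        }

        /** Checked access: one region test per subscript. */
        double get(int i) {
            if (!region.contains(i)) {
                throw new ArrayIndexOutOfBoundsException(i);
            }
            return data[i - region.lo];
        }

        void set(int i, double v) {
            if (!region.contains(i)) {
                throw new ArrayIndexOutOfBoundsException(i);
            }
            data[i - region.lo] = v;
        }

        /** Access without the region test; only safe when proven in bounds. */
        double getUnchecked(int i) { return data[i - region.lo]; }
    }

    /** Baseline: every access is checked against the array's region. */
    static double sumChecked(RegionArray a, Region loop) {
        double sum = 0.0;
        for (int i = loop.lo; i <= loop.hi; i++) {
            sum += a.get(i);
        }
        return sum;
    }

    /** If the loop region is a subregion of the array region, skip the checks. */
    static double sumHoisted(RegionArray a, Region loop) {
        if (!a.region.contains(loop)) {
            // Fall back so that out-of-bounds behavior is unchanged.
            return sumChecked(a, loop);
        }
        double sum = 0.0;
        for (int i = loop.lo; i <= loop.hi; i++) {
            sum += a.getUnchecked(i);
        }
        return sum;
    }
}
```

The fallback path reflects the point about precise exception semantics: checks may only be dropped when no access in the loop could have raised an exception.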

Program Parallelization Using Synchronized Pipelining

Scandolo, Kunz, Hermenegildo 2010

Logic-Based Program Synthesis and Transformation

Abstract. While there are well-understood methods for detecting loops whose iterations are independent and parallelizing them, there are comparatively fewer proposals that support parallel execution of a sequence of loops or nested loops in the case where such loops have dependencies among them. This paper introduces a refined notion of independence, called eventual independence, that in its simplest form considers two loops, say loop 1 and loop 2, and captures the idea that for every i there exists k such that the (i+1)-th iteration of loop 2 is independent from the j-th iteration of loop 1, for all j ≥ k. Eventual independence provides the foundation of a semantics-preserving program transformation, called synchronized pipelining, that makes execution of consecutive or nested loops parallel, relying on a minimal number of synchronization events to ensure semantics preservation. The practical benefits of synchronized pipelining are demonstrated through experimental results on common algorithms such as sorting and Fourier transforms.
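A simplified Java sketch of the idea behind eventual independence follows. It assumes two hypothetical loops in which iteration i of the second loop reads only a value produced by iteration i of the first, so the second loop may run concurrently as long as it stays behind the first. The class name, the loop bodies, and the use of one semaphore permit per iteration are choices made for this illustration; they are not the paper's transformation or its synchronization scheme.

```java
import java.util.concurrent.Semaphore;

public class PipeliningSketch {
    /** Reference version: loop 1 runs to completion, then loop 2. */
    static long[] sequential(int n) {
        long[] a = new long[n];
        long[] b = new long[n];
        for (int i = 0; i < n; i++) {
            a[i] = (i == 0 ? 0 : a[i - 1]) + i;   // loop 1: running sum
        }
        for (int i = 0; i < n; i++) {
            b[i] = 2 * a[i];                      // loop 2: reads only a[i]
        }
        return b;
    }

    /** Both loops run in parallel; loop 2 waits until a[i] has been written. */
    static long[] pipelined(int n) {
        long[] a = new long[n];
        long[] b = new long[n];
        Semaphore ready = new Semaphore(0);

        Thread loop1 = new Thread(() -> {
            for (int i = 0; i < n; i++) {
                a[i] = (i == 0 ? 0 : a[i - 1]) + i;
                ready.release();                  // signal: iteration i is done
            }
        });
        Thread loop2 = new Thread(() -> {
            for (int i = 0; i < n; i++) {
                ready.acquireUninterruptibly();   // wait for loop 1's iteration i
                b[i] = 2 * a[i];
            }
        });

        loop1.start();
        loop2.start();
        try {
            loop1.join();
            loop2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while joining", e);
        }
        return b;
    }

    public static void main(String[] args) {
        long[] expected = sequential(1000);
        long[] actual = pipelined(1000);
        System.out.println(java.util.Arrays.equals(expected, actual));
    }
}
```

Signaling after every iteration is the simplest correct choice here; the abstract's emphasis on a minimal number of synchronization events suggests a real implementation would synchronize far less often.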
