For decades, the design and implementation of arrays in programming languages have reflected a natural tension between productivity and performance. Recently introduced HPCS languages (Chapel, Fortress and X10) advocate the use of high-level arrays for improved productivity. For example, high-level arrays in the X10 language support rank-independent specification of multidimensional loop and array computations using regions and points. Three aspects of X10 high-level arrays are important for productivity but pose significant performance challenges: high-level accesses are performed through point objects rather than integer indices, variables containing references to arrays are rank-independent, and all subscripts in a high-level array access must be checked for bounds violations. The first two challenges have been addressed in past work. In this paper, we address the third challenge of optimizing the overhead of array bounds checks by developing a novel region-based interprocedural array bounds analysis to automatically identify redundant checks. Elimination of redundant checks reduces the runtime overhead of bounds checks, and also enables further optimization by removing constraints that arise from precise exception semantics. We have implemented an array bounds check elimination algorithm that inserts special annotations that are recognized by a modified JVM. We also introduce array views, a high-level construct that improves productivity by allowing the programmer to access the underlying array through multiple views. We describe a technique for optimizing away the overhead of many common cases of array views in X10. Our experiments show that eliminating bounds checks using the results of the analysis described in this paper improves the performance of our benchmarks by up to 22% over JIT compilation.
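The idea of a region-based redundancy test can be illustrated with a minimal Java sketch. It is not the paper's interprocedural analysis: it only assumes rectangular regions with inclusive bounds, and the names `Region`, `contains`, and `checksRedundant` are hypothetical. If the region a loop iterates over is contained in the region the array is defined on, every access indexed by a loop point is in bounds, so the per-access checks are redundant.

```java
public class RegionBoundsSketch {
    /** Hypothetical rectangular region: inclusive lower/upper bound per dimension. */
    public static final class Region {
        final int[] lo;
        final int[] hi;

        public Region(int[] lo, int[] hi) {
            if (lo.length != hi.length) {
                throw new IllegalArgumentException("rank mismatch");
            }
            this.lo = lo.clone();
            this.hi = hi.clone();
        }

        public int rank() {
            return lo.length;
        }

        /** True if every point of {@code other} also lies in this region. */
        public boolean contains(Region other) {
            if (other.rank() != rank()) {
                return false;
            }
            for (int d = 0; d < rank(); d++) {
                if (other.lo[d] < lo[d] || other.hi[d] > hi[d]) {
                    return false;
                }
            }
            return true;
        }
    }

    /**
     * Assumption for this sketch: the array is indexed directly by the loop point,
     * so containment of the loop region in the array region makes the checks redundant.
     */
    public static boolean checksRedundant(Region arrayRegion, Region loopRegion) {
        return arrayRegion.contains(loopRegion);
    }

    public static void main(String[] args) {
        Region array = new Region(new int[] {0, 0}, new int[] {9, 9});
        Region interior = new Region(new int[] {1, 1}, new int[] {8, 8});
        Region overrun = new Region(new int[] {0, 0}, new int[] {10, 9});

        System.out.println(checksRedundant(array, interior)); // prints "true"
        System.out.println(checksRedundant(array, overrun));  // prints "false"
    }
}
```

A real analysis would also have to track how regions flow across method calls and how subscripts are derived from loop points; the sketch covers only the final containment test.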
DARPA's HPCS program has set a goal of bringing high productivity to high-performance computing. This has resulted in the creation of three new high-level languages, namely Chapel, Fortress and X10, that have successfully addressed one aspect of productivity: programmability. Unfortunately, the current state of the art in implementing these high-level language concepts results in significant performance overheads. Our research addresses this issue by concentrating on the second aspect of productivity: performance. This paper presents an interprocedural rank analysis algorithm that is capable of automatically inferring ranks of the arrays in X10, a language that allows rank-independent specification of loop and array computations using regions and points. Further, it uses the rank analysis information to enable storage transformations on arrays; the storage transformation evaluated in this paper converts high-level multidimensional X10 arrays into lower-level multidimensional Java arrays, when legal to do so. We also describe a compiler-to-runtime communication strategy that determines when array bounds checks can be eliminated in high-level X10 loops, and conveys that information to the run-time system, further improving performance. We use a 64-way AIX Power5+ SMP machine to evaluate our optimizations on a set of parallel computational benchmarks and show that they optimize X10 programs with high-level loops using regions, points and rank-free computation to deliver performance that rivals that of lower-level, hand-tuned code with explicit loops and array accesses, and is up to two orders of magnitude faster than unoptimized, high-level X10 programs. The experimental results also show that our optimizations help the scalability of X10 programs, demonstrating that relative performance improvements
One of the outcomes of DARPA's HPCS program has been the creation of three new high-productivity languages: Chapel, Fortress, and X10. While these languages have introduced improvements in language expressiveness and programmer productivity, several technical challenges still remain in delivering high performance with these languages. In the absence of optimization, the high-level language constructs that improve productivity can result in order-of-magnitude runtime performance degradations. This paper addresses the problem of efficient code generation for high-level array accesses in the X10 language. Two aspects of high-level array accesses in X10 are important for productivity but also pose significant performance challenges: the high-level accesses are performed through Point objects rather than integer indices, and variables containing references to arrays are rank-independent. Our solution to the first challenge is to extend the X10 compiler with automatic inlining and scalar replacement of Point objects. Our partial solution to the second challenge is to use X10's dependent type system to enable the programmer to annotate array variable declarations with additional information for the rank and region of the variable, and to allow the compiler to generate efficient code in cases where the dependent type information is available. Although this paper focuses on high-level array accesses in X10, our approach is applicable to similar constructs in other languages. Our experimental results for single-thread performance demonstrate that these compiler optimizations can enable high-level X10 array accesses with implicit ranks and Points to improve performance by up to a factor of 5.4× over unoptimized X10 code, and to also achieve performance comparable (from 48% to 100%) to that of lower-level Java programs. These results underscore the importance of the optimization techniques presented in this paper for achieving high performance with high productivity.
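What scalar replacement of Point objects buys can be sketched in plain Java. This is an illustration under stated assumptions, not the X10 compiler's output: the `Point` class, the row-major flat array, and both method names are hypothetical. The first method allocates a point object per access, as a naive translation of a high-level loop might; the second is the form a compiler could produce after inlining the accessors and replacing the point's fields with scalar loop variables.

```java
public class PointScalarReplacementSketch {
    /** Hypothetical immutable 2-D index object standing in for an X10-style point. */
    public static final class Point {
        final int i;
        final int j;

        public Point(int i, int j) {
            this.i = i;
            this.j = j;
        }
    }

    /** High-level style: every access goes through a freshly allocated Point. */
    public static double sumWithPoints(double[] data, int rows, int cols) {
        double sum = 0.0;
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                Point p = new Point(i, j);
                sum += data[p.i * cols + p.j]; // row-major layout is an assumption
            }
        }
        return sum;
    }

    /** After scalar replacement: the Point's fields become plain ints, no allocation. */
    public static double sumWithScalars(double[] data, int rows, int cols) {
        double sum = 0.0;
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                sum += data[i * cols + j];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] data = {1, 2, 3, 4, 5, 6}; // a 2 x 3 array stored row by row
        System.out.println(sumWithPoints(data, 2, 3));  // prints "21.0"
        System.out.println(sumWithScalars(data, 2, 3)); // prints "21.0"
    }
}
```

Both methods visit the elements in the same order, so they return identical results; the difference lies only in the object allocation and field loads that the transformed version avoids.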
Java is a high productivity object-oriented programming language that is rapidly gaining popularity in high-performance application development. One major obstacle to its broad acceptance is its mediocre performance when compared with Fortran or C, especially if the developers use object-oriented features of the language extensively. Previous work in improving the performance of object-oriented, high-performance, scientific Java applications consisted of high level compiler optimization and analysis strategies, such as class specialization and object inlining. This paper extends prior work on object inlining by improving the analysis and developing new code transformation techniques to further improve the performance of high performance applications written in high-productivity, object-oriented style. Two major impediments to effective object inlining are object and array aliasing and binary method invocations. This paper implements object and array alias strategies to address the aliasing problem while utilizing an idea from Telescoping Languages to address the binary method invocation problem. Application runtime gains of up to 20% result from employing these techniques. These improvements should further increase the scientific community's acceptance of the Java programming language in the development of high-performance, high-productivity, scientific applications.