In this paper, we present the design and implementation of an inter-procedural loop fusion, array contraction and rotation technique in a production compiler. We provide experimental results to show that this technique improves SPECfp2000 benchmark performance by 12%. The technique employs a locality-conscious inter-procedural analysis to drive inlining decisions. It then uses regular section analysis and code motion techniques to enable loop fusion across procedure boundaries. We discuss the implementation of data promotion and array contraction techniques. We introduce an array rotation technique to eliminate the overhead of copying array sections.
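As a rough illustration of what loop fusion and array contraction mean in general (not the compiler's actual implementation, which the abstract does not detail), the sketch below shows two loops that communicate through a temporary array, and the fused form in which that array shrinks to a scalar. The function names `unfused` and `fused_contracted` and the arithmetic are hypothetical, chosen only for the example; array rotation is not shown.

```python
def unfused(a):
    """Two separate loops that communicate through a full-size temporary array."""
    n = len(a)
    tmp = [0.0] * n
    for i in range(n):
        tmp[i] = a[i] * 2.0          # producer loop writes tmp
    out = [0.0] * n
    for i in range(n):
        out[i] = tmp[i] + 1.0        # consumer loop reads tmp
    return out


def fused_contracted(a):
    """Same computation after fusing the loops and contracting tmp to a scalar."""
    n = len(a)
    out = [0.0] * n
    for i in range(n):
        t = a[i] * 2.0               # tmp[i] is produced and consumed in one iteration,
        out[i] = t + 1.0             # so a scalar is enough to hold it
    return out


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0]
    print(unfused(data))
    print(fused_contracted(data))
```

Both functions return the same values; the fused version touches each element once and needs no temporary array, which is the locality benefit such transformations aim for.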
In this paper, we present a unimodular loop transformation called rotation as a simple, systematic and uniform method for partitioning the iteration spaces of doubly nested loops for execution on distributed memory multiprocessors. We define three parameters that can be used to choose an optimal rotation. These parameters are the parallelism factor, the load imbalance and the volume of communication. We present algebraic expressions for these parameters and discuss their relative significance in choosing a combined metric.
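To make the idea of a unimodular transformation of a doubly nested loop concrete, here is a minimal sketch. It assumes a generic integer matrix with determinant ±1 and is not the paper's rotation, whose definition the abstract does not give. The matrix `T`, the helper names, and the choice of assigning one processor per new outer index are all hypothetical.

```python
def det2(T):
    """Determinant of a 2x2 integer matrix."""
    return T[0][0] * T[1][1] - T[0][1] * T[1][0]


def transform(T, n, m):
    """Map each iteration (i, j) of an n-by-m loop nest to (u, v) = T * (i, j)."""
    assert abs(det2(T)) == 1, "T must be unimodular (determinant +1 or -1)"
    return {
        (i, j): (T[0][0] * i + T[0][1] * j, T[1][0] * i + T[1][1] * j)
        for i in range(n)
        for j in range(m)
    }


def partition_by_outer(mapping):
    """Group original iterations by the new outer index u (one group per processor)."""
    groups = {}
    for src, (u, _v) in mapping.items():
        groups.setdefault(u, []).append(src)
    return groups


T = [[1, 1], [0, 1]]                 # hypothetical example matrix: u = i + j, v = j
mapping = transform(T, 3, 3)
groups = partition_by_outer(mapping)
sizes = [len(groups[u]) for u in sorted(groups)]

if __name__ == "__main__":
    print(sizes)
```

Because the matrix is unimodular, every iteration maps to a distinct integer point, so no work is lost or duplicated. The group sizes differ across values of the new outer index, which is the kind of unevenness a load imbalance measure would need to capture.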
We present a computationally efficient method for deriving the most appropriate transformation and mapping of a nested loop for a given hierarchical parallel machine. This method is developed in the context of our systematic and general theory of unimodular loop transformations for the problem of iteration space partitioning [7]. Finding an optimal mapping or an optimal associated unimodular transformation is NP-complete. We present a polynomial time method for obtaining a 'good' transformation using a simple parameterized model of the hierarchical machine. We outline a systematic methodology for obtaining the most appropriate mapping.
The High-Level Optimizer (HLO) is a key part of the compiler technology that enabled Itanium™ and Itanium™ 2 processors to deliver leading floating-point performance at their introduction. In this paper, we discuss the design and implementation experience in integrating diverse optimizations in the HLO module. In particular, we describe decisions made in the design of HLO targeting the Itanium processor family. We provide empirical data to validate the design decisions. Since HLO was implemented in a production compiler, we made certain engineering trade-offs. We discuss these trade-offs and outline key lessons learned from our experience.