Nawaaz Ahmed scite author profile

Data-centric multi-level blocking

We present a simple and novel framework for generating blocked codes for high-performance machines with a memory hierarchy. Unlike traditional compiler techniques like tiling, which are based on reasoning about the control flow of programs, our techniques are based on reasoning directly about the flow of data through the memory hierarchy. Our data-centric transformations permit a more direct solution to the problem of enhancing data locality than current control-centric techniques do, and generalize easily to multiple levels of memory hierarchy. We buttress these claims with performance numbers for standard benchmarks from the problem domain of dense numerical linear algebra. The simplicity and intuitive appeal of our approach should make it attractive to compiler writers as well as to library writers.

[Figure: example loop nests, including "(i) Matrix Multiplication"]
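As a point of reference for what "blocked code" means here, the following is a minimal sketch of a conventional blocked matrix multiplication. It is not the data-centric framework of the paper; the function names and the `block` parameter are assumptions made for illustration only.

```python
def matmul_naive(A, B, n):
    """Reference triple loop: C = A * B for n x n lists of lists."""
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C


def matmul_blocked(A, B, n, block=2):
    """Blocked C = A * B; `block` is a hypothetical tile size."""
    C = [[0] * n for _ in range(n)]
    # The three outer loops walk over blocks of C, A and B.
    for ii in range(0, n, block):
        for jj in range(0, n, block):
            for kk in range(0, n, block):
                # The three inner loops multiply one pair of blocks.
                for i in range(ii, min(ii + block, n)):
                    for j in range(jj, min(jj + block, n)):
                        acc = C[i][j]
                        for k in range(kk, min(kk + block, n)):
                            acc += A[i][k] * B[k][j]
                        C[i][j] = acc
    return C
```

Both functions compute the same product; the blocked version only changes the order in which the multiply-add operations are performed, so that each block of data is reused while it is still likely to be in a fast level of the memory hierarchy.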

Synthesizing transformations for locality enhancement of imperfectly-nested loop nests

Ahmed

Mateev

Pingali

2000

We present an approach for synthesizing transformations to enhance locality in imperfectly-nested loops. The key idea is to embed the iteration space of every statement in a loop nest into a special iteration space called the product space. The product space can be viewed as a perfectly-nested loop nest, so embedding generalizes techniques like code sinking and loop fusion that are used in ad hoc ways in current compilers to produce perfectly-nested loops from imperfectly-nested ones. In contrast to these ad hoc techniques, however, our embeddings are chosen carefully to enhance locality. The product space is then transformed further to enhance locality, after which fully permutable loops are tiled, and code is generated. We evaluate the effectiveness of this approach for dense numerical linear algebra benchmarks, relaxation codes, and the tomcatv code from the SPEC benchmarks.

BACKGROUND AND PREVIOUS WORK

Sophisticated algorithms based on polyhedral algebra have been developed for determining good sequences of linear loop transformations (permutation, skewing, reversal and scaling) for enhancing locality in perfectly-nested loops¹. Highlights of this technology are the following. The iterations of the loop nest are modeled as points in an integer lattice, and linear loop transformations are modeled as nonsingular matrices mapping one lattice to another. A sequence of loop transformations is modeled by the product of matrices representing the individual transformations; since the set of nonsingular matrices is closed under matrix product, this means that a sequence of linear loop transformations can be represented by a nonsingular matrix. The problem of finding an optimal sequence of linear loop transformations is thus reduced to the problem of finding an integer matrix that satisfies some desired property, permitting the full machinery of matrix methods and lattice theory to [...]

Figure 1: Jacobi code fragment

    for t = 1,T
      for i1 = 2,N-1
        for j1 = 2,N-1
          S1: L(i1,j1) = (A(i1,j1+1) + A(i1,j1-1) + A(i1+1,j1) + A(i1-1,j1)) / 4
        end
      end
      for i2 = 2,N-1
        for j2 = 2,N-1
          S2: A(i2,j2) = L(i2,j2)
        end
      end
    end

This technology is fairly mature, and it has been incorporated into production compilers, enabling these compilers to produce good code for perfectly-nested loop nests. In most programs, however, most loop nests are imperfectly-nested because one or more assignment statements are contained in some but not all of the loops of the loop nest. For example, important matrix factorizations like Cholesky, LU and QR factorizations [9] are all imperfectly-nested loop nests. An entire procedure, which usually is a sequence of perfectly- or imperfectly-nested loop nests, can itself be considered to be an imperfectly-nested loop nest. As an example, consider the Jacobi code fragment in Figure 1 which is typical of programs that solve partial differential equations (p...

* This work was supported by NSF grants CCR-9720211, EIA-9726388, ACI-9870687, EIA-9972853.
¹ A perfectly-nested loop is a set of loops in which all assignment statements are contained in the innermost loop.
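The matrix view of linear loop transformations described above can be illustrated with a small sketch. The specific matrices, the helper names and the 2-deep loop nest are assumptions chosen for illustration; the excerpt itself does not give a concrete example.

```python
def mat_mul(P, Q):
    """Product of two integer matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))]
            for i in range(len(P))]


def apply(T, point):
    """Map an iteration point (a tuple of loop indices) through matrix T."""
    return tuple(sum(T[r][c] * point[c] for c in range(len(point)))
                 for r in range(len(T)))


def det2(T):
    """Determinant of a 2 x 2 matrix; nonzero means T is nonsingular."""
    return T[0][0] * T[1][1] - T[0][1] * T[1][0]


# Transformations of a 2-deep loop nest with iteration vector (i, j).
INTERCHANGE = [[0, 1], [1, 0]]   # (i, j) -> (j, i)
SKEW = [[1, 0], [1, 1]]          # (i, j) -> (i, i + j)
REVERSAL = [[-1, 0], [0, 1]]     # (i, j) -> (-i, j)

# "Skew, then interchange" is the single matrix INTERCHANGE * SKEW.
COMPOSED = mat_mul(INTERCHANGE, SKEW)
```

Applying `COMPOSED` to an iteration point gives the same result as applying `SKEW` and then `INTERCHANGE`, which is the sense in which a sequence of transformations collapses into one nonsingular matrix. Whether a given matrix is legal for a particular loop nest depends on its dependences, which this sketch does not model.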

Tiling Imperfectly-nested Loop Nests

Ahmed

Mateev

Pingali

2000

Tiling is one of the more important transformations for enhancing locality of reference in programs. Intuitively, tiling a set of loops achieves the effect of interleaving iterations of these loops. Tiling of perfectly-nested loop nests (which are loop nests in which all assignment statements are contained in the innermost loop) is well understood. In practice, many loop nests are imperfectly-nested, so existing compilers use heuristics to try to find a sequence of transformations that convert such loop nests into perfectly-nested ones, but these heuristics do not always succeed. In this paper, we propose a novel approach to tiling imperfectly-nested loop nests. The key idea is to embed the iteration space of every statement in the imperfectly-nested loop nest into a special space called the product space which is tiled to produce the final code. We evaluate the effectiveness of this approach for dense numerical linear algebra benchmarks, relaxation codes, and the tomcatv code from the SPEC benchmarks. No other single approach in the literature can tile all these codes automatically.
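The statement that tiling interleaves the iterations of a set of loops can be made concrete for the well-understood, perfectly-nested case. The sketch below only enumerates iteration orders for a 2-deep nest; the function names and tile sizes are hypothetical, and it does not implement the product-space embedding proposed in the paper.

```python
def original_order(n, m):
    """Iteration order of: for i in 0..n-1: for j in 0..m-1."""
    return [(i, j) for i in range(n) for j in range(m)]


def tiled_order(n, m, ti, tj):
    """Iteration order after tiling the same nest with ti x tj tiles."""
    order = []
    for ii in range(0, n, ti):            # loop over tile rows
        for jj in range(0, m, tj):        # loop over tile columns
            for i in range(ii, min(ii + ti, n)):      # iterations in a tile
                for j in range(jj, min(jj + tj, m)):
                    order.append((i, j))
    return order
```

For example, `tiled_order(4, 4, 2, 2)` visits `(0, 0), (0, 1), (1, 0), (1, 1)` before moving on to `(0, 2)`, so iterations of the `i` loop are interleaved with iterations of the `j` loop. The tiled order is a permutation of the original one; whether that permutation preserves program semantics depends on the loop's dependences.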

Automatic Generation of Block-Recursive Codes

Ahmed

Pingali

2000

Block-recursive codes for dense numerical linear algebra computations appear to be well-suited for execution on machines with deep memory hierarchies because they are effectively blocked for all levels of the hierarchy. In this paper, we describe compiler technology to translate iterative versions of a number of numerical kernels into block-recursive form. We also study the cache behavior and performance of these compiler generated block-recursive codes.
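To show what a block-recursive code looks like, here is a hand-written sketch of a recursive matrix multiplication that splits each operand into quadrants. It is an illustration of the code shape only, not output of the compiler technology described in the abstract; the names, the `cutoff` parameter and the power-of-two size restriction are assumptions.

```python
def _multiply_block(A, B, C, ri, rj, rk, size, cutoff):
    """Accumulate A[ri.., rk..] * B[rk.., rj..] into C[ri.., rj..]
    for square blocks of the given size."""
    if size <= cutoff:
        # Base case: ordinary triple loop on a small block.
        for i in range(ri, ri + size):
            for j in range(rj, rj + size):
                for k in range(rk, rk + size):
                    C[i][j] += A[i][k] * B[k][j]
        return
    half = size // 2
    # Recursive case: eight multiplications of half-sized blocks.
    for di in (0, half):
        for dj in (0, half):
            for dk in (0, half):
                _multiply_block(A, B, C, ri + di, rj + dj, rk + dk,
                                half, cutoff)


def multiply_recursive(A, B, cutoff=2):
    """C = A * B for n x n lists of lists, n assumed to be a power of two."""
    n = len(A)
    if n == 0 or (n & (n - 1)) != 0:
        raise ValueError("this sketch assumes n is a power of two")
    C = [[0] * n for _ in range(n)]
    _multiply_block(A, B, C, 0, 0, 0, n, cutoff)
    return C
```

Because the recursion keeps halving the blocks, some level of the recursion works on blocks that fit in each level of the memory hierarchy, which is the property the abstract refers to as being blocked for all levels.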

A Framework for Sparse Matrix Code Synthesis from High-level Specifications

Ahmed

Mateev

Pingali

et al. 2000

We present compiler technology for synthesizing sparse matrix code from (i) dense matrix code, and (ii) a description of the index structure of a sparse matrix. Our approach is to embed statement instances into a Cartesian product of statement iteration and data spaces, and to produce efficient sparse code by identifying common enumerations for multiple references to sparse matrices. The approach works for imperfectly-nested codes with dependences, and produces sparse code competitive with hand-written library code for the Basic Linear Algebra Subroutines (BLAS).
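The relationship between dense code and the sparse code derived from it can be sketched with a matrix-vector product. The example below is written by hand and assumes a compressed sparse row (CSR) layout as the index structure; the abstract does not name a storage format, and all function names here are hypothetical.

```python
def matvec_dense(A, x):
    """Dense code: y[i] += A[i][j] * x[j] over the full iteration space."""
    y = [0] * len(A)
    for i in range(len(A)):
        for j in range(len(x)):
            y[i] += A[i][j] * x[j]
    return y


def dense_to_csr(A):
    """Build CSR arrays (values, column indices, row pointers) from A."""
    values, col_idx, row_ptr = [], [], [0]
    for row in A:
        for j, a in enumerate(row):
            if a != 0:
                values.append(a)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def matvec_csr(values, col_idx, row_ptr, x):
    """Sparse code: the same statement, executed only for stored entries."""
    y = [0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        # Enumerate the stored entries of row i instead of all columns j.
        for p in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[p] * x[col_idx[p]]
    return y
```

The sparse version performs the same updates as the dense one for every nonzero entry and skips the rest, replacing the loop over all `j` with an enumeration of the entries the data structure actually stores.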
