A. Fernandez scite author profile

Loop tiling is a well-known loop transformation generally used to expose coarse-grain parallelism and to exploit data reuse at the cache level. Tiling can also be used to exploit data reuse at the register level and to improve a program's instruction-level parallelism (ILP). However, previous proposals in the literature (as well as commercial compilers) can perform multidimensional tiling for the register level only when the iteration space is rectangular. In this article we present a new general algorithm to perform multidimensional tiling for the register level in both rectangular and nonrectangular iteration spaces. We also propose a simple heuristic to determine the tiling parameters at this level. Finally, we evaluate our method using as benchmarks typical linear algebra algorithms with nonrectangular iteration spaces, and compare our proposal against hand-optimized vendor-supplied numerical libraries and against commercial compilers able to perform optimizing code transformations such as inner unrolling, unroll-and-jam, and software pipelining. Measurements were taken on three different superscalar microprocessors. Results show that our method outperforms the native compilers (with speedups of 2.5 on average) and matches the performance of vendor-supplied numerical libraries. The general conclusion is that compiler technology can make it possible for nonrectangular loop nests to achieve performance as high as that of hand-optimized codes.
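To make the idea of register-level tiling in a nonrectangular iteration space concrete, here is a minimal hand-written sketch in C. It is not the algorithm from the article. It only illustrates the general shape of the transformation on a lower-triangular matrix-vector product, with a tile of two rows. The function names (`trmv_ref`, `trmv_regtile`, `trmv_mismatches`), the row-major storage, and the tile size are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Reference: y = L * x for a lower-triangular n x n matrix (row-major).
   The inner bound depends on i, so the iteration space is triangular. */
void trmv_ref(int n, const double *a, const double *x, double *y) {
    for (int i = 0; i < n; i++) {
        double s = 0.0;
        for (int j = 0; j <= i; j++)
            s += a[(size_t)i * n + j] * x[j];
        y[i] = s;
    }
}

/* Register-tiled sketch: two rows per outer iteration (hypothetical tile size). */
void trmv_regtile(int n, const double *a, const double *x, double *y) {
    int i = 0;
    for (; i + 1 < n; i += 2) {
        const double *r0 = a + (size_t)i * n;   /* row i     */
        const double *r1 = r0 + n;              /* row i + 1 */
        double s0 = 0.0, s1 = 0.0;
        /* Rectangular part shared by both rows: j = 0 .. i. */
        for (int j = 0; j <= i; j++) {
            double xj = x[j];                   /* loaded once, used twice */
            s0 += r0[j] * xj;
            s1 += r1[j] * xj;
        }
        /* Triangular leftover: row i + 1 has one extra element. */
        s1 += r1[i + 1] * x[i + 1];
        y[i] = s0;
        y[i + 1] = s1;
    }
    if (i < n) {                                /* last row when n is odd */
        const double *r = a + (size_t)i * n;
        double s = 0.0;
        for (int j = 0; j <= i; j++)
            s += r[j] * x[j];
        y[i] = s;
    }
}

/* Hypothetical helper: number of rows where the two versions disagree. */
int trmv_mismatches(int n) {
    enum { MAXN = 16 };
    double a[MAXN * MAXN], x[MAXN], y0[MAXN], y1[MAXN];
    int bad = 0;
    assert(n >= 0 && n <= MAXN);
    for (int i = 0; i < n; i++) {
        x[i] = (double)(i + 1);
        for (int j = 0; j < n; j++)
            a[i * n + j] = (double)(i * n + j % 3);
    }
    trmv_ref(n, a, x, y0);
    trmv_regtile(n, a, x, y1);
    for (int i = 0; i < n; i++)
        if (y0[i] != y1[i]) bad++;
    return bad;
}
```

The tiled version splits each pair of rows into a rectangular part, where `x[j]` is held in a register and reused, and a small leftover that accounts for the triangular boundary. Handling such boundaries for arbitrary tile shapes and more dimensions is the harder problem; this sketch covers only the simplest case.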

Shader Performance Analysis on a Modern GPU Architecture

Moya

Gonzalez

Roca

et al.

Loop transformation using nonunimodular matrices

Fernandez

Llaberia

Valero-García

1995

IEEE Trans. Parallel Distrib. Syst.

In-flight reconfigurable FPGA-based space systems

Montealegre

Merodio

Fernandez

et al. 2015

Development of a multi-material additive manufacturing process for electronic devices

Blanco

Bonada

Fernandez

et al. 2017

Procedia Manufacturing

Performance evaluation of tiling for the register level

Jimenez

Llaberia

Fernandez

Tiling is a well-known loop transformation that is mainly used to expose coarse-grain parallelism and to exploit data reuse at the cache level. However, it can also be used to exploit data reuse at the register level and to improve a program's ILP. Previous work on tiling, as well as commercial compilers, can perform tiling for the register level in more than one dimension only when the iteration space is rectangular. Non-rectangular iteration spaces are commonly found in linear algebra algorithms, or can arise as a result of applying earlier transformations such as loop skewing. In this paper we evaluate the technique we present in [11], which is able to perform tiling for the register level in more than one dimension in both rectangular and non-rectangular iteration spaces. We use typical linear algebra algorithms with non-rectangular iteration spaces as benchmarks and compare our proposal against commercial preprocessors able to perform optimizing code transformations such as inner unrolling, outer unrolling and software pipelining. We also present quantitative data showing the benefits of tiling only for the register level, tiling only for the cache level, and tiling for both levels simultaneously. Results measured on an Alpha 21164 processor show that tiling for both cache and register levels improves upon commercial compilers and preprocessors by factors in the range of 1.3 to 6.3.
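The remark that non-rectangular iteration spaces can arise from loop skewing can be illustrated with a small sketch in C. It is an assumed toy example, not code from the paper: the function names `sum_rect` and `sum_skewed` and the loop body are hypothetical. Both nests visit the same (i, j) pairs, but after skewing the inner bounds depend on the outer index.

```c
/* Original rectangular nest: visits (i, j) for 0 <= i < n, 0 <= j < m. */
long sum_rect(int n, int m) {
    long s = 0;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < m; j++)
            s += (long)i * 100 + j;
    return s;
}

/* Skewed nest: the inner index is jj = i + j, so jj runs from i to i + m - 1.
   The bounds now depend on i, giving a parallelogram-shaped iteration space. */
long sum_skewed(int n, int m) {
    long s = 0;
    for (int i = 0; i < n; i++)
        for (int jj = i; jj < i + m; jj++) {
            int j = jj - i;            /* recover the original index */
            s += (long)i * 100 + j;
        }
    return s;
}
```

A tiling method restricted to rectangular bounds could be applied to the first nest but not directly to the second, even though they compute the same result.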

Cellular Neural Networks simulation on a parallel graphics processing unit

Fernandez

Martín

Farguell

et al. 2008


A. Fernandez

ATTILA: a cycle-level execution-driven simulator for modern GPU architectures

Register tiling in nonrectangular iteration spaces
