2004
DOI: 10.1016/j.parco.2003.11.002

Architecture of an automatically tuned linear algebra library

Cited by 30 publications (24 citation statements)
References 11 publications
“…In this paper, we have presented an extension of our previous self-optimization methodology for homogeneous systems [14], where several decisions (algorithmic parameters) were taken automatically in order to obtain execution times close to the optimum with parallel linear algebra routines. This technique has been combined with our approach of work distribution for parallel dynamic programming [8].…”
Section: Results (mentioning)
confidence: 99%
“…In [14] we presented a self-optimization methodology for these routines on homogeneous systems (it was a HoHo strategy), where several decisions (algorithmic parameters) were taken automatically in order to obtain execution times close to the optimum, and without rewriting any code line of the routines. Some of these decisions are: number of processes to generate, which processors are used (if the system workload is heterogeneous, it could be interesting not to use the overloaded processors [15]), the logical topology of processes (normally, in parallel linear algebra routines the processes are organized in a logical 2D mesh to distribute the data and to perform the inter-processor communications) and the block size for both data distribution and computation.…”
Section: Introduction (mentioning)
confidence: 99%
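The algorithmic parameters listed in the quote above (process count, logical topology, block size) are typically fixed by timing candidate values empirically at installation time. A minimal sketch of that search, assuming a simple blocked matrix multiply stands in for the tuned routine (all names here are illustrative, not the paper's actual code):

```python
import time
import numpy as np

def blocked_gemm(A, B, block):
    """Blocked matrix multiply: C is accumulated block by block, so the
    block size controls cache behavior (the parameter being tuned)."""
    n = A.shape[0]
    C = np.zeros((n, n))
    for i in range(0, n, block):
        for j in range(0, n, block):
            for k in range(0, n, block):
                C[i:i+block, j:j+block] += (
                    A[i:i+block, k:k+block] @ B[k:k+block, j:j+block]
                )
    return C

def pick_block_size(n=256, candidates=(16, 32, 64, 128)):
    """Install-time search: time each candidate block size on a sample
    problem and keep the fastest, with no change to the routine's code."""
    rng = np.random.default_rng(0)
    A = rng.standard_normal((n, n))
    B = rng.standard_normal((n, n))
    best, best_t = None, float("inf")
    for b in candidates:
        t0 = time.perf_counter()
        blocked_gemm(A, B, b)
        elapsed = time.perf_counter() - t0
        if elapsed < best_t:
            best, best_t = b, elapsed
    return best
```

The chosen value would be recorded at install time and read back at run time, which is how such decisions are made "without rewriting any code line of the routines."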
“…The goal is to select the most appropriate number of threads at each level of parallelism, together with the values of other algorithmic parameters, like the block size (or sizes) in algorithms by blocks. The methodology of [9] is adapted to the empirical installation in NUMA systems. It is divided into three phases, which are represented in figure 1: -Design phase.…”
Section: The Auto-tuning Methodology (mentioning)
confidence: 99%
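Selecting "the most appropriate number of threads" can use the same empirical pattern: run a sample workload under each candidate thread count and keep the fastest. A hedged sketch with a stand-in compute kernel (the kernel and names are assumptions for illustration only):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def kernel(lo, hi):
    # Stand-in compute kernel: sum of squares over [lo, hi).
    return sum(i * i for i in range(lo, hi))

def run_with_threads(n_items, n_threads):
    """Split the iteration space evenly across n_threads workers."""
    step = n_items // n_threads
    ranges = [
        (t * step, n_items if t == n_threads - 1 else (t + 1) * step)
        for t in range(n_threads)
    ]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return sum(pool.map(lambda r: kernel(*r), ranges))

def pick_thread_count(n_items=200_000, candidates=(1, 2, 4, 8)):
    """Empirical search over thread counts: time each candidate on the
    same workload and keep the fastest."""
    best, best_t = None, float("inf")
    for p in candidates:
        t0 = time.perf_counter()
        run_with_threads(n_items, p)
        elapsed = time.perf_counter() - t0
        if elapsed < best_t:
            best, best_t = p, elapsed
    return best
```

On a NUMA system the same search would be repeated per level of parallelism, since the best count at the outer level depends on how many threads the inner level consumes.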
“…The libraries are optimized in the installation process for shared memory machines [29] or for message-passing systems [8,9]. The routines can adapt to the conditions of the system at a particular moment [23].…”
Section: Introduction (mentioning)
confidence: 99%
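The run-time adaptation mentioned above — routines adjusting "to the conditions of the system at a particular moment," e.g. avoiding overloaded processors — can be approximated by checking current load before spawning workers. A minimal, Unix-only sketch (the shrink rule is an assumption, not the cited scheme):

```python
import os

def usable_workers(max_workers):
    """Shrink the worker count when the machine is already busy: subtract
    the 1-minute load average from the install-time maximum, keeping at
    least one worker. Requires os.getloadavg (Unix)."""
    load = os.getloadavg()[0]
    free = max(1, int(max_workers - load))
    return min(max_workers, free)
```

A routine would call this at entry, so the number of processes generated reflects the workload at that moment rather than the value fixed at installation.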