2004
DOI: 10.1016/j.parco.2003.11.002

Architecture of an automatically tuned linear algebra library

Cited by 30 publications (24 citation statements)
References 11 publications
“…In this paper, we have presented an extension of our previous self-optimization methodology for homogeneous systems [14], where several decisions (algorithmic parameters) were taken automatically in order to obtain execution times close to the optimum with parallel linear algebra routines. This technique has been combined with our approach of work distribution for parallel dynamic programming [8].…”
Section: Results (mentioning)
confidence: 99%
“…In [14] we presented a self-optimization methodology for these routines on homogeneous systems (it was a HoHo strategy), where several decisions (algorithmic parameters) were taken automatically in order to obtain execution times close to the optimum, and without rewriting any code line of the routines. Some of these decisions are: number of processes to generate, which processors are used (if the system workload is heterogeneous, it could be interesting not to use the overloaded processors [15]), the logical topology of processes (normally, in parallel linear algebra routines the processes are organized in a logical 2D mesh to distribute the data and to perform the inter-processor communications) and the block size for both data distribution and computation.…”
Section: Introduction (mentioning)
confidence: 99%
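The algorithmic parameters listed in the quote above (process count, logical topology, block size) are typically fixed by timing candidate values empirically at installation time. A minimal sketch of that search, assuming a simple blocked matrix multiply stands in for the tuned routine (all names here are illustrative, not the paper's actual code):

```python
import time
import numpy as np

def blocked_gemm(A, B, block):
    """Blocked matrix multiply: C is accumulated block by block, so the
    block size controls cache behavior (the parameter being tuned)."""
    n = A.shape[0]
    C = np.zeros((n, n))
    for i in range(0, n, block):
        for j in range(0, n, block):
            for k in range(0, n, block):
                C[i:i+block, j:j+block] += (
                    A[i:i+block, k:k+block] @ B[k:k+block, j:j+block]
                )
    return C

def pick_block_size(n=256, candidates=(16, 32, 64, 128)):
    """Install-time search: time each candidate block size on a sample
    problem and keep the fastest, with no change to the routine's code."""
    rng = np.random.default_rng(0)
    A = rng.standard_normal((n, n))
    B = rng.standard_normal((n, n))
    best, best_t = None, float("inf")
    for b in candidates:
        t0 = time.perf_counter()
        blocked_gemm(A, B, b)
        elapsed = time.perf_counter() - t0
        if elapsed < best_t:
            best, best_t = b, elapsed
    return best
```

The chosen value would be recorded at install time and read back at run time, which is how such decisions are made "without rewriting any code line of the routines."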
“…The goal is to select the most appropriate number of threads at each level of parallelism, together with the values of other algorithmic parameters, like the block size (or sizes) in algorithms by blocks. The methodology of [9] is adapted to the empirical installation in NUMA systems. It is divided into three phases, which are represented in figure 1: -Design phase.…”
Section: The Auto-tuning Methodology (mentioning)
confidence: 99%
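Selecting "the most appropriate number of threads" can use the same empirical pattern: run a sample workload under each candidate thread count and keep the fastest. A hedged sketch with a stand-in compute kernel (the kernel and names are assumptions for illustration only):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def kernel(lo, hi):
    # Stand-in compute kernel: sum of squares over [lo, hi).
    return sum(i * i for i in range(lo, hi))

def run_with_threads(n_items, n_threads):
    """Split the iteration space evenly across n_threads workers."""
    step = n_items // n_threads
    ranges = [
        (t * step, n_items if t == n_threads - 1 else (t + 1) * step)
        for t in range(n_threads)
    ]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return sum(pool.map(lambda r: kernel(*r), ranges))

def pick_thread_count(n_items=200_000, candidates=(1, 2, 4, 8)):
    """Empirical search over thread counts: time each candidate on the
    same workload and keep the fastest."""
    best, best_t = None, float("inf")
    for p in candidates:
        t0 = time.perf_counter()
        run_with_threads(n_items, p)
        elapsed = time.perf_counter() - t0
        if elapsed < best_t:
            best, best_t = p, elapsed
    return best
```

On a NUMA system the same search would be repeated per level of parallelism, since the best count at the outer level depends on how many threads the inner level consumes.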
“…The libraries are optimized in the installation process for shared memory machines [29] or for message-passing systems [8,9]. The routines can adapt to the conditions of the system at a particular moment [23].…”
Section: Introduction (mentioning)
confidence: 99%
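The run-time adaptation mentioned above — routines adjusting "to the conditions of the system at a particular moment," e.g. avoiding overloaded processors — can be approximated by checking current load before spawning workers. A minimal, Unix-only sketch (the shrink rule is an assumption, not the cited scheme):

```python
import os

def usable_workers(max_workers):
    """Shrink the worker count when the machine is already busy: subtract
    the 1-minute load average from the install-time maximum, keeping at
    least one worker. Requires os.getloadavg (Unix)."""
    load = os.getloadavg()[0]
    free = max(1, int(max_workers - load))
    return min(max_workers, free)
```

A routine would call this at entry, so the number of processes generated reflects the workload at that moment rather than the value fixed at installation.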