1989
DOI: 10.1177/109434208900300204

Level 3 BLAS in LU Factorization on the CRAY-2, ETA-10P, and IBM 3090-200/VF

Abstract: We study various implementations of block Gaussian elimination on full matrices and examine their performance on three vector supercomputers, the CRAY-2, the ETA-10P, and the IBM 3090-200/VF. We show that the use of Level 3 BLAS kernels allows portability without sacrifice of efficiency and that good speeds can be obtained if tuned versions of the kernels are available. Indeed our results show that without using any assembler language outside the kernels we can approach the performance of assembler-coded routi…

Cited by 20 publications (12 citation statements)
References 13 publications
“…The RISC BLAS: We considered the blocking of the triangular solver from the Level 3 BLAS (TRSM) in Daydé and Duff [1989]. Then, we developed a blocked version of the Level 3 BLAS for MIMD vector multiprocessors [Amestoy and Daydé 1993; Daydé et al. 1994].…”
Section: Motivations and Design of the RISC BLAS (mentioning)
confidence: 99%
“…First, let us summarize our conclusions from a previous report (Dayde and Duff 1989) on the implementation of block LU factorization on one processor of the CRAY-2, the ETA-10P, and the IBM 3090 vector processors. KJI-SAXPY and JKI-GAXPY have similar performance using the Fortran model implementation or the tuned versions of Level 2 and Level 3 BLAS.…”
Section: Comparison of the Block Factorization Variants (mentioning)
confidence: 99%
“…The aim of this work is to show that, based on the use of Level 3 BLAS kernels, portable and efficient code can be designed for parallel vector computers with a global shared memory, extending discussions in Dayde and Duff (1989). This class of computer architecture is widely used in the design of today's supercomputers including the ALLIANT FX/80, the CRAY-2, and the IBM 3090.…”
Section: Introduction (mentioning)
confidence: 99%
“…In the blocked jik-SDOT [7], one block column of L and one block row of U are computed in each iteration. The basic steps involved in the jth iteration are shown in Figure 4 along with the data dependencies involved in each step.…”
Section: Parallel Blocked jik-SDOT (mentioning)
confidence: 99%
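
The block factorization described in these statements, where each iteration produces one block column of L and one block row of U, can be illustrated with a small sketch. This is not the authors' code: it is a pure-Python illustration, assuming no pivoting, a dense square matrix stored as a list of lists, and a block size `nb` chosen by the caller. The function names `block_lu`, `_subtract_product`, and `multiply_factors` are hypothetical, and the plain loops stand in for the matrix-matrix multiply (GEMM) and triangular solve (TRSM) kernels that a Level 3 BLAS library would provide.

```python
def _subtract_product(m, r0, r1, c0, c1, depth):
    """m[r0:r1, c0:c1] -= m[r0:r1, 0:depth] @ m[0:depth, c0:c1] (GEMM stand-in)."""
    for i in range(r0, r1):
        for j in range(c0, c1):
            s = 0.0
            for p in range(depth):
                s += m[i][p] * m[p][j]
            m[i][j] -= s


def block_lu(a, nb):
    """Blocked LU without pivoting (an assumption of this sketch).

    Returns a new matrix holding U in its upper triangle and the
    strictly lower part of the unit lower triangular factor L below it.
    """
    if nb < 1:
        raise ValueError("block size must be at least 1")
    n = len(a)
    lu = [[float(x) for x in row] for row in a]
    for k in range(0, n, nb):
        e = min(k + nb, n)
        # Apply all previously computed blocks to the current block column
        # and block row; these are the matrix-matrix (GEMM-like) updates.
        _subtract_product(lu, k, n, k, e, k)
        _subtract_product(lu, k, e, e, n, k)
        # Unblocked elimination on the block column (rows k..n, cols k..e):
        # yields the diagonal factors and the block column of L.
        for j in range(k, e):
            pivot = lu[j][j]
            if pivot == 0.0:
                raise ZeroDivisionError("zero pivot; this sketch does not pivot")
            for i in range(j + 1, n):
                lu[i][j] /= pivot
                for c in range(j + 1, e):
                    lu[i][c] -= lu[i][j] * lu[j][c]
        # Forward substitution with the unit lower diagonal block gives the
        # block row of U (TRSM-like step).
        for i in range(k, e):
            for p in range(k, i):
                for c in range(e, n):
                    lu[i][c] -= lu[i][p] * lu[p][c]
    return lu


def multiply_factors(lu):
    """Rebuild L @ U from the packed factors, for checking the result."""
    n = len(lu)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for p in range(min(i, j) + 1):
                l_ip = 1.0 if p == i else lu[i][p]
                s += l_ip * lu[p][j]
            out[i][j] = s
    return out
```

Calling `multiply_factors(block_lu(a, nb))` on a diagonally dominant matrix should reproduce `a` up to rounding for any block size. In a tuned implementation the two `_subtract_product` calls and the forward substitution are the steps that would be handed to optimized kernels, which is where the performance discussed in the abstract comes from.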