1991
DOI: 10.1177/109434209100500308

Use of Level 3 BLAS in LU Factorization in a Multiprocessing Environment on Three Vector Multiprocessors: the Alliant FX/80, the CRAY-2, and the IBM 3090 VF

Abstract: We study various implementations of block Gaussian elimination on full matrices and examine their performance on three parallel computers, the Alliant FX/80, the CRAY-2, and the IBM 3090-400/VF. These implementations are expressed in terms of Level 3 BLAS matrix-matrix kernels. We consider the use of parallel Level 3 BLAS kernels and compare the parallelism obtained within the computational kernels with that obtained when parallelizing over the kernels. We show that the use of parallel Level 3 BLAS allows …
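
To make the phrase "block Gaussian elimination expressed in terms of Level 3 BLAS kernels" concrete, here is a minimal sketch of one right-looking blocked LU variant. It is an illustration under stated assumptions, not the paper's implementation: it omits pivoting, uses plain Python loops in place of tuned kernels, and the names `block_lu`, `gemm_update`, `reconstruct`, and the block size `nb` are hypothetical. The comments mark which step would map to a triangular solve (TRSM-like) and which to a matrix-matrix multiply (GEMM-like).

```python
def gemm_update(A, r0, r1, c0, c1, k0, k1):
    """GEMM-like update: A[r0:r1, c0:c1] -= A[r0:r1, k0:k1] * A[k0:k1, c0:c1]."""
    for i in range(r0, r1):
        for j in range(c0, c1):
            s = 0.0
            for k in range(k0, k1):
                s += A[i][k] * A[k][j]
            A[i][j] -= s


def block_lu(A, nb):
    """In-place blocked LU without pivoting (assumes nonzero pivots).

    On return, the strict lower triangle of A holds L (unit diagonal implied)
    and the upper triangle holds U.
    """
    n = len(A)
    for k0 in range(0, n, nb):
        k1 = min(k0 + nb, n)
        # Step 1: unblocked elimination on the panel A[k0:n, k0:k1].
        for k in range(k0, k1):
            for i in range(k + 1, n):
                A[i][k] /= A[k][k]
                for j in range(k + 1, k1):
                    A[i][j] -= A[i][k] * A[k][j]
        # Step 2 (TRSM-like): solve L11 * U12 = A12 by forward substitution.
        for j in range(k1, n):
            for i in range(k0, k1):
                for k in range(k0, i):
                    A[i][j] -= A[i][k] * A[k][j]
        # Step 3 (GEMM-like): trailing update A22 -= L21 * U12.
        gemm_update(A, k1, n, k1, n, k0, k1)
    return A


def reconstruct(LU):
    """Multiply the packed factors back together to check L * U == A."""
    n = len(LU)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(min(i, j) + 1):
                l_ik = 1.0 if k == i else LU[i][k]
                out[i][j] += l_ik * LU[k][j]
    return out


if __name__ == "__main__":
    n = 5
    # Diagonally dominant test matrix, so skipping pivoting is safe here.
    A0 = [[6.0 if i == j else 1.0 / (1 + abs(i - j)) for j in range(n)]
          for i in range(n)]
    LU = block_lu([row[:] for row in A0], nb=2)
    err = max(abs(reconstruct(LU)[i][j] - A0[i][j])
              for i in range(n) for j in range(n))
    print("max reconstruction error:", err)
```

In this sketch, most of the arithmetic for large `n` lands in Step 3, which is why the choice between parallelizing inside the kernel and parallelizing over independent kernel calls matters for this kind of algorithm.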

Cited by 14 publications (9 citation statements)
References 10 publications
“…This work is a logical continuation of previous studies on the implementation of Level 3 BLAS on a Transputer network (Berger, Daydé, and Morère (1991)) and on the design of a blocked parallel version of Level 3 BLAS for various shared memory vector multiprocessors (see Daydé and Duff (1991), and Daydé, Duff, and Petitet (1992)).…”
Section: Level BLAS on the BBN TC2000 (mentioning)
confidence: 63%
“…In Table 7.1 we compare the speed-ups obtained on parallel matrix-matrix multiplication on the BBN TC2000 and other shared memory multiprocessors: the Alliant FX/80, the CRAY-2, and the IBM 3090 models E and J (see Daydé and Duff (1991)). The speed-ups obtained on the BBN TC2000 can be successfully compared to those obtained on the other computers even if the performance achieved is not always comparable.…”
Section: Discussion (mentioning)
confidence: 99%
“…On Figures 10 and 9, the effective speed-up are drawn with solid line and the theoretical one with dashed lines. The theoretical speed-up is computed according to the Amdahl law (see [12]):…”
Section: Residuals and Stopping Criteria (mentioning)
confidence: 99%
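
The quoted statement breaks off before the formula, and the citing paper's exact expression is not shown here. As a hedged illustration only, the standard textbook form of Amdahl's law gives the speed-up on p processors as 1 / ((1 - f) + f / p), where f is the fraction of the work that can run in parallel. The function name `amdahl_speedup` and its parameter names are hypothetical.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Theoretical speed-up under the textbook form of Amdahl's law.

    parallel_fraction: share of the run time that parallelizes (0.0 to 1.0).
    processors: number of processors (at least 1).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


if __name__ == "__main__":
    # Tabulate the theoretical curve for a 90% parallel workload.
    for p in (1, 2, 4, 8):
        print(p, amdahl_speedup(0.9, p))
```

Plotting this curve against measured speed-ups is one way to produce the kind of solid-versus-dashed comparison the statement describes.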