David W. Walker scite author profile

This paper describes ScaLAPACK, a distributed memory version of the LAPACK software package for dense and banded matrix computations. Key design features are the use of distributed versions of the Level 3 BLAS as building blocks, and an object-based interface to the library routines. The square block scattered decomposition is described. The implementation of a distributed memory version of the right-looking LU factorization algorithm on the Intel Delta multicomputer is discussed, and performance results are presented that demonstrate the scalability of the algorithm.
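The "right-looking" ordering mentioned in this abstract can be illustrated with a minimal sequential sketch. It is not the paper's distributed, blocked implementation: it works on a single small matrix held as a list of lists, and the function name `lu_right_looking` is hypothetical. After each pivot column is processed, the whole trailing submatrix to its right is updated immediately, which is what makes the variant right-looking.

```python
def lu_right_looking(a):
    """Unblocked right-looking LU with partial pivoting (illustrative sketch).

    Returns (lu, perm): lu holds the multipliers of L below the diagonal and
    U on and above it; perm[k] is the original index of the row now at k.
    """
    n = len(a)
    lu = [[float(x) for x in row] for row in a]  # work on a copy
    perm = list(range(n))
    for k in range(n):
        # Partial pivoting: choose the largest-magnitude entry in column k.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
            perm[k], perm[p] = perm[p], perm[k]
        # Form the multipliers (column k of L).
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
        # Right-looking step: update the trailing submatrix right away.
        for i in range(k + 1, n):
            lik = lu[i][k]
            for j in range(k + 1, n):
                lu[i][j] -= lik * lu[k][j]
    return lu, perm


if __name__ == "__main__":
    lu, perm = lu_right_looking([[2, 1], [4, 3]])
    print(lu, perm)
```

In a blocked version, the scalar pivot column becomes a panel of several columns and the trailing update becomes a matrix-matrix multiply, which is where the Level 3 BLAS named in the abstract would come in.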

Solving Problems On Concurrent Processors Vol. 1: General Techniques and Regular Problems

Fox et al. 1989

The Perfect Club Benchmarks: Effective Performance Evaluation of Supercomputers

Berry, Chen, Koss et al. 1989

The International Journal of Supercomputing Applications

315

121

This report presents a methodology for measuring the performance of supercomputers. It includes 13 Fortran programs that total over 50,000 lines of source code. They represent applications in several areas of engineering and scientific computing, and in many cases the codes are currently being used by computational research and development groups. We also present the PERFECT Fortran standard, a set of guidelines that allow portability to several types of machines. Furthermore, we present some performance measures and a methodology for recording and sharing results among diverse users on different machines. The results presented in this paper should not be used to compare machines, except in a preliminary sense. Rather, they are presented to show how the methodology has been applied, and to encourage others to join us in this effort. The results should be regarded as the first step toward our objective, which is to develop a publicly accessible data base of performance information of this type.

Design and Implementation of the ScaLAPACK LU, QR, and Cholesky Factorization Routines

Choi, Dongarra, Ostrouchov et al. 1995

Scientific Programming

162

115

This article discusses the core factorization routines included in the ScaLAPACK library. These routines allow the factorization and solution of a dense system of linear equations via LU, QR, and Cholesky. They are implemented using a block cyclic data distribution, and are built using de facto standard kernels for matrix and vector operations (BLAS and its parallel counterpart PBLAS) and message passing communication (BLACS). In implementing the ScaLAPACK routines, a major objective was to parallelize the corresponding sequential LAPACK using the BLAS, BLACS, and PBLAS as building blocks, leading to straightforward parallel implementations without a significant loss in performance. We present the details of the implementation of the ScaLAPACK factorization routines, as well as performance and scalability results on the Intel iPSC/860, Intel Touchstone Delta, and Intel Paragon System.
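The block cyclic data distribution mentioned in this abstract can be sketched as an index mapping. This is a minimal illustration, not ScaLAPACK's own code: the function names are hypothetical, indices are 0-based, and the distribution is assumed to start at process row 0 and process column 0 of the grid.

```python
def block_cyclic_owner(g, nb, nprocs):
    """Map a 0-based global index to (owning process, local index).

    Blocks of nb consecutive indices are dealt out to the processes in
    round-robin order, starting with process 0 (an assumption of this sketch).
    """
    block = g // nb                      # which block the index falls in
    proc = block % nprocs                # blocks cycle over the processes
    local = (block // nprocs) * nb + g % nb
    return proc, local


def owner_2d(i, j, mb, nb, prow, pcol):
    """Apply the 1-D mapping independently to rows and columns.

    Returns ((process row, process column), (local row, local column)) for
    global entry (i, j) on a prow x pcol grid with mb x nb blocks.
    """
    pr, li = block_cyclic_owner(i, mb, prow)
    pc, lj = block_cyclic_owner(j, nb, pcol)
    return (pr, pc), (li, lj)


if __name__ == "__main__":
    # Global entry (5, 2) with 2 x 2 blocks on a 2 x 2 process grid.
    print(owner_2d(5, 2, 2, 2, 2, 2))
```

Because consecutive blocks land on different processes, every process keeps a share of the trailing submatrix as a factorization proceeds, which is the usual load-balancing argument for this layout.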

David W. Walker

Chebyshev tau-QZ algorithm methods for calculating spectra of hydrodynamic stability problems

ScaLAPACK: a scalable linear algebra library for distributed memory concurrent computers
