2010
DOI: 10.1016/j.parco.2009.12.005

Towards dense linear algebra for hybrid GPU accelerated manycore systems

Abstract: We highlight the trends leading to the increased appeal of using hybrid multicore + GPU systems for high performance computing. We present a set of techniques that can be used to develop efficient dense linear algebra algorithms for these systems. We illustrate the main ideas with the development of a hybrid LU factorization algorithm where we split the computation over a multicore and a graphics processor, and use particular techniques to reduce the amount of pivoting and communication between …
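The hybrid LU factorization the abstract describes splits a blocked, right-looking factorization between the CPU and the GPU. The following is a minimal NumPy sketch of that structure, not the paper's implementation: the narrow, latency-bound panel factorization is the part that would run on the multicore CPU, and the large, compute-bound trailing-matrix update is the part that would be offloaded to the GPU. Here both steps simply run on the CPU for clarity, and `nb` is an illustrative block size.

```python
import numpy as np

def blocked_lu(A, nb=2):
    """Blocked right-looking LU with partial pivoting (illustrative sketch).

    In the hybrid scheme, the panel step below maps to the CPU cores and
    the trailing update maps to the GPU; here everything runs in NumPy.
    Returns the packed LU factors and the pivot permutation `piv`, so that
    A[piv] == L @ U.
    """
    A = np.asarray(A, dtype=float).copy()
    n = A.shape[0]
    piv = np.arange(n)
    for k in range(0, n, nb):
        kb = min(nb, n - k)
        # --- panel factorization (CPU side in the hybrid algorithm) ---
        for j in range(k, k + kb):
            p = j + np.argmax(np.abs(A[j:, j]))   # partial pivot search
            if p != j:
                A[[j, p], :] = A[[p, j], :]       # swap full rows
                piv[[j, p]] = piv[[p, j]]
            A[j + 1:, j] /= A[j, j]               # scale multipliers
            # rank-1 update restricted to the remaining panel columns
            A[j + 1:, j + 1:k + kb] -= np.outer(A[j + 1:, j],
                                                A[j, j + 1:k + kb])
        if k + kb < n:
            # --- trailing update (GPU side in the hybrid algorithm) ---
            L11 = np.tril(A[k:k + kb, k:k + kb], -1) + np.eye(kb)
            A[k:k + kb, k + kb:] = np.linalg.solve(L11, A[k:k + kb, k + kb:])
            A[k + kb:, k + kb:] -= A[k + kb:, k:k + kb] @ A[k:k + kb, k + kb:]
    return A, piv
```

The design point the paper exploits is that the two steps have very different hardware affinities: the panel is a tall, thin, pivot-heavy computation that favors the CPU, while the trailing update is a large matrix-matrix multiply that favors the GPU, and the two can be overlapped.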

Cited by 353 publications (223 citation statements).
References 24 publications.
“…For the largest systems, the invocation of dSCF exposes linear algebra steps (Fock matrix diagonalization and the DIIS orbital gradient) as rate-limiting steps. We note in passing that the J/K steps parallelize to multiple GPUs more efficiently than the linear algebra steps (even when using the MAGMA library 47 for matrix diagonalization), meaning that the linear algebra steps are rate-limiting when using dSCF with 4-8 GPUs.…”
Section: The Difference Energy Is
Citation type: mentioning
Confidence: 99%
“…While performance models permit to generate efficient computational kernels even on heterogeneous systems, computations are usually mapped statically on the different processing resources when dealing with hybrid systems [11].…”
Section: Related Work
Citation type: mentioning
Confidence: 99%
“…This issue arises in both unsymmetric and symmetric cases, and for both dense and sparse factorizations. The ScaLAPACK [7], MAGMA [16] and PLASMA [14] dense linear algebra libraries contain a Cholesky factorization for positive definite matrices, for which no pivoting is required, but they do not contain an LDL^T factorization. They contain an LU factorization with partial pivoting (i.e.…”
Section: Introduction
Citation type: mentioning
Confidence: 99%
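The last excerpt notes that ScaLAPACK, MAGMA, and PLASMA ship Cholesky without any pivoting logic. The reason is that for a symmetric positive definite matrix every pivot encountered during the factorization is guaranteed to be strictly positive, so no row interchanges are ever needed. A minimal unblocked sketch (illustrative, not any library's code path):

```python
import numpy as np

def cholesky_no_pivot(A):
    """Unblocked Cholesky A = L @ L.T for symmetric positive definite A.

    No pivot search or row swaps appear anywhere: for SPD input the
    quantity d = A[j, j] - ||L[j, :j]||^2 is strictly positive at every
    step, so the factorization never breaks down.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    L = np.zeros((n, n))
    for j in range(n):
        d = A[j, j] - L[j, :j] @ L[j, :j]
        L[j, j] = np.sqrt(d)  # d > 0 is guaranteed for SPD input
        L[j + 1:, j] = (A[j + 1:, j] - L[j + 1:, :j] @ L[j, :j]) / L[j, j]
    return L
```

By contrast, LU factorization of a general matrix can hit a zero or tiny leading pivot (consider a matrix whose (0, 0) entry is zero), which is why the same libraries pair LU with partial pivoting, and why the quoted paper cares about the missing LDL^T case, where symmetric indefinite matrices need their own pivoting strategy.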