2010
DOI: 10.1016/j.parco.2009.12.005

Towards dense linear algebra for hybrid GPU accelerated manycore systems

Abstract: We highlight the trends leading to the increased appeal of using hybrid multicore + GPU systems for high performance computing. We present a set of techniques that can be used to develop efficient dense linear algebra algorithms for these systems. We illustrate the main ideas with the development of a hybrid LU factorization algorithm where we split the computation over a multicore and a graphics processor, and use particular techniques to reduce the amount of pivoting and communication between …
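The hybrid LU factorization the abstract describes splits a blocked, right-looking factorization between the CPU and the GPU. The following is a minimal NumPy sketch of that structure, not the paper's implementation: the narrow, latency-bound panel factorization is the part that would run on the multicore CPU, and the large, compute-bound trailing-matrix update is the part that would be offloaded to the GPU. Here both steps simply run on the CPU for clarity, and `nb` is an illustrative block size.

```python
import numpy as np

def blocked_lu(A, nb=2):
    """Blocked right-looking LU with partial pivoting (illustrative sketch).

    In the hybrid scheme, the panel step below maps to the CPU cores and
    the trailing update maps to the GPU; here everything runs in NumPy.
    Returns the packed LU factors and the pivot permutation `piv`, so that
    A[piv] == L @ U.
    """
    A = np.asarray(A, dtype=float).copy()
    n = A.shape[0]
    piv = np.arange(n)
    for k in range(0, n, nb):
        kb = min(nb, n - k)
        # --- panel factorization (CPU side in the hybrid algorithm) ---
        for j in range(k, k + kb):
            p = j + np.argmax(np.abs(A[j:, j]))   # partial pivot search
            if p != j:
                A[[j, p], :] = A[[p, j], :]       # swap full rows
                piv[[j, p]] = piv[[p, j]]
            A[j + 1:, j] /= A[j, j]               # scale multipliers
            # rank-1 update restricted to the remaining panel columns
            A[j + 1:, j + 1:k + kb] -= np.outer(A[j + 1:, j],
                                                A[j, j + 1:k + kb])
        if k + kb < n:
            # --- trailing update (GPU side in the hybrid algorithm) ---
            L11 = np.tril(A[k:k + kb, k:k + kb], -1) + np.eye(kb)
            A[k:k + kb, k + kb:] = np.linalg.solve(L11, A[k:k + kb, k + kb:])
            A[k + kb:, k + kb:] -= A[k + kb:, k:k + kb] @ A[k:k + kb, k + kb:]
    return A, piv
```

The design point the paper exploits is that the two steps have very different hardware affinities: the panel is a tall, thin, pivot-heavy computation that favors the CPU, while the trailing update is a large matrix-matrix multiply that favors the GPU, and the two can be overlapped.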

Cited by 353 publications (223 citation statements).
References 24 publications.
“…For the largest systems, the invocation of dSCF exposes linear algebra steps (Fock matrix diagonalization and the DIIS orbital gradient) as rate-limiting steps. We note in passing that the J/K steps parallelize to multiple GPUs more efficiently than the linear algebra steps (even when using the MAGMA library 47 for matrix diagonalization), meaning that the linear algebra steps are rate-limiting when using dSCF with 4-8 GPUs.…”
Section: The Difference Energy Is
Citation type: mentioning
Confidence: 99%
“…While performance models permit to generate efficient computational kernels even on heterogeneous systems, computations are usually mapped statically on the different processing resources when dealing with hybrid systems [11].…”
Section: Related Work
Citation type: mentioning
Confidence: 99%
“…This issue arises in both unsymmetric and symmetric cases, and for both dense and sparse factorizations. The ScaLAPACK [7], MAGMA [16] and PLASMA [14] dense linear algebra libraries contain a Cholesky factorization for positive definite matrices, for which no pivoting is required, but they do not contain an LDL^T factorization. They contain an LU factorization with partial pivoting (i.e.…”
Section: Introduction
Citation type: mentioning
Confidence: 99%
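The last excerpt notes that ScaLAPACK, MAGMA, and PLASMA ship Cholesky without any pivoting logic. The reason is that for a symmetric positive definite matrix every pivot encountered during the factorization is guaranteed to be strictly positive, so no row interchanges are ever needed. A minimal unblocked sketch (illustrative, not any library's code path):

```python
import numpy as np

def cholesky_no_pivot(A):
    """Unblocked Cholesky A = L @ L.T for symmetric positive definite A.

    No pivot search or row swaps appear anywhere: for SPD input the
    quantity d = A[j, j] - ||L[j, :j]||^2 is strictly positive at every
    step, so the factorization never breaks down.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    L = np.zeros((n, n))
    for j in range(n):
        d = A[j, j] - L[j, :j] @ L[j, :j]
        L[j, j] = np.sqrt(d)  # d > 0 is guaranteed for SPD input
        L[j + 1:, j] = (A[j + 1:, j] - L[j + 1:, :j] @ L[j, :j]) / L[j, j]
    return L
```

By contrast, LU factorization of a general matrix can hit a zero or tiny leading pivot (consider a matrix whose (0, 0) entry is zero), which is why the same libraries pair LU with partial pivoting, and why the quoted paper cares about the missing LDL^T case, where symmetric indefinite matrices need their own pivoting strategy.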