A Parallel Divide and Conquer Algorithm for the Symmetric Eigenvalue Problem on Distributed Memory Architectures

Tisseur, Françoise; Dongarra, Jack

doi:10.1137/s1064827598336951

Cited by 77 publications

(59 citation statements)

References 21 publications


“…The current release 1.8.0 of the ScaLAPACK library [33] provides three different methods for computing eigenvalues and eigenvectors of a symmetric tridiagonal matrix: xSTEQR2, the implicit QL/QR method [47]; PxSTEBZ and PxSTEIN, a combination of bisection and inverse iteration (B&I) [48,49]; and PxSTEDC, the divide-and-conquer (D&C) method [50][51][52]. LAPACK 3.2.2 [31] and release 3.2 of the PLAPACK library [53] also provide the new MRRR algorithm [54,12], which will be included in a future ScaLAPACK release as well [55].…”

Section: Partial Eigensystems of Symmetric Tridiagonal Matrices (mentioning, confidence: 99%)
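PxSTEBZ is named above as the bisection half of the B&I pair. As a hedged illustration of the underlying idea (not ScaLAPACK's code), the sketch below counts the eigenvalues of a symmetric tridiagonal matrix that lie below a shift x from the signs of the LDLᵀ pivots of T − xI (a Sturm count), then bisects to isolate the k-th eigenvalue. The function names `sturm_count` and `kth_eigenvalue` and the tolerance `rtol` are hypothetical. Because each eigenvalue is isolated independently, a subset of the spectrum can be computed without the rest, which is what makes bisection attractive for partial eigensystems and for parallel execution.

```python
import numpy as np

def sturm_count(a, b, x):
    """Number of eigenvalues of the symmetric tridiagonal matrix with
    diagonal a and off-diagonal b that are strictly less than x.

    The LDL^T pivots of T - x I are q_1 = a_1 - x and
    q_i = (a_i - x) - b_{i-1}^2 / q_{i-1}; by Sylvester's law of inertia
    the number of negative pivots equals the number of eigenvalues below x.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # smallest allowed pivot magnitude (LAPACK-style safeguard against overflow)
    pivmin = np.finfo(float).tiny * max(1.0, float(np.max(b ** 2, initial=0.0)))
    count = 0
    q = 1.0
    for i in range(len(a)):
        q = (a[i] - x) - (b[i - 1] ** 2 / q if i > 0 else 0.0)
        if abs(q) < pivmin:
            q = -pivmin  # replace a (near-)zero pivot by a tiny negative one
        if q < 0:
            count += 1
    return count

def kth_eigenvalue(a, b, k, rtol=1e-12):
    """k-th smallest eigenvalue (k = 0, 1, ...) by bisection on Sturm counts."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Gershgorin discs give an interval [lo, hi] containing the whole spectrum
    r = np.zeros_like(a)
    r[:-1] += np.abs(b)
    r[1:] += np.abs(b)
    lo, hi = float(np.min(a - r)), float(np.max(a + r))
    while hi - lo > rtol * max(1.0, abs(lo), abs(hi)):
        mid = 0.5 * (lo + hi)
        if sturm_count(a, b, mid) > k:
            hi = mid  # at least k+1 eigenvalues lie below mid
        else:
            lo = mid  # the k-th eigenvalue is >= mid
    return 0.5 * (lo + hi)
```

For example, `kth_eigenvalue([2.0] * 5, [-1.0] * 4, 0)` approximates the smallest eigenvalue 2 − 2cos(π/6) of the 5×5 discrete Laplacian. Inverse iteration (the PxSTEIN half) would then supply the eigenvectors; it is not shown here.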

Parallel solution of partial symmetric eigenvalue problems from electronic structure calculations

et al. 2011


Abstract. The computation of selected eigenvalues and eigenvectors of a symmetric (Hermitian) matrix is an important subtask in many contexts, for example in electronic structure calculations. If a significant portion of the eigensystem is required then typically direct eigensolvers are used. The central three steps are: reduce the matrix to tridiagonal form, compute the eigenpairs of the tridiagonal matrix, and transform the eigenvectors back. To better utilize memory hierarchies, the reduction may be effected in two stages: full to banded, and banded to tridiagonal. Then the back transformation of the eigenvectors also involves two stages. For large problems, the eigensystem calculations can be the computational bottleneck, in particular with large numbers of processors. In this paper we discuss variants of the tridiagonal-to-banded back transformation, improving the parallel efficiency for large numbers of processors as well as the per-processor utilization. We also modify the divide-and-conquer algorithm for symmetric tridiagonal matrices such that it can compute a subset of the eigenpairs at reduced cost. The effectiveness of our modifications is demonstrated with numerical experiments.
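The three central steps named in this abstract can be sketched in a few lines of NumPy. This is an illustrative, unblocked sketch with a hypothetical function name (`householder_tridiagonalize`, `symmetric_eig_3step`): it performs a one-stage Householder reduction, uses `numpy.linalg.eigh` as a stand-in for the tridiagonal eigensolver (D&C, MRRR, or B&I in a real library), and back-transforms the eigenvectors with the accumulated orthogonal factor. The two-stage (full to banded, banded to tridiagonal) variant discussed in the paper is not shown.

```python
import numpy as np

def householder_tridiagonalize(A):
    """Return (T, Q) with Q orthogonal and T = Q^T A Q tridiagonal (unblocked)."""
    T = np.array(A, dtype=float)
    n = T.shape[0]
    Q = np.eye(n)
    for k in range(n - 2):
        x = T[k + 1:, k].copy()
        # reflector H = I - 2 v v^T maps x to alpha * e_1 (sign chosen to avoid cancellation)
        alpha = -np.copysign(np.linalg.norm(x), x[0])
        v = x
        v[0] -= alpha
        vnorm = np.linalg.norm(v)
        if vnorm == 0.0:
            continue  # column is already reduced
        v /= vnorm
        # two-sided update T <- H T H on the trailing rows and columns
        T[k + 1:, :] -= 2.0 * np.outer(v, v @ T[k + 1:, :])
        T[:, k + 1:] -= 2.0 * np.outer(T[:, k + 1:] @ v, v)
        # accumulate Q = H_1 H_2 ... so that A = Q T Q^T
        Q[:, k + 1:] -= 2.0 * np.outer(Q[:, k + 1:] @ v, v)
    return T, Q

def symmetric_eig_3step(A):
    # step 1: reduce to tridiagonal form
    T, Q = householder_tridiagonalize(A)
    d, e = np.diag(T), np.diag(T, 1)
    # step 2: eigenpairs of the tridiagonal matrix (stand-in solver)
    w, V = np.linalg.eigh(np.diag(d) + np.diag(e, 1) + np.diag(e, -1))
    # step 3: back-transform eigenvectors of T into eigenvectors of A
    return w, Q @ V
```

Computing only a subset of eigenpairs, the case the abstract targets, would change step 2 and reduce the cost of step 3 to the selected columns of V.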


“…Introduced by Cuppen [12], the D&C algorithm computes the eigenvalues of the tridiagonal matrix T. Many serial and parallel Cuppen-based eigensolver implementations for shared and distributed memory have been proposed [18,22,26,28,38,40,42]. The D&C approach can then be expressed in three phases: (a) the partition phase, (b) the solution of the simple eigenvalue problems, and (c) the merging phase.…”

Section: 3 (mentioning, confidence: 99%)
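The three phases in this quotation can be made concrete with a compact recursive sketch. It is a simplified illustration, not the parallel code of Tisseur and Dongarra: (a) the partition writes T as two tridiagonal halves plus a rank-one term ρvvᵀ, (b) small leaves are solved with `numpy.linalg.eigh`, and (c) the merge solves the secular equation 1 + ρ Σ zᵢ²/(dᵢ − λ) = 0 by bisection, measuring λ from the nearer pole so that dᵢ − λ stays accurate. Only the simplest deflation (negligible zᵢ) is handled; the rotation-based deflation of (nearly) equal dᵢ and the Gu and Eisenstat eigenvector recomputation used by production codes are omitted, so orthogonality is not guaranteed in hard cases. The names `secular_eig` and `tridiag_dc`, the leaf size of 2, and the deflation tolerance are assumptions.

```python
import numpy as np

def secular_eig(d, z, rho):
    """Eigenpairs of diag(d) + rho z z^T; assumes d strictly increasing, rho > 0, z_i != 0."""
    n = len(d)
    z2 = z * z
    lam = np.empty(n)
    U = np.empty((n, n))
    for k in range(n):
        # root k lies in (d[k], d[k+1]); the last one in (d[-1], d[-1] + rho*||z||^2)
        if k < n - 1:
            gap = d[k + 1] - d[k]
            f_mid = 1.0 + rho * np.sum(z2 / ((d - d[k]) - gap / 2))
            # f increases between poles: its sign at the midpoint tells
            # which pole is nearer, and lambda is measured from that pole
            if f_mid > 0:
                origin, lo, hi = d[k], 0.0, gap / 2
            else:
                origin, lo, hi = d[k + 1], -gap / 2, 0.0
        else:
            origin, lo, hi = d[k], 0.0, rho * z2.sum()
        shifted = d - origin  # exactly zero at the origin pole
        for _ in range(200):
            tau = 0.5 * (lo + hi)
            if tau <= lo or tau >= hi:
                break  # bracket has shrunk to adjacent floats
            if 1.0 + rho * np.sum(z2 / (shifted - tau)) > 0:
                hi = tau
            else:
                lo = tau
        tau = 0.5 * (lo + hi)
        lam[k] = origin + tau
        u = z / (shifted - tau)  # eigenvector is (D - lambda I)^{-1} z
        U[:, k] = u / np.linalg.norm(u)
    return lam, U

def tridiag_dc(a, b):
    """Ascending eigenvalues and eigenvectors of the tridiagonal matrix (a, b)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = len(a)
    if n <= 2:  # leaf: solve directly
        return np.linalg.eigh(np.diag(a) + np.diag(b, 1) + np.diag(b, -1))
    # (a) partition: T = blkdiag(T1, T2) + rho v v^T, v = e_{m-1} + s e_m
    m = n // 2
    beta = b[m - 1]
    rho, s = abs(beta), (1.0 if beta >= 0 else -1.0)
    a1, a2 = a[:m].copy(), a[m:].copy()
    a1[-1] -= rho
    a2[0] -= rho
    # (b) two independent half-size problems (the source of parallelism)
    lam1, Q1 = tridiag_dc(a1, b[:m - 1])
    lam2, Q2 = tridiag_dc(a2, b[m:])
    # (c) merge: T = Q (D + rho z z^T) Q^T with Q = blkdiag(Q1, Q2), z = Q^T v
    d = np.concatenate([lam1, lam2])
    z = np.concatenate([Q1[-1, :], s * Q2[0, :]])
    Q = np.zeros((n, n))
    Q[:m, :m], Q[m:, m:] = Q1, Q2
    order = np.argsort(d)
    d, z, Q = d[order], z[order], Q[:, order]
    lam, V = d.copy(), np.eye(n)
    # deflation: if rho*|z_i| is negligible, (d_i, e_i) is kept as an eigenpair
    tol = 8 * np.finfo(float).eps * max(np.abs(d).max(), rho)
    idx = np.flatnonzero(rho * np.abs(z) > tol)
    if idx.size:
        lam[idx], V[np.ix_(idx, idx)] = secular_eig(d[idx], z[idx], rho)
    order = np.argsort(lam)
    return lam[order], (Q @ V)[:, order]
```

The two recursive calls in phase (b) share no data, which is what a distributed-memory implementation exploits; the sketch runs them sequentially.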

“…The D&C approach is sequentially one of the fastest methods currently available if all eigenvalues and eigenvectors are to be computed [13]. It also has attractive parallelization properties as shown in [42]. Finally, it is noteworthy to mention the deflation process, which occurs during the computation of the low rank modifications.…”

Section: 3 (mentioning, confidence: 99%)
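Deflation in Cuppen-type merges of the rank-one update D + ρzzᵀ classically takes two forms: a negligible component zᵢ, or two (nearly) equal poles dᵢ ≈ dⱼ. The sketch below illustrates the second form in the exactly-equal case. A Givens rotation G in the (i, j) plane leaves D unchanged because D restricted to that plane is dᵢI, so G(D + ρzzᵀ)Gᵀ = D + ρ(Gz)(Gz)ᵀ; choosing G to zero (Gz)ⱼ makes (dⱼ, Gᵀeⱼ) an exact eigenpair and shrinks the secular equation by one. The function name `deflate_equal_poles` is hypothetical, and production codes apply a tolerance test rather than exact equality.

```python
import numpy as np

def deflate_equal_poles(d, z, i, j):
    """Zero z[j] with a Givens rotation in the (i, j) plane, assuming d[i] == d[j].

    Returns (z_new, G) with G orthogonal, z_new = G @ z and z_new[j] == 0 (up to roundoff).
    """
    r = np.hypot(z[i], z[j])
    c, s = (1.0, 0.0) if r == 0.0 else (z[i] / r, z[j] / r)
    G = np.eye(len(d))
    G[i, i], G[i, j] = c, s
    G[j, i], G[j, j] = -s, c
    return G @ z, G
```

For instance, with d = (1, 2, 2, 3), rotating in the (1, 2) plane (0-based) zeros the third component of z, and the third column of Gᵀ is then an eigenvector of D + ρzzᵀ with eigenvalue 2 for any ρ.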

Toward a High Performance Tile Divide and Conquer Algorithm for the Dense Symmetric Eigenvalue Problem

Haidar, Ltaief, Dongarra

2012

SIAM J. Sci. Comput.

Self Cite


Abstract. Classical solvers for the dense symmetric eigenvalue problem suffer from the first step, which involves a reduction to tridiagonal form that is dominated by the cost of accessing memory during the panel factorization. The solution is to reduce the matrix to a banded form, which then requires the eigenvalues of the banded matrix to be computed. The standard divide and conquer algorithm can be modified for this purpose. The paper combines this insight with tile algorithms that can be scheduled via a dynamic runtime system to multicore architectures. A detailed analysis of performance and accuracy is included. Speedups of 14-fold and 4-fold are reported relative to LAPACK and Intel's Math Kernel Library.

Key words. divide and conquer, symmetric eigenvalue solver, tile algorithms, dynamic scheduling

AMS subject classifications. 15A18, 65F15, 65F18, 65Y05, 65Y20, 68W10

DOI. 10.1137/110823699

1. Introduction. The objective of this paper is to introduce a new high performance tile divide and conquer (TD&C) eigenvalue solver for dense symmetric matrices on homogeneous multicore architectures. The necessity of calculating eigenvalues emerges from various computational science disciplines, e.g., in quantum physics [33], chemistry [37], and mechanics [25], as well as in statistics when computing the principal component analysis of the symmetric covariance matrix. As multicore systems continue to gain ground in the high performance computing community, linear algebra algorithms have to be redesigned or new algorithms have to be developed in order to take advantage of the architectural features brought by these processing units.

In particular, tile algorithms have recently shown very promising performance results for solving linear systems of equations on multicore architectures using Cholesky, QR/LQ, and LU factorizations available in the PLASMA [34] library and other similar projects like FLAME [44].
The PLASMA concepts consist of splitting the input matrix into square tiles and reorganizing the data within each tile to be contiguous in memory (block data layout) for efficient cache reuse. The whole dataflow execution can then be represented as a directed acyclic graph (DAG) where nodes are tasks operating on tiles, and edges represent dependencies between them. An efficient and lightweight runtime system environment named QUARK [30] (internally
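The first stage described in this abstract, reducing a dense symmetric matrix to banded form instead of directly to tridiagonal form, can be sketched with panel QR factorizations. The NumPy code below is a hedged, unblocked illustration rather than PLASMA's tile kernels or its QUARK-scheduled DAG: for each block column of width `nb`, a QR factorization of the panel below the band yields an orthogonal factor that is applied from both sides, so the bulk of the work is matrix-matrix products. The function name `reduce_to_band` and the parameter `nb` are hypothetical.

```python
import numpy as np

def reduce_to_band(A, nb):
    """Return (B, Q) with Q orthogonal and B = Q^T A Q symmetric with bandwidth nb."""
    B = np.array(A, dtype=float)
    n = B.shape[0]
    Q = np.eye(n)
    # one panel per block column while entries remain below the band
    for j in range(0, n - nb - 1, nb):
        rows = slice(j + nb, n)
        # QR of the panel below the band: panel = Qp R with R upper trapezoidal
        Qp, _ = np.linalg.qr(B[rows, j:j + nb], mode="complete")
        # two-sided update B <- H^T B H with H = blkdiag(I, Qp)
        B[rows, :] = Qp.T @ B[rows, :]
        B[:, rows] = B[:, rows] @ Qp
        # accumulate the orthogonal factor for the later back transformation
        Q[:, rows] = Q[:, rows] @ Qp
    return B, Q
```

The banded matrix B has the same eigenvalues as A; the paper's second ingredient, a divide and conquer solver adapted to the banded matrix, is not shown.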


“…Many serial and parallel Cuppen-based eigensolver implementations for shared and distributed memory have been proposed in the past [10,11,17,25,27,28]. The overall D&C approach consists in splitting the problem into two subproblems (son nodes) representing a rank-one modification.…”

Section: Flexible Multi-GPU Divide and Conquer Algorithm (mentioning, confidence: 99%)
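The splitting into two son nodes coupled by a rank-one modification can be written out explicitly. For a symmetric tridiagonal T with diagonal a₁, ..., aₙ and off-diagonal b₁, ..., bₙ₋₁, cutting after row m gives the standard Cuppen identity below; the symbols ρ, σ, v, z, and dᵢ are introduced here for illustration, σ = 1 is taken when b_m = 0, and eₘ, e₁ denote unit vectors of the appropriate size. When the zᵢ are nonzero and the dᵢ distinct (otherwise deflation applies), the eigenvalues of the merged problem are exactly the roots of the secular equation in the last line.

```latex
\begin{aligned}
T &= \begin{pmatrix} \hat T_1 & 0 \\ 0 & \hat T_2 \end{pmatrix} + \rho\, v v^{T},
  \qquad \rho = |b_m|,\; \sigma = \operatorname{sign}(b_m),\; v = e_m + \sigma e_{m+1}, \\
\hat T_1 &= T_{1:m,\,1:m} - \rho\, e_m e_m^{T}, \qquad
  \hat T_2 = T_{m+1:n,\,m+1:n} - \rho\, e_1 e_1^{T}, \\
\hat T_k &= Q_k D_k Q_k^{T} \;(k = 1, 2), \qquad Q = \operatorname{diag}(Q_1, Q_2), \qquad
  D = \operatorname{diag}(D_1, D_2) = \operatorname{diag}(d_1, \dots, d_n), \\
T &= Q \bigl( D + \rho\, z z^{T} \bigr) Q^{T}, \qquad
  z = Q^{T} v = \begin{pmatrix} Q_1^{T} e_m \\ \sigma\, Q_2^{T} e_1 \end{pmatrix}, \\
(D + \rho\, z z^{T})\, u = \lambda u
  &\;\Longrightarrow\; (z^{T} u) \Bigl( 1 + \rho \sum_{i=1}^{n} \frac{z_i^{2}}{d_i - \lambda} \Bigr) = 0,
  \qquad u \propto (D - \lambda I)^{-1} z, \qquad x = Q u .
\end{aligned}
```

Only the last row of Q₁ and the first row of Q₂ enter z, which is why the merge needs little data from each son node beyond its eigenvalues.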

Leading Edge Hybrid Multi-GPU Algorithms for Generalized Eigenproblems in Electronic Structure Calculations

Haidar, Solcà, Gates, et al. 2013

Lecture Notes in Computer Science

Self Cite


Abstract. Today's high computational demands from engineering fields and complex hardware development make it necessary to develop and optimize new algorithms toward achieving high performance and good scalability on the next generation of computers. The enormous gap between the high-performance capabilities of GPUs and the slow interconnect between them has made the development of numerical software that is scalable across multiple GPUs extremely challenging. We describe and analyze a successful methodology to address the challenges (starting from our algorithm design, kernel optimization and tuning, to our programming model) in the development of a scalable high-performance generalized eigenvalue solver in the context of electronic structure calculations in materials science applications. We developed a set of leading edge dense linear algebra algorithms, as part of a generalized eigensolver, featuring fine-grained memory-aware kernels, a task-based approach and hybrid execution/scheduling. The goal of the new design is to increase the computational intensity of the major compute kernels and to reduce synchronization and data transfers between GPUs. We report the performance impact on the generalized eigensolver when different fractions of eigenvectors are needed. The algorithm described provides an enormous performance boost compared to current GPU-based solutions, and performance comparable to state-of-the-art distributed solutions, using a single node with multiple GPUs.
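The generalized symmetric-definite problem A x = λ B x targeted in this abstract is classically reduced to a standard symmetric problem through a Cholesky factorization of B, as done by LAPACK's `xSYGV` driver (with `xSYGST` forming the reduced matrix). The NumPy sketch below shows that reduction and the back transformation, optionally keeping only the lowest `k` eigenpairs to mirror the abstract's point about needing only a fraction of the eigenvectors. It is not the authors' multi-GPU implementation; the function name `generalized_eigh` is hypothetical, and `numpy.linalg.solve` stands in for a triangular solve for brevity.

```python
import numpy as np

def generalized_eigh(A, B, k=None):
    """Solve A x = lambda B x (A symmetric, B symmetric positive definite).

    Returns the k smallest eigenvalues (all if k is None) and the
    corresponding B-orthonormal eigenvectors as columns.
    """
    L = np.linalg.cholesky(B)          # B = L L^T
    # C = L^{-1} A L^{-T}, formed with two solves instead of explicit inverses
    Y = np.linalg.solve(L, A)          # L^{-1} A
    C = np.linalg.solve(L, Y.T).T      # L^{-1} A L^{-T}, using A = A^T
    C = 0.5 * (C + C.T)                # symmetrize against roundoff
    w, V = np.linalg.eigh(C)           # standard problem C y = lambda y
    if k is not None:
        w, V = w[:k], V[:, :k]         # keep only the wanted fraction
    X = np.linalg.solve(L.T, V)        # back-transform: x = L^{-T} y
    return w, X
```

Restricting k reduces the cost of the back transformation, but in this sketch the full standard problem is still solved; a solver that computes only the wanted eigenpairs would also shrink the middle step.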
