Abstract. Obtaining the eigenvalues and eigenvectors of large matrices is a key problem in electronic structure theory and many other areas of computational science. The computational effort formally scales as O(N 3 ) with the size of the investigated problem, N (e.g., the electron count in electronic structure theory), and thus often defines the system size limit that practical calculations cannot overcome. In many cases, more than just a small fraction of the possible eigenvalue/eigenvector pairs is needed, so that iterative solution strategies that focus only on few eigenvalues become ineffective. Likewise, it is not always desirable or practical to circumvent the eigenvalue solution entirely. We here review some current developments regarding dense eigenvalue solvers and then focus on the ELPA library, which facilitates the efficient algebraic solution of symmetric and Hermitian eigenvalue problems for dense matrices that have real-valued and complex-valued matrix entries, respectively, on parallel computer platforms. ELPA addresses standard as well as generalized eigenvalue problems, relying on the well documented matrix layout of the ScaLAPACK library but replacing all actual parallel solution steps with subroutines of its own. For these steps, ELPA significantly outperforms the corresponding ScaLAPACK routines and proprietary libraries that implement the ScaLAPACK interface (e.g., Intel's MKL). The most time-critical step is the reduction of the matrix to tridiagonal form and the corresponding backtransformation of the eigenvectors. ELPA offers both a one-step tridiagonalization (successive Householder transformations) and a two-step transformation that is more efficient especially towards larger matrices and larger numbers of CPU cores. ELPA is based on the MPI standard, with an early hybrid MPIOpenMPI implementation available as well. Scalability beyond 10,000 CPU cores for problem sizes arising in the electronic structure theory is demonstrated for current high-performance computer architectures such as Cray or Intel/Infiniband. For a ELPA -Eigenvalue Solutions for Electronic Structure Theory 2 matrix of dimension 260,000, scalability up to 295,000 CPU cores has been shown on BlueGene/P.
a b s t r a c tThe computation of selected eigenvalues and eigenvectors of a symmetric (Hermitian) matrix is an important subtask in many contexts, for example in electronic structure calculations. If a significant portion of the eigensystem is required then typically direct eigensolvers are used. The central three steps are: reduce the matrix to tridiagonal form, compute the eigenpairs of the tridiagonal matrix, and transform the eigenvectors back. To better utilize memory hierarchies, the reduction may be effected in two stages: full to banded, and banded to tridiagonal. Then the back transformation of the eigenvectors also involves two stages. For large problems, the eigensystem calculations can be the computational bottleneck, in particular with large numbers of processors. In this paper we discuss variants of the tridiagonal-to-banded back transformation, improving the parallel efficiency for large numbers of processors as well as the per-processor utilization. We also modify the divide-and-conquer algorithm for symmetric tridiagonal matrices such that it can compute a subset of the eigenpairs at reduced cost. The effectiveness of our modifications is demonstrated with numerical experiments.
We develop an algorithmic framework for reducing the bandwidth of symmetric matrices via orthogonal similarity transformations. This framework includes the reduction of full matrices to banded or tridiagonal form and the reduction of banded matrices to narrower banded or tridiagonal form, possibly in multiple steps. Our framework leads to algorithms that require fewer floating-point operations than do standard algorithms, if only the eigenvalues are required. In addition, it allows for space-time tradeoffs and enables or increases the use of blocked transformations.
We study Chebyshev filter diagonalization as a tool for the computation of many interior eigenvalues of very large sparse symmetric matrices. In this technique the subspace projection onto the target space of wanted eigenvectors is approximated with filter polynomials obtained from Chebyshev expansions of window functions. After the discussion of the conceptual foundations of Chebyshev filter diagonalization we analyze the impact of the choice of the damping kernel, search space size, and filter polynomial degree on the computational accuracy and effort, before we describe the necessary steps towards a parallel high-performance implementation. Because Chebyshev filter diagonalization avoids the need for matrix inversion it can deal with matrices and problem sizes that are presently not accessible with rational function methods based on direct or iterative linear solvers. To demonstrate the potential of Chebyshev filter diagonalization for large-scale problems of this kind we include as an example the computation of the 10 2 innermost eigenpairs of a topological insulator matrix with dimension 10 9 derived from quantum physics applications.
We analyze the FEAST method for computing selected eigenvalues and eigenvectors of large sparse matrix pencils. After establishing the close connection between FEAST and the well-known Rayleigh-Ritz method, we identify several critical issues that influence convergence and accuracy of the solver: the choice of the starting vector space, the stopping criterion, how the inner linear systems impact the quality of the solution, and the use of FEAST for computing eigenpairs from multiple intervals. We complement the study with numerical examples, and hint at possible improvements to overcome the existing problems.
Spherical t-designs provide quadrature rules for the sphere which are exact for polynomials up to degree t. In this paper, we propose a computational algorithm based on interval arithmetic which, for given t, upon successful completion will have proved the existence of a t-design with (t + 1) 2 nodes on the unit sphere S 2 ⊂ R 3 and will have computed narrow interval enclosures which are known to contain these nodes with mathematical certainty. Since there is no theoretical result which proves the existence of a t-design with (t + 1) 2 nodes for arbitrary t, our method contributes to the theory because it was tested successfully for t = 1, 2, . . . , 100. The t-design is usually not unique; our method aims at finding a well-conditioned one. The method relies on computing an interval enclosure for the zero of a highly nonlinear system of dimension (t + 1) 2 . We therefore develop several special approaches which allow us to use interval arithmetic efficiently in this particular situation. The computations were all done using the MATLAB toolbox INTLAB.
Mathematics Subject Classification (2000) 65H10 · 65G20Dedicated to Götz Alefeld on the occasion of his 70th birthday.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.