We describe a number of recently developed techniques for improving the performance of large-scale nuclear configuration interaction calculations on high performance parallel computers. We show the benefit of using a preconditioned block iterative method to replace the Lanczos algorithm that has traditionally been used to perform this type of computation. The rapid convergence of the block iterative method is achieved by a proper choice of starting guesses of the eigenvectors and the construction of an effective preconditioner. These acceleration techniques take advantage of special structure of the nuclear configuration interaction problem which we discuss in detail. The use of a block method also allows us to improve the concurrency of the computation, and take advantage of the memory hierarchy of modern microprocessors to increase the arithmetic intensity of the computation relative to data movement. We also discuss implementation details that are critical to achieving high performance on massively parallel multi-core supercomputers, and demonstrate that the new block iterative solver is two to three times faster than the Lanczos based algorithm for problems of moderate sizes on a Cray XC30 system.
The Bethe-Salpeter eigenvalue problem is a dense structured eigenvalue problem arising from discretized Bethe-Salpeter equation in the context of computing exciton energies and states. A computational challenge is that at least half of the eigenvalues and the associated eigenvectors are desired in practice. We establish the equivalence between Bethe-Salpeter eigenvalue problems and real Hamiltonian eigenvalue problems. Based on theoretical analysis, structure preserving algorithms for a class of Bethe-Salpeter eigenvalue problems are proposed. We also show that for this class of problems all eigenvalues obtained from the Tamm-Dancoff approximation are overestimated. In order to solve large scale problems of practical interest, we discuss parallel implementations of our algorithms targeting distributed memory systems.Several numerical examples are presented to demonstrate the efficiency and accuracy of our algorithms.
We present a new supernode-based incomplete LU factorization method to construct a preconditioner for solving sparse linear systems with iterative methods. The new algorithm is primarily based on the ILUTP approach by Saad, and we incorporate a number of techniques to improve the robustness and performance of the traditional ILUTP method. These include the new dropping strategies that accommodate the use of supernodal structures in the factored matrix. We present numerical experiments to demonstrate that our new method is competitive with the other ILU approaches and is well suited for today's high performance architectures.
In this paper, we propose a novel preconditioned solver for generalized Hermitian eigenvalue problems. More specifically, we address the case of a definite matrix pencil A − λB, that is, A, B are Hermitian and there is a shift λ 0 such that A−λ 0 B is definite. Our new method can be seen as a variant of the popular LOBPCG method operating in an indefinite inner product. It also turns out to be a generalization of the recently proposed LOBP4DCG method by Bai and Li for solving product eigenvalue problems. Several numerical experiments demonstrate the effectiveness of our method for addressing certain product and quadratic eigenvalue problems.
We present a special symmetric Lanczos algorithm and a kernel polynomial method (KPM) for approximating the absorption spectrum of molecules within the linear response time-dependent density functional theory (TDDFT) framework in the product form. In contrast to existing algorithms, the new algorithms are based on reformulating the original non-Hermitian eigenvalue problem as a product eigenvalue problem and the observation that the product eigenvalue problem is self-adjoint with respect to 1 an appropriately chosen inner product. This allows a simple symmetric Lanczos algorithm to be used to compute the desired absorption spectrum. The use of a symmetric Lanczos algorithm only requires half of the memory compared with the non-symmetric variant of the Lanczos algorithm. The symmetric Lanczos algorithm is also numerically more stable than the non-symmetric version. The KPM algorithm is also presented as a low-memory alternative to the Lanczos approach, but may require more matrix-vector multiplications in practice.We discuss the pros and cons of these methods in terms of their accuracy as well as their computational and storage cost. Applications to a set of small and medium-sized molecules are also presented.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.