Abstract. A novel variant of the parallel QR algorithm for solving dense nonsymmetric eigenvalue problems on hybrid distributed high performance computing (HPC) systems is presented. For this purpose, we introduce the concept of multi-window bulge chain chasing and parallelize aggressive early deflation. The multi-window approach ensures that most computations when chasing chains of bulges are performed in level 3 BLAS operations, while the aim of aggressive early deflation is to speed up the convergence of the QR algorithm. Mixed MPI-OpenMP coding techniques are utilized for porting the codes to distributed memory platforms with multithreaded nodes, such as multicore processors. Numerous numerical experiments confirm the superior performance of our parallel QR algorithm in comparison with the existing ScaLAPACK code, leading to an implementation that is one to two orders of magnitude faster for sufficiently large problems, including a number of examples from applications.
Key words. Eigenvalue problem, nonsymmetric QR algorithm, multishift, bulge chasing, parallel computations, level 3 performance, aggressive early deflation, parallel algorithms, hybrid distributed memory systems.
AMS subject classifications. 65F15, 15A18

1. Introduction. Computing the eigenvalues of a matrix $A \in \mathbb{R}^{n \times n}$ is at the very heart of numerical linear algebra, with applications coming from a broad range of science and engineering. With the increased complexity of mathematical models and availability of HPC systems, there is a growing demand to solve large-scale eigenvalue problems.

While iterative eigensolvers, such as Krylov subspace and Jacobi-Davidson methods [8], may quite successfully deal with large-scale sparse eigenvalue problems in most situations, classical factorization-based methods, such as the QR algorithm discussed in this paper, still play an important role. This is already evident from the fact that most iterative methods rely on the QR algorithm for solving (smaller) subproblems. In certain situations, factorization-based methods may be the preferred choice even for directly addressing a large-scale problem. For example, it might be difficult or impossible to guarantee that an iterative method returns all eigenvalues in a specified region of the complex plane. Even the slightest chance of missing an eigenvalue may have perilous consequences, e.g., in a stability analysis. Moreover, by their nature, standard iterative eigensolvers are ineffective in situations where a large fraction of eigenvalues and eigenvectors needs to be computed, as in some algorithms for linear-quadratic optimal control [48] and density functional theory [50]. In contrast, factorization-based methods based on similarity transformations, such as the QR algorithm, compute all eigenvalues anyway, and there is consequently no danger of missing an eigenvalue. We conclude that an urgent need for high performance parallel variants of factorization-based eigensolvers can be expected to persist in the future.

Often motivated by applications in computational...