A new efficient parallelization strategy for the QR algorithm

Schreiber, Thomas; Otto, P.; Hofmann, Fridolin

doi:10.1016/0167-8191(94)90112-0

Cited by 6 publications

(3 citation statements)

References 3 publications

“…More recent attempts to solve the scalability problems were presented in [35,55,58,59,60,63], especially when focus turned to small-bulge multishift variants with (Cartesian) two-dimensional (2D) block cyclic data layout [36]. However, as will be discussed below, a remaining problem so far has been a seemingly non-tractable trade-off problem between local node speed and global scalability.…”

Section: Review Of Earlier Work

mentioning

confidence: 99%

A Novel Parallel QR Algorithm for Hybrid Distributed Memory HPC Systems

Granat, Kågström, Kreßner

2010

SIAM J. Sci. Comput.

Abstract. A novel variant of the parallel QR algorithm for solving dense nonsymmetric eigenvalue problems on hybrid distributed high performance computing (HPC) systems is presented. For this purpose, we introduce the concept of multi-window bulge chain chasing and parallelize aggressive early deflation. The multi-window approach ensures that most computations when chasing chains of bulges are performed in level 3 BLAS operations, while the aim of aggressive early deflation is to speed up the convergence of the QR algorithm. Mixed MPI-OpenMP coding techniques are utilized for porting the codes to distributed memory platforms with multithreaded nodes, such as multicore processors. Numerous numerical experiments confirm the superior performance of our parallel QR algorithm in comparison with the existing ScaLAPACK code, leading to an implementation that is one to two orders of magnitude faster for sufficiently large problems, including a number of examples from applications.

Key words. Eigenvalue problem, nonsymmetric QR algorithm, multishift, bulge chasing, parallel computations, level 3 performance, aggressive early deflation, parallel algorithms, hybrid distributed memory systems.

AMS subject classifications. 65F15, 15A18

1. Introduction. Computing the eigenvalues of a matrix A ∈ R^{n×n} is at the very heart of numerical linear algebra, with applications coming from a broad range of science and engineering. With the increased complexity of mathematical models and availability of HPC systems, there is a growing demand to solve large-scale eigenvalue problems.

While iterative eigensolvers, such as Krylov subspace and Jacobi-Davidson methods [8], may quite successfully deal with large-scale sparse eigenvalue problems in most situations, classical factorization-based methods, such as the QR algorithm discussed in this paper, still play an important role. This is already evident from the fact that most iterative methods rely on the QR algorithm for solving (smaller) subproblems. In certain situations, factorization-based methods may be the preferred choice even for directly addressing a large-scale problem. For example, it might be difficult or impossible to guarantee that an iterative method returns all eigenvalues in a specified region of the complex plane. Even the slightest chance of having an eigenvalue missed may have perilous consequences, e.g., in a stability analysis. Moreover, by their nature, standard iterative eigensolvers are ineffective in situations where a large fraction of eigenvalues and eigenvectors needs to be computed, as in some algorithms for linear-quadratic optimal control [48] and density functional theory [50]. In contrast, factorization-based methods based on similarity transformations, such as the QR algorithm, compute all eigenvalues anyway and there is consequently no danger to miss an eigenvalue. We conclude that an urgent need for high performance parallel variants of factorization-based eigensolvers can be expected to persist in the future.

Often motivated by applications in computational...

“…An exception is the successful high-performance pipelined Householder QZ algorithm in [18]. Although this paper is not directly concerned with distributed memory computation, it is worth noting that there are distributed memory implementations of the QR algorithm [31,45,48,50].…”

mentioning

confidence: 99%

The Multishift QR Algorithm. Part I: Maintaining Well-Focused Shifts and Level 3 Performance

Braman, Byers, Mathias

2002

SIAM J. Matrix Anal. & Appl.

This paper presents a small-bulge multishift variation of the multishift QR algorithm that avoids the phenomenon of shift blurring, which retards convergence and limits the number of simultaneous shifts. It replaces the large diagonal bulge in the multishift QR sweep with a chain of many small bulges. The small-bulge multishift QR sweep admits nearly any number of simultaneous shifts (even hundreds) without adverse effects on the convergence rate. With enough simultaneous shifts, the small-bulge multishift QR algorithm takes advantage of the level 3 BLAS, which is a special advantage for computers with advanced architectures.

“…With regard to parallel implementations of the iterative QR method, there have been many versions [VdG88, vdGH89, vdG93, SOH94, Wat94, HVDG96], although the first widely distributed one was the version included in ScaLAPACK 1.5 [BCC+97], in the form of its routine PDLAHQR, based on the work of Henry, Watkins, and Dongarra [HWD02]. Since then, Granat and Kågström have been producing several new versions [GKK10, GKKS15] with the aim of replacing this routine with versions that better exploit the simultaneous chasing of bulges with multiple shifts and aggressive early deflation techniques.…”

Section: Algorithm 32 Transformation to Hessenberg Form by Means of

unclassified

Algoritmos paralelos para la reducción de sistemas lineales de control estables

López
