2020
DOI: 10.48550/arxiv.2010.12058
Preprint

An overview of block Gram-Schmidt methods and their stability properties

Abstract: Block Gram-Schmidt algorithms comprise essential kernels in many scientific computing applications, but for many commonly used variants, a rigorous treatment of their stability properties remains open. This survey provides a comprehensive categorization of block Gram-Schmidt algorithms, especially those used in Krylov subspace methods to build orthonormal bases one block vector at a time. All known stability results are assembled, and new results are summarized or conjectured for important communication-reduci…

Cited by 2 publications (4 citation statements)
References 42 publications (101 reference statements)
“…Our numerical results demonstrate that DCGS2 attains the same loss of orthogonality and representation error as CGS2, while our strong-scaling results on the Summit supercomputer show that DCGS2 achieves a 2× speedup in compute time on a single GPU, and an even larger speedup as the number of GPUs grows, reaching 2.2× lower execution times on 192 GPUs. The impact of DCGS2 on the strong scaling of Krylov linear system solvers is currently being explored, and a block variant is also being implemented following the review article of Carson et al. [15]. The software employed for this paper is available on GitHub.…”
Section: Discussion
confidence: 99%
“…This is achieved by lagging the normalization, as originally proposed by Kim and Chronopoulos [14], and then applying Stephen's trick. The Pythagorean trick introduced by Smoktunowicz et al. [8] avoids cancellation errors, and Carson et al. [15] generalize it to block Gram-Schmidt algorithms. The delayed normalization for the Arnoldi iteration was employed by Hernandez et al. [1] without a correction.…”
Section: Low-synch Gram-Schmidt Algorithms
confidence: 99%
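The Pythagorean trick mentioned in the statement above can be illustrated with a short sketch. This is a hypothetical NumPy illustration of the idea (not the cited authors' code): instead of forming the projected vector and normalizing it, which loses accuracy through cancellation, the norm of the projected vector is obtained from the identity ||w||² = ||u||² − ||Qᵀu||² for orthonormal Q.

```python
import numpy as np

def cgs_pythagorean(Q, u):
    """Orthogonalize u against the orthonormal columns of Q, estimating the
    norm of the projected vector via the Pythagorean identity
        ||w||^2 = ||u||^2 - ||Q^T u||^2,
    rather than by normalizing the computed w directly.
    A sketch of the idea attributed to Smoktunowicz et al. [8]."""
    s = Q.T @ u                              # projection coefficients
    w = u - Q @ s                            # component orthogonal to span(Q)
    norm_sq = np.dot(u, u) - np.dot(s, s)    # Pythagorean norm estimate
    r = np.sqrt(norm_sq)
    return w / r, s, r

# Small usage example with a random orthonormal block (illustrative only).
rng = np.random.default_rng(1)
Q0, _ = np.linalg.qr(rng.standard_normal((100, 5)))
u0 = rng.standard_normal(100)
q, s, r = cgs_pythagorean(Q0, u0)
```

A robust implementation would also guard against the subtraction going nonpositive for nearly dependent inputs; that safeguard is omitted here for brevity.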
“…In the first case, the condition number of Q can grow as cond(W)^2 or even worse and thus requires special treatment [8]. In the second case, cond(Q) can grow as cond(W) · max_{1≤j≤p} cond(W^{(j)}) unless Step 2 is unconditionally stable [6,9]. The stability of these processes can be improved by re-orthogonalization, i.e., by running the inner loop twice.…”
Section: Algorithm 1 Block Gram-Schmidt Process
confidence: 99%
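The effect of re-orthogonalization described in the statement above can be sketched with a minimal NumPy experiment (an illustrative sketch, not the cited authors' algorithm): a block that lies nearly in the span of the existing basis is orthogonalized with one versus two projection passes, and the loss of orthogonality is measured.

```python
import numpy as np

rng = np.random.default_rng(0)

def bcgs_step(Q, W, passes=1):
    """Inter-block step: project W against the orthonormal block Q `passes`
    times, then orthonormalize the result with a QR factorization
    (the intra-block "Step 2"). passes=2 is the re-orthogonalized variant."""
    for _ in range(passes):
        W = W - Q @ (Q.T @ W)      # remove components along span(Q)
    Qb, _ = np.linalg.qr(W)        # intra-block orthonormalization
    return Qb

# Ill-conditioned test: W lies almost entirely in span(Q).
n, b = 200, 4
Q, _ = np.linalg.qr(rng.standard_normal((n, b)))
W = Q @ rng.standard_normal((b, b)) + 1e-10 * rng.standard_normal((n, b))

loss1 = np.linalg.norm(Q.T @ bcgs_step(Q, W, passes=1))  # single pass
loss2 = np.linalg.norm(Q.T @ bcgs_step(Q, W, passes=2))  # re-orthogonalized
print(loss1, loss2)
```

With a single pass the rounding error of the projection is large relative to the tiny surviving component, so orthogonality against Q is visibly lost; the second pass drives the loss back toward machine precision.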
“…It is also used in s-step, enlarged and other communication-avoiding Krylov subspace methods [12,14]. Please see [9] and the references therein for an extensive overview of BGS variants, and [2,17,18,22] for the underlying block Krylov methods.…”
Section: Introduction
confidence: 99%