2018
DOI: 10.1137/17m1117732
The Singular Value Decomposition: Anatomy of Optimizing an Algorithm for Extreme Scale

Abstract: The computation of the singular value decomposition, or SVD, has a long history with many improvements over the years, both in its implementations and algorithmically. Here, we survey the evolution of SVD algorithms for dense matrices, discussing the motivation and performance impacts of changes. There are two main branches of dense SVD methods: bidiagonalization and Jacobi. Bidiagonalization methods started with the implementation by Golub and Reinsch in Algol60, which was subsequently ported to Fortran in th…

Cited by 81 publications (41 citation statements)
References 100 publications
“…From a computational point of view evaluating (5.1) is not expensive because B_1 is a diagonal matrix and m ≪ n, that is, Y is a tall, skinny matrix, and therefore σ_m(Y) can be computed very efficiently [17, Sec. 5.4]. Similarly the expression for µ_B in (3.20) can also be simplified, which gives…”
Section: Application to the Stochastic Galerkin Method
confidence: 99%
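Since the quoted argument turns on σ_m(Y) being cheap to compute when Y is tall and skinny, a minimal sketch may help. The QR-based reduction below is a standard device in the spirit of the cited [17, Sec. 5.4]; the shapes, variable names, and use of NumPy are assumptions for illustration, not the citing paper's actual expression (5.1).

import numpy as np

rng = np.random.default_rng(0)
n, m = 100_000, 20           # n >> m: Y is tall and skinny (assumed shapes)
Y = rng.standard_normal((n, m))

# Y = QR with Q having orthonormal columns implies Y^T Y = R^T R,
# so Y and the small m x m factor R share their singular values.
R = np.linalg.qr(Y, mode="r")                      # O(n m^2) work, R is m x m
sigma_m = np.linalg.svd(R, compute_uv=False)[-1]   # smallest singular value

# Reference check against the full, far more expensive computation.
assert np.isclose(sigma_m, np.linalg.svd(Y, compute_uv=False)[-1])
print(sigma_m)

The point is that all the O(n) work is confined to one tall-skinny QR factorization; the SVD itself runs on a tiny m × m matrix.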
“…EISPACK was designed to run on a single-core CPU and was replaced by LINPACK, which first implemented the SVD algorithm with a basic linear algebra subprograms (BLAS) interface. The performance of LINPACK was limited by its BLAS1 implementation and benefited little from multicore architectures. LAPACK redesigned the SVD algorithm to use BLAS3 routines wherever possible to improve performance on multicore CPUs.…”
Section: Related Work
confidence: 99%
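For context, a minimal sketch of what the LAPACK-era interface looks like from user code: SciPy exposes LAPACK's two dense SVD drivers, both built on the blocked, BLAS3-rich bidiagonalization the passage describes. The matrix shape here is an arbitrary assumption.

import numpy as np
from scipy.linalg import svd

A = np.random.default_rng(1).standard_normal((2000, 500))

# 'gesdd' is LAPACK's divide-and-conquer driver (SciPy's default);
# 'gesvd' is the classical QR-iteration driver. Both rely on BLAS3
# kernels, unlike the BLAS1-era LINPACK code discussed above.
U, s, Vt = svd(A, full_matrices=False, lapack_driver="gesdd")
s_ref = svd(A, compute_uv=False, lapack_driver="gesvd")
print(np.allclose(s, s_ref))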
“…This low efficiency is due to the computation of tall-skinny GEMM, which is closer to GEMV (a BLAS2 routine) than to GEMM (a BLAS3 routine). BLAS2 routines are less efficient than BLAS3 routines because their vector accesses degrade the cache hit rate on both multicore CPUs and GPUs; BLAS3 routines are 20 to 40 times more efficient than BLAS2 routines. Regarding the in-core performance of tall-skinny GEMM, Chen et al. achieved 1.1 to 3.0× speedups over cuBLAS for tall-skinny matrices with up to 16 columns.…”
Section: GPU-accelerated Out-of-core GEMM
confidence: 99%
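The BLAS2-versus-BLAS3 gap this quote cites is easy to observe directly. The sketch below, with assumed shapes and no warm-up runs, times a square GEMM against a 16-column tall-skinny GEMM in NumPy (which dispatches to the underlying BLAS); the measured ratio is machine-dependent and will not necessarily match the quoted 20 to 40×.

import time
import numpy as np

rng = np.random.default_rng(2)

def rate(m, n, k, seconds):
    """GEMM flop rate in GFLOP/s for C(m x n) = A(m x k) @ B(k x n)."""
    return 2.0 * m * n * k / seconds / 1e9

# Square GEMM: each element is reused many times, so BLAS3 runs near peak.
A = rng.standard_normal((2048, 2048))
B = rng.standard_normal((2048, 2048))
t0 = time.perf_counter()
_ = A @ B
t_square = time.perf_counter() - t0

# Tall-skinny GEMM with 16 columns: almost no data reuse per element
# loaded, so performance degrades toward memory-bound, GEMV-like speed.
C = rng.standard_normal((4_194_304, 16))
D = rng.standard_normal((16, 16))
t0 = time.perf_counter()
_ = C @ D
t_skinny = time.perf_counter() - t0

print(f"square GEMM      : {rate(2048, 2048, 2048, t_square):8.1f} GFLOP/s")
print(f"tall-skinny GEMM : {rate(4_194_304, 16, 16, t_skinny):8.1f} GFLOP/s")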