2018
DOI: 10.1016/j.parco.2017.09.001

Batched QR and SVD algorithms on GPUs with applications in hierarchical matrix compression

Abstract: We present high performance implementations of the QR and the singular value decomposition of a batch of small matrices hosted on the GPU with applications in the compression of hierarchical matrices. The one-sided Jacobi algorithm is used for its simplicity and inherent parallelism as a building block for the SVD of low rank blocks using randomized methods. We implement multiple kernels based on the level of the GPU memory hierarchy in which the matrices can reside and show substantial speedups against stream…
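As background for the abstract, the sketch below is a minimal single-matrix NumPy version of the one-sided Jacobi SVD: column pairs are rotated until mutually orthogonal, after which the singular values are the column norms. It only illustrates the building block the abstract names; the batching, GPU memory-hierarchy placement, and randomized low-rank handling that the paper contributes are omitted, and the function name and tolerances are illustrative assumptions.

```python
import numpy as np

def one_sided_jacobi_svd(A, tol=1e-12, max_sweeps=30):
    """Thin SVD A = U @ diag(s) @ V.T via one-sided Jacobi rotations.

    Column pairs of A are rotated until mutually orthogonal; the
    singular values are then the column norms (unsorted on return).
    Illustrative single-matrix sketch, not the paper's batched kernel.
    """
    A = np.array(A, dtype=np.float64)
    m, n = A.shape
    V = np.eye(n)
    for _ in range(max_sweeps):
        converged = True
        for i in range(n - 1):
            for j in range(i + 1, n):
                a = A[:, i] @ A[:, i]   # ||A_i||^2
                b = A[:, j] @ A[:, j]   # ||A_j||^2
                c = A[:, i] @ A[:, j]   # A_i . A_j
                if abs(c) <= tol * np.sqrt(a * b):
                    continue            # pair already orthogonal enough
                converged = False
                # Givens rotation zeroing the (i, j) entry of A.T @ A
                zeta = (b - a) / (2.0 * c)
                sgn = 1.0 if zeta >= 0.0 else -1.0
                t = sgn / (abs(zeta) + np.sqrt(1.0 + zeta * zeta))
                cs = 1.0 / np.sqrt(1.0 + t * t)
                sn = cs * t
                # Apply the rotation to A's columns and accumulate it in V
                Ai, Aj = A[:, i].copy(), A[:, j].copy()
                A[:, i], A[:, j] = cs * Ai - sn * Aj, sn * Ai + cs * Aj
                Vi, Vj = V[:, i].copy(), V[:, j].copy()
                V[:, i], V[:, j] = cs * Vi - sn * Vj, sn * Vi + cs * Vj
        if converged:
            break
    s = np.linalg.norm(A, axis=0)          # singular values = column norms
    U = A / np.where(s > 0.0, s, 1.0)      # left singular vectors
    return U, s, V
```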


Cited by 45 publications (23 citation statements)
References 23 publications
“…Performance is obtained because the large amount of compute-intensive factorizations, both QR and SVD, that are performed at every level can be efficiently executed by batched kernels. We have developed batched QR and batched adaptive randomized SVD operations for this purpose [26,27].…”
Section: (B) Linear Algebra Operations With Hierarchical Matrices
Mentioning confidence: 99%
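The batched adaptive randomized SVD referenced in the statement above follows the standard randomized range-finder pattern (Halko et al.). The sketch below shows that structure for a single matrix with a fixed target rank; it is a stand-in for, not a reproduction of, the adaptive batched GPU kernels of [26,27], and the function name and oversampling parameter p are assumptions.

```python
import numpy as np

def randomized_svd(A, k, p=8, rng=None):
    """Rank-k truncated SVD via a Gaussian randomized range finder.

    Single-matrix, fixed-rank illustration of the pattern behind the
    paper's batched adaptive randomized SVD; p is oversampling.
    """
    rng = np.random.default_rng(rng)
    m, n = A.shape
    # Sample the range of A with a Gaussian test matrix
    Omega = rng.standard_normal((n, k + p))
    Q, _ = np.linalg.qr(A @ Omega)       # orthonormal basis for the sample
    B = Q.T @ A                          # small (k+p) x n projection
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub                           # lift back to the original space
    return U[:, :k], s[:k], Vt[:k, :]
```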
“…If additional rank reduction is required (to satisfy some fixed error threshold), an approximate singular value decomposition can be easily obtained from this low rank form by computing the QR decomposition of B and then the SVD of the small k × k triangular factor. This method has been implemented as a batched GPU routine and used to accelerate the compression of hierarchical matrices [7] in place of the full singular value decomposition.…”
Section: Notation and Definitions
Mentioning confidence: 99%
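A hedged sketch of the recompression step this statement describes: assuming a low-rank form A ≈ U Bᵀ with U having orthonormal columns, a QR factorization of B reduces the problem to an SVD of a small k × k triangular factor. The factor layout, function name, and relative truncation rule below are illustrative assumptions, not the cited routine's interface.

```python
import numpy as np

def lowrank_to_svd(U, B, tol=1e-8):
    """Turn a low-rank form A ~= U @ B.T (U: m x k with orthonormal
    columns, B: n x k) into a truncated SVD A ~= Uk @ diag(sk) @ Vk.T,
    touching only B's small k x k triangular QR factor."""
    Qb, Rb = np.linalg.qr(B)                 # B = Qb @ Rb, Rb is k x k
    W, s, Zt = np.linalg.svd(Rb.T)           # small k x k dense SVD
    # Keep singular values above a relative threshold (assumed rule)
    r = max(1, int(np.sum(s > tol * s[0])))
    Uk = U @ W[:, :r]                        # rotate the left factor
    Vk = Qb @ Zt[:r, :].T                    # rotate the right factor
    return Uk, s[:r], Vk
```

Since A = U Rbᵀ Qbᵀ and Rbᵀ = W diag(s) Zᵀ, the product (U W) diag(s) (Qb Z)ᵀ is an exact SVD of the low-rank form before truncation; all expensive work is confined to the k × k factor.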
“…In [Abdelfattah et al. 2016a,b,d], a newer implementation in MAGMA is proposed for handling batched matrix factorizations with variable sizes, which has also been of interest in the context of accelerating sparse linear algebra [SuiteSparse 2017] during the Schur complement calculations. More recently, some of the authors have proposed new batched QR and SVD kernels for very small matrix sizes with applications in the compression of hierarchical matrices [Boukaram et al. 2017].…”
Section: Related Work
Mentioning confidence: 99%