2016
DOI: 10.1016/j.camwa.2016.03.008

An improved divide-and-conquer algorithm for the banded matrices with narrow bandwidths

Cited by 13 publications (15 citation statements)
References 38 publications
“…While many algorithms have been developed for general banded matrices, the k-tridiagonal form allows for even further optimization. For example, notice that, as the matrix becomes wider-banded (i.e., as k increases), our algorithm gets better whereas the previous algorithm [32] gets worse; when k = n, the previous algorithm is worse than the traditional full-matrix BCD SVD. [Figure caption: average runtime of our proposed algorithm (marked S in the legend), the algorithm of [30] (marked NS in the legend), and LAPACK, for n = 10,000, multiple values of k, and multiple numbers of CPUs.]…”
Section: Discussion
confidence: 89%
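
To make the structure concrete, here is a minimal NumPy sketch (ours, not code from the cited papers; the helper name k_tridiagonal is our own). It builds a symmetric k-tridiagonal matrix and checks the well-known property that makes this form so amenable to optimization: permuting the indices by residue class mod k decouples the matrix into k independent tridiagonal blocks, each of which can be handled separately.

```python
import numpy as np

def k_tridiagonal(n, k, seed=0):
    """Dense symmetric n x n k-tridiagonal matrix: nonzeros only on the
    main diagonal and the k-th sub- and super-diagonals."""
    rng = np.random.default_rng(seed)
    d = rng.standard_normal(n)
    e = rng.standard_normal(n - k)
    return np.diag(d) + np.diag(e, k) + np.diag(e, -k)

n, k = 12, 3                  # n divisible by k keeps the blocks equal-sized
A = k_tridiagonal(n, k)

# Group indices by residue class mod k: A becomes block-diagonal with
# k tridiagonal blocks, since A[i, j] != 0 forces |i - j| in {0, k}.
perm = np.concatenate([np.arange(r, n, k) for r in range(k)])
B = A[np.ix_(perm, perm)]

m = n // k
coupling = B.copy()
for b in range(k):
    coupling[b * m:(b + 1) * m, b * m:(b + 1) * m] = 0.0
assert np.allclose(coupling, 0.0)         # no coupling between blocks

# The symmetric permutation preserves singular values, so the SVD of A
# is the union of the SVDs of the small tridiagonal blocks.
s_full = np.sort(np.linalg.svd(A, compute_uv=False))
s_blocks = np.sort(np.concatenate([
    np.linalg.svd(B[b * m:(b + 1) * m, b * m:(b + 1) * m], compute_uv=False)
    for b in range(k)]))
assert np.allclose(s_full, s_blocks)
```

This decoupling is the standard reason k-tridiagonal matrices admit optimizations beyond what generic banded solvers can exploit.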
“…Following our current work, there are some interesting research projects to pursue in the future. First, PSMMA can be used to extend the banded DC algorithms proposed in [14,45] to distributed-memory platforms in a similar manner. Second, the structured matrix-matrix multiplication techniques can be used on heterogeneous architectures, which can reduce data movement from the CPU to accelerators such as GPUs.…”
Section: Future Work
confidence: 99%
“…Then the worst-case complexity of DC can be reduced from O(N^3) to O(N^2 r), where r is a modest number, usually much smaller than a large N; see [12]. This technique was extended to the bidiagonal and banded DC algorithms for the SVD problem on a shared-memory multicore platform in [13,14]. For distributed-memory machines, a parallel DC algorithm is similarly proposed in [15] using STRUMPACK (STRUctured Matrices PACKage) [16], which provides some distributed parallel HSS algorithms.…”
Section: Introduction
confidence: 99%
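
The complexity claim can be unpacked with a short cost count (our sketch of the standard argument, not a derivation copied from [12]): the dominant DC cost is applying an N x N eigenvector matrix to N updated eigenvectors, and HSS structure turns each dense O(N^2) matvec into an O(Nr) one.

```latex
\[
  \underbrace{N \cdot O(N^{2})}_{\text{$N$ dense matvecs}} = O(N^{3})
  \quad\longrightarrow\quad
  \underbrace{N \cdot O(N r)}_{\text{$N$ HSS matvecs, rank $r$}} = O(N^{2} r),
  \qquad r \ll N .
\]
```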
“…Recently, the authors of [27] used hierarchically semiseparable (HSS) matrices [8] to accelerate the tridiagonal DC algorithm in LAPACK, obtaining about 6x speedups over LAPACK for some large matrices on a shared-memory multicore platform. The bidiagonal and banded DC algorithms for the SVD problem are accelerated similarly [26,28]. The main point is that some intermediate eigenvector matrices are rank-structured [8,20].…”
Section: Introduction
confidence: 99%
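
The rank structure mentioned here can be seen directly in the Cauchy-like kernel that underlies the DC eigenvector matrices. The sketch below is ours and schematic: the interlacing construction stands in for the true spectra, and the actual eigenvector matrix also carries z-weights and column scalings. It measures the numerical rank of an off-diagonal block of the kernel.

```python
import numpy as np

# Intermediate eigenvector matrices in tridiagonal DC are Cauchy-like:
# entries behave like z_i / (d_i - lambda_j), with the new eigenvalues
# lambda interlacing the old poles d.
rng = np.random.default_rng(0)
N = 1024
d = np.sort(rng.uniform(0.0, 1.0, N))        # old eigenvalues (poles)
lam = (d[:-1] + d[1:]) / 2                   # new eigenvalues interlace d
C = 1.0 / (d[:, None] - lam[None, :])        # Cauchy kernel, shape (N, N-1)

block = C[: N // 2, N // 2 :]                # an off-diagonal block
s = np.linalg.svd(block, compute_uv=False)
num_rank = int(np.sum(s > 1e-10 * s[0]))
print(f"block size {block.shape}, numerical rank {num_rank}")
# The numerical rank stays small relative to N, which is exactly what
# HSS compression of these intermediate matrices exploits.
```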
“…It is written in C++ using OpenMP and MPI parallelism, uses HSS matrices, and implements a parallel HSS construction algorithm with randomized sampling [35,24]. Note that some routines are available for sequential HSS algorithms [43,10] and for parallel HSS algorithms on shared-memory platforms, such as HSSPACK [28]. But STRUMPACK is the only one available for distributed parallel HSS algorithms.…”
Section: Introduction
confidence: 99%
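
For flavor, here is a generic randomized range finder in the style of the randomized sampling such HSS constructions use (a Halko-Martinsson-Tropp-style sketch; this is not STRUMPACK's actual interface, and the sizes and names are arbitrary). The point is that a low-rank block's column space can be recovered from a handful of random matrix-vector products, which keeps the construction cheap.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, r = 2000, 2000, 15
# A test matrix of exact rank r, standing in for a compressible HSS block.
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))

p = 10                                   # oversampling parameter
Omega = rng.standard_normal((n, r + p))  # random test matrix
Y = A @ Omega                            # sample the range of A
Q, _ = np.linalg.qr(Y)                   # orthonormal basis for the samples
B = Q.T @ A                              # small (r + p) x n factor
# A ~= Q @ B to near machine precision, using only r + p matvecs with A.
print(np.linalg.norm(A - Q @ B) / np.linalg.norm(A))
```

In a full randomized HSS construction the same global samples are reused across the whole block hierarchy, so only on the order of r matrix-vector products are needed for the entire matrix, which is what makes the approach attractive in distributed settings.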