2016
DOI: 10.1016/j.camwa.2016.03.008

An improved divide-and-conquer algorithm for the banded matrices with narrow bandwidths

Cited by 13 publications (15 citation statements)
References 38 publications
“…While many algorithms have been developed for general banded matrices, the k-tridiagonal form allows for even further optimization. For example, notice that, as the matrix becomes wider-banded (i.e., as k increases), our algorithm gets better whereas the previous algorithm [32] gets worse; when k = n, the previous algorithm is worse than the traditional full-matrix BCD SVD. [Figure caption: average runtime of our proposed algorithm (marked S in the legend), the algorithm of [30] (marked NS in the legend), and LAPACK, for n = 10,000, multiple values of k, and multiple numbers of CPUs.]…”
Section: Discussion
confidence: 89%
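
To make the structure concrete, here is a minimal NumPy sketch (ours, not code from the cited papers; the helper name k_tridiagonal is our own). It builds a symmetric k-tridiagonal matrix and checks the well-known property that makes this form so amenable to optimization: permuting the indices by residue class mod k decouples the matrix into k independent tridiagonal blocks, each of which can be handled separately.

```python
import numpy as np

def k_tridiagonal(n, k, seed=0):
    """Dense symmetric n x n k-tridiagonal matrix: nonzeros only on the
    main diagonal and the k-th sub- and super-diagonals."""
    rng = np.random.default_rng(seed)
    d = rng.standard_normal(n)
    e = rng.standard_normal(n - k)
    return np.diag(d) + np.diag(e, k) + np.diag(e, -k)

n, k = 12, 3                  # n divisible by k keeps the blocks equal-sized
A = k_tridiagonal(n, k)

# Group indices by residue class mod k: A becomes block-diagonal with
# k tridiagonal blocks, since A[i, j] != 0 forces |i - j| in {0, k}.
perm = np.concatenate([np.arange(r, n, k) for r in range(k)])
B = A[np.ix_(perm, perm)]

m = n // k
coupling = B.copy()
for b in range(k):
    coupling[b * m:(b + 1) * m, b * m:(b + 1) * m] = 0.0
assert np.allclose(coupling, 0.0)         # no coupling between blocks

# The symmetric permutation preserves singular values, so the SVD of A
# is the union of the SVDs of the small tridiagonal blocks.
s_full = np.sort(np.linalg.svd(A, compute_uv=False))
s_blocks = np.sort(np.concatenate([
    np.linalg.svd(B[b * m:(b + 1) * m, b * m:(b + 1) * m], compute_uv=False)
    for b in range(k)]))
assert np.allclose(s_full, s_blocks)
```

This decoupling is the standard reason k-tridiagonal matrices admit optimizations beyond what generic banded solvers can exploit.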
“…Following our current work, there are some interesting research projects to pursue in the future. First, PSMMA can be used to extend the banded DC algorithms proposed in [14,45] to distributed-memory platforms in a similar manner. Second, the structured matrix-matrix multiplication techniques can be used on heterogeneous architectures, which can reduce data movement from the CPU to accelerators such as GPUs.…”
Section: Future Work
confidence: 99%
“…Then the worst-case complexity of DC can be reduced from O(N^3) to O(N^2 r), where r is a modest number, usually much smaller than a large N; see [12]. This technique was extended to the bidiagonal and banded DC algorithms for the SVD problem on a shared-memory multicore platform in [13,14]. For distributed-memory machines, a parallel DC algorithm is similarly proposed in [15] using STRUMPACK (STRUctured Matrices PACKage) [16], which provides some distributed parallel HSS algorithms.…”
Section: Introduction
confidence: 99%
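
The complexity claim can be unpacked with a short cost count (our sketch of the standard argument, not a derivation copied from [12]): the dominant DC cost is applying an N x N eigenvector matrix to N updated eigenvectors, and HSS structure turns each dense O(N^2) matvec into an O(Nr) one.

```latex
\[
  \underbrace{N \cdot O(N^{2})}_{\text{$N$ dense matvecs}} = O(N^{3})
  \quad\longrightarrow\quad
  \underbrace{N \cdot O(N r)}_{\text{$N$ HSS matvecs, rank $r$}} = O(N^{2} r),
  \qquad r \ll N .
\]
```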
“…Recently, the authors of [27] used hierarchically semiseparable (HSS) matrices [8] to accelerate the tridiagonal DC algorithm in LAPACK, obtaining about 6x speedups over LAPACK for some large matrices on a shared-memory multicore platform. The bidiagonal and banded DC algorithms for the SVD problem are accelerated similarly [26,28]. The main point is that some intermediate eigenvector matrices are rank-structured [8,20].…”
Section: Introduction
confidence: 99%
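
The rank structure mentioned here can be seen directly in the Cauchy-like kernel that underlies the DC eigenvector matrices. The sketch below is ours and schematic: the interlacing construction stands in for the true spectra, and the actual eigenvector matrix also carries z-weights and column scalings. It measures the numerical rank of an off-diagonal block of the kernel.

```python
import numpy as np

# Intermediate eigenvector matrices in tridiagonal DC are Cauchy-like:
# entries behave like z_i / (d_i - lambda_j), with the new eigenvalues
# lambda interlacing the old poles d.
rng = np.random.default_rng(0)
N = 1024
d = np.sort(rng.uniform(0.0, 1.0, N))        # old eigenvalues (poles)
lam = (d[:-1] + d[1:]) / 2                   # new eigenvalues interlace d
C = 1.0 / (d[:, None] - lam[None, :])        # Cauchy kernel, shape (N, N-1)

block = C[: N // 2, N // 2 :]                # an off-diagonal block
s = np.linalg.svd(block, compute_uv=False)
num_rank = int(np.sum(s > 1e-10 * s[0]))
print(f"block size {block.shape}, numerical rank {num_rank}")
# The numerical rank stays small relative to N, which is exactly what
# HSS compression of these intermediate matrices exploits.
```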
“…It is written in C++ using OpenMP and MPI parallelism, uses HSS matrices, and implements a parallel HSS construction algorithm with randomized sampling [35,24]. Note that some routines are available for sequential HSS algorithms [43,10] and for parallel HSS algorithms on shared-memory platforms, such as HSSPACK [28]. But STRUMPACK is the only one available for distributed parallel HSS algorithms.…”
Section: Introduction
confidence: 99%
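
For flavor, here is a generic randomized range finder in the style of the randomized sampling such HSS constructions use (a Halko-Martinsson-Tropp-style sketch; this is not STRUMPACK's actual interface, and the sizes and names are arbitrary). The point is that a low-rank block's column space can be recovered from a handful of random matrix-vector products, which keeps the construction cheap.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, r = 2000, 2000, 15
# A test matrix of exact rank r, standing in for a compressible HSS block.
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))

p = 10                                   # oversampling parameter
Omega = rng.standard_normal((n, r + p))  # random test matrix
Y = A @ Omega                            # sample the range of A
Q, _ = np.linalg.qr(Y)                   # orthonormal basis for the samples
B = Q.T @ A                              # small (r + p) x n factor
# A ~= Q @ B to near machine precision, using only r + p matvecs with A.
print(np.linalg.norm(A - Q @ B) / np.linalg.norm(A))
```

In a full randomized HSS construction the same global samples are reused across the whole block hierarchy, so only on the order of r matrix-vector products are needed for the entire matrix, which is what makes the approach attractive in distributed settings.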