2014
DOI: 10.1137/130929060

Communication-Avoiding Symmetric-Indefinite Factorization

Abstract: We describe and analyze a novel symmetric triangular factorization algorithm. The algorithm is essentially a block version of Aasen's triangular tridiagonalization. It factors a dense symmetric matrix A as the product A = PLTL^T P^T, where P is a permutation matrix, L is lower triangular, and T is block tridiagonal and banded. The algorithm is the first symmetric-indefinite communication-avoiding factorization: it performs an asymptotically optimal amount of communication in a two-level memory hierarchy…
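To make the shape of the factorization concrete, here is a minimal pure-Python sketch (the permutation P is omitted and the matrices are a hand-picked 3x3 toy, not the paper's blocked algorithm): with L unit lower triangular and T symmetric tridiagonal, the product L T L^T is a full symmetric matrix, which is exactly the structure the algorithm recovers from a dense symmetric A.

```python
# Illustrative sketch only: forms A = L T L^T for a unit lower triangular L
# and a symmetric tridiagonal T, and observes that A comes out symmetric.

def matmul(X, Y):
    """Naive dense matrix product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(X):
    n = len(X)
    return [[X[j][i] for j in range(n)] for i in range(n)]

L = [[1.0, 0.0, 0.0],
     [2.0, 1.0, 0.0],
     [0.5, 3.0, 1.0]]   # unit lower triangular

T = [[4.0, 1.0, 0.0],
     [1.0, 5.0, 2.0],
     [0.0, 2.0, 6.0]]   # symmetric tridiagonal

A = matmul(matmul(L, T), transpose(L))
```

Since (L T L^T)^T = L T^T L^T = L T L^T whenever T is symmetric, the reconstructed A is symmetric by construction; the factorization algorithm runs this identity in reverse.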

Cited by 13 publications
(23 citation statements)
References 29 publications
“…On the other hand, the backward errors of the CA Aasen's algorithm were about an order of magnitude greater than those of the standard algorithms. As explained elsewhere, this is expected because the backward errors of the CA Aasen's algorithm depend linearly on the block size (i.e., n_d = 128). A few iterations of iterative refinement can smooth out the residual norm.…”
Section: Numerical Results
confidence: 88%
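The iterative refinement mentioned in this statement is the classic residual-correction loop: compute r = b - Ax, solve A d = r with the already-factored A, and update x += d. A minimal pure-Python sketch follows; the 2x2 system and the Cramer's-rule solver are illustrative stand-ins (the citing paper would reuse its factorization for the correction solves), not the paper's method.

```python
# Hedged sketch of iterative refinement on a tiny 2x2 system.

def solve2(A, b):
    """Solve a 2x2 system by Cramer's rule (stand-in for a factored solve)."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - b[1] * A[0][1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det]

def refine(A, b, x, steps=2):
    """Iterative refinement: r = b - A x, solve A d = r, then x += d."""
    for _ in range(steps):
        r = [b[i] - sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]
        d = solve2(A, r)
        x = [x[i] + d[i] for i in range(2)]
    return x

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = refine(A, b, [0.0, 0.0])   # start from a deliberately wrong guess
```

Each pass multiplies the residual by roughly the error of the inner solve, which is why a few passes suffice to pull the block-size-dependent backward error of CA Aasen's back down to the level of the standard algorithms.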
“…In this section, we discuss the three symmetric-indefinite factorization algorithms studied in this paper: the partitioned Bunch-Kaufman, the partitioned Aasen's, and the CA Aasen's.…”
Section: Algorithms
confidence: 99%
“…This is often an effective programming paradigm for many of the LAPACK subroutines because the panel factorization is based on BLAS-1 or BLAS-2, which can be efficiently implemented on the CPU, while BLAS-3 is used for the submatrix updates, which exhibit high data parallelism and can be efficiently implemented on the GPU [3,25]. Another variant of the algorithm was proposed [12]. Hence, although copying the panel from the GPU to the CPU can be overlapped with the update of the rest of the trailing submatrix on the GPU, look-ahead, a standard optimization technique that overlaps the panel factorization on the CPU with the trailing submatrix update on the GPU, is prohibited.…”
Section: Bunch-Kaufman Algorithm
confidence: 99%
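To clarify what the prohibited look-ahead would buy, here is a hedged sketch of the pattern using two worker threads as stand-ins for the CPU and GPU: while one worker applies the trailing-submatrix update for step k, the other already factors the panel for step k+1. All names, payloads, and the dependency structure are illustrative; real look-ahead must also respect the data dependence between the update and the next panel, which is exactly what Bunch-Kaufman's pivoting breaks.

```python
# Illustrative look-ahead pipeline: overlap "CPU" panel work with "GPU" updates.
from concurrent.futures import ThreadPoolExecutor

def factor_panel(k):
    return f"panel{k}"        # stand-in for the BLAS-2 panel factorization

def update_trailing(k, panel):
    return f"updated{k}"      # stand-in for the BLAS-3 trailing update

def lookahead_sweep(nsteps):
    log = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        panel = factor_panel(0)
        for k in range(nsteps):
            upd = pool.submit(update_trailing, k, panel)   # "GPU" work for step k
            if k + 1 < nsteps:
                nxt = pool.submit(factor_panel, k + 1)     # overlapped "CPU" work
                panel = nxt.result()
            log.append(upd.result())
    return log
```

In Bunch-Kaufman, the pivot choice for panel k+1 depends on the fully updated trailing submatrix of step k, so the two `submit` calls above cannot actually run concurrently; that dependence is the prohibition the passage describes.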