Improved Symbolic and Numerical Factorization Algorithms for Unsymmetric Sparse Matrices

Abstract. Supernode pivoting for unsymmetric matrices, coupled with supernode partitioning and asynchronous computation, can achieve high gigaflop rates for parallel sparse LU factorization on shared-memory parallel computers. Progress in weighted graph matching algorithms helps to extend these concepts further: a prepermutation of rows is used to place large matrix entries on the diagonal. Supernode pivoting allows dynamic interchanges of columns and rows during the factorization process while retaining BLAS level-3 efficiency. An enhanced left-right looking scheduling scheme is unaffected and results in good speedup on SMP machines without increasing the operation count. These algorithms have been integrated into the recent unsymmetric version of the PARDISO solver. Experiments demonstrate that a wide set of unsymmetric linear systems can be solved and that high performance is consistently achieved for large sparse unsymmetric matrices from real-world applications.
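
The row prepermutation mentioned in the abstract can be illustrated on a tiny dense matrix. The sketch below is not the weighted graph matching algorithm used in PARDISO; it is a brute-force search over all row permutations, usable only for very small matrices, that maximizes the product of the absolute values of the diagonal entries. The function names `best_row_permutation` and `permute_rows` are hypothetical and chosen for illustration.

```python
from itertools import permutations


def best_row_permutation(a):
    """Return (perm, score) where perm[j] is the original row moved to position j.

    Brute force over all n! permutations: an illustration only, not the
    weighted matching algorithm a production solver would use.
    """
    n = len(a)
    best_perm, best_score = None, -1.0
    for perm in permutations(range(n)):
        # Product of the magnitudes that would land on the diagonal.
        score = 1.0
        for j in range(n):
            score *= abs(a[perm[j]][j])
        if score > best_score:
            best_perm, best_score = perm, score
    return list(best_perm), best_score


def permute_rows(a, perm):
    """Apply the row permutation: new row j is old row perm[j]."""
    return [a[r] for r in perm]


if __name__ == "__main__":
    a = [
        [0.0, 2.0, 0.1],
        [3.0, 0.0, 0.0],
        [0.5, 0.1, 4.0],
    ]
    perm, score = best_row_permutation(a)
    b = permute_rows(a, perm)
    print(perm, score)
    print([b[i][i] for i in range(3)])
```

In this example the original matrix has zeros on the diagonal, while the permuted matrix has its largest entries there, which is the property that makes a later factorization with restricted pivoting more likely to succeed.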

“…A detailed comparison can be found in [12,13]. A "fail" indicates that the solver ran out of memory, e.g.…”

Section: Results (mentioning)

confidence: 99%

“…The best time is shown in boldface, the second best time is underlined, and the best operation count is indicated by . The last row shows the approximate smallest relative pivot threshold that yielded a residual norm close to machine precision after iterative refinement for each package ([12,13]). …”

Section: Parallel LU Algorithm With a Two-Level Scheduling (mentioning)

confidence: 99%

Solving Unsymmetric Sparse Systems of Linear Equations with PARDISO

Schenk

Gärtner

2002

Lecture Notes in Computer Science

142

154

“…Then the algorithm computes the structure of row i of U by combining the structures of earlier rows whose indices are the nonzeros in row i of L. In general, these minimal edags are often more expensive to compute than the symmetrically-pruned edags, due to the cost of transitively reducing each row. Gupta recently proposed a different algorithm for computing the minimal edags [41]. His algorithm computes the minimal structure of U by rows and of L by columns.…”

Section: Elimination Dags (mentioning)

confidence: 99%
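
The row-by-row idea in the passage quoted above (the structure of row i of U is obtained by merging the structures of earlier rows of U indexed by the nonzeros in row i of L) can be sketched as follows. This is the plain, unpruned symbolic elimination, not the minimal-edag or symmetrically-pruned variants the passage compares. It assumes no pivoting and a structurally nonzero diagonal, and the name `symbolic_lu_rows` is hypothetical.

```python
def symbolic_lu_rows(pattern):
    """Compute the nonzero structure of L and U row by row.

    pattern[i] lists the column indices of the nonzeros in row i of A.
    Assumptions (not from the source): no pivoting, nonzero diagonal.
    Returns (L, U): L[i] holds columns j < i, U[i] holds columns j >= i.
    """
    n = len(pattern)
    L, U = [], []
    for i in range(n):
        row = set(pattern[i]) | {i}
        done = set()
        while True:
            # Columns left of the diagonal that still need eliminating.
            pending = [k for k in row if k < i and k not in done]
            if not pending:
                break
            k = min(pending)  # eliminate in increasing column order
            done.add(k)
            # Eliminating column k merges in row k of U (fill-in).
            row |= {j for j in U[k] if j > k}
        L.append(sorted(j for j in row if j < i))
        U.append(sorted(j for j in row if j >= i))
    return L, U


if __name__ == "__main__":
    # Arrow matrix: dense first row and column, otherwise diagonal.
    arrow = [[0, 1, 2, 3], [0, 1], [0, 2], [0, 3]]
    L, U = symbolic_lu_rows(arrow)
    print(L)
    print(U)
```

The arrow matrix fills in completely because every row depends on row 0. Pruned or transitively reduced edags aim to reach the same structure while merging fewer earlier rows.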

“…One such code, Davis's UMFPACK 4, uses the column elimination tree to represent control-flow dependences, and a biclique cover to represent data dependences [9]. Another code, Gupta's WSMP, uses conventional minimal edags to represent control-flow dependences, and specialized dags to represent data dependences [41]. More specifically, Gupta shows how to modify the minimal edags so they exactly represent data dependences in the unsymmetric multifrontal algorithm with no pivoting, and how to modify the edags to represent dependences in an unsymmetric multifrontal algorithm that employs delayed pivoting.…”

Section: Elimination Structures For the Unsymmetric Multifrontal Algo (mentioning)

confidence: 99%

Elimination Structures in Scientific Computing

Pothen

Toledo

2004

Handbook of Data Structures and Applications

“…Static pivoting allows more detailed planning of the scheduling of a parallel algorithm, because the row permutation is known before the numerical factorization begins. Finally, delayed-pivoting algorithms, such as [31], perform both row and column exchanges during the numerical factorization.…”

Section: Introduction (mentioning)

confidence: 99%

Parallel unsymmetric-pattern multifrontal sparse LU with column preordering

Avron

Shklarski

Toledo

2008

ACM Trans. Math. Softw.

We present a new parallel sparse LU factorization algorithm and code. The algorithm uses a column-preordering partial-pivoting unsymmetric-pattern multifrontal approach. Our baseline sequential algorithm is based on umfpack 4 but is somewhat simpler and is often somewhat faster than umfpack version 4.0. Our parallel algorithm is designed for shared-memory machines with a small or moderate number of processors (we tested it on up to 32 processors). We experimentally compare our algorithm with SuperLU_MT, an existing shared-memory sparse LU factorization with partial pivoting. SuperLU_MT scales better than our new algorithm, but our algorithm is more reliable and is usually faster in absolute terms (on up to 16 processors; we were not able to run SuperLU_MT on 32). More specifically, on large matrices our algorithm is always faster on up to 4 processors, and is usually faster on 8 and 16. The main contribution of this paper is showing that the column-preordering partial-pivoting unsymmetric-pattern multifrontal approach, developed as a sequential algorithm by Davis in several recent versions of umfpack, can be effectively parallelized.

Cited by 35 publications

References 19 publications
