2013
DOI: 10.1002/cpe.3110

Achieving numerical accuracy and high performance using recursive tile LU factorization with partial pivoting

Abstract: The LU factorization is an important numerical algorithm for solving systems of linear equations in science and engineering and is characteristic of many dense linear algebra computations. For example, it has become the de facto numerical algorithm implemented within the LINPACK benchmark to rank the most powerful supercomputers in the world, collected by the TOP500 website. Multicore processors continue to present challenges to the development of fast and robust numerical software due to the increasing leve…

Cited by 34 publications (28 citation statements)
References 33 publications
“…This optimization used in the computation of the slab factorization improved the computation speed by a factor of 2. This approach shares similarities with the recursive computation of the panel described in [9].…”
Section: Algorithmic Variants For PLUQ Factorization (citation type: mentioning)
confidence: 98%
“…We refrain from publishing a comprehensive set of performance charts for the LU factorizations as we have done elsewhere [26]. Figure 1 shows the achieved performance of mixed precision solvers with iterative refinement on both systems for non-symmetric matrices (LU).…”
Section: Performance Results (citation type: mentioning)
confidence: 99%
“…Because of the row pivoting, the panel factorization (factorization of a block of columns) cannot be tiled, and the panel has to be dealt with as a whole, which handicaps cache efficiency. In recent years, this problem has been addressed by algorithms that try to achieve good level of cache residency for this operation [12,16]. Many alternative approaches have also been developed [14], but this article focuses on the classic formulations.…”
Section: LU Decomposition (citation type: mentioning)
confidence: 99%
“…This is in contrast with LAPACK, where one tall panel (block of columns) is eliminated at a time, making it difficult to achieve cache efficiency and apply multithreading. In the course of the PLASMA project, tile algorithms have been developed for a wide range of algorithms, including: Cholesky, LU and QR factorizations [11,14,16], as well as reductions to band forms for solving the singular value problem or the eigenvalue problem [23,31].…”
Section: PLASMA (citation type: mentioning)
confidence: 99%
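
Several of the statements above refer to a recursive panel factorization with partial (row) pivoting. As a rough illustration of that general idea, and not the implementation from the cited paper, the following pure-Python sketch splits a block of columns in half, factors the left half, updates the right half, and then recurses on the right half. The names `recursive_lu` and `lu_factor` are hypothetical, the code is sequential and untiled, and it swaps whole rows at each pivot step for simplicity.

```python
def recursive_lu(a, lo, hi, piv):
    """Factor columns lo..hi-1 of `a` in place, using rows lo and below.

    `a` is a list of row lists with at least `hi` rows. On return, the
    strictly lower part holds the unit-lower factor L and the upper part
    holds U. `piv[j]` records the row swapped with row j.
    """
    m = len(a)
    if hi - lo == 1:
        j = lo
        # Partial pivoting: pick the largest magnitude entry in column j.
        p = max(range(j, m), key=lambda i: abs(a[i][j]))
        if a[p][j] == 0.0:
            raise ZeroDivisionError("matrix is singular")
        piv[j] = p
        a[j], a[p] = a[p], a[j]  # swap whole rows (simplification)
        for i in range(j + 1, m):
            a[i][j] /= a[j][j]
        return

    mid = (lo + hi) // 2
    recursive_lu(a, lo, mid, piv)  # factor the left half of the panel

    # Triangular solve: forward substitution with the unit-lower block
    # of the left half, applied to rows lo..mid-1 of the right half.
    for i in range(lo, mid):
        for k in range(lo, i):
            lik = a[i][k]
            for j in range(mid, hi):
                a[i][j] -= lik * a[k][j]

    # Schur complement update of the remaining rows of the right half.
    for i in range(mid, m):
        for k in range(lo, mid):
            lik = a[i][k]
            for j in range(mid, hi):
                a[i][j] -= lik * a[k][j]

    recursive_lu(a, mid, hi, piv)  # factor the updated right half


def lu_factor(matrix):
    """Return (lu, piv) for a square matrix, leaving the input untouched."""
    a = [[float(x) for x in row] for row in matrix]
    piv = list(range(len(a)))
    recursive_lu(a, 0, len(a), piv)
    return a, piv


if __name__ == "__main__":
    lu, piv = lu_factor([[2, 1, 1], [4, 3, 3], [8, 7, 9]])
    print(piv)
    for row in lu:
        print(row)
```

Applying the recorded swaps to the original matrix in order (row j with row `piv[j]`) gives the permuted matrix that equals the product of the two factors. A production code would replace the inner loops with BLAS calls and apply the row swaps in blocks; this sketch only shows the recursive structure.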