A BLAS-3 Version of the QR Factorization with Column Pivoting

Quintana-Ortí, Gregorio; Sun, Xiaobai; Bischof, Christian

doi:10.1137/s1064827595296732

Cited by 92 publications

(70 citation statements)

References 23 publications

“…dgeqp3. The implementation of blocked HQRP that is part of the netlib implementation of LAPACK, based on [31], modified so that the block size can be controlled. HQRRPbasic.…”

Section: Performance Experiments

mentioning

confidence: 99%

“…Widely used current implementations of the level-3 BLAS are based on techniques exposed by Goto [15,14]. The fundamental problem with the classical approach to HQRP is that only half of the computation can be cast in terms of gemm, as described in the paper [31] that…”

mentioning

confidence: 99%

“…The fundamental problem with the classical approach to HQRP is that only half of the computation can be cast in terms of gemm, as described in the paper [31] that underlies LAPACK's geqp3 routine [3]. This means that blocking can only improve performance by, at best, a factor of two, which is inherent in the fact that it must be known how remaining columns will be updated in order to compute the 2-norms of remaining columns.…”

mentioning

confidence: 99%
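The dependence between pivot choice and trailing updates mentioned in the statement above can be illustrated with the standard column-norm downdate. The following is a minimal NumPy sketch, not code from LAPACK or from the cited papers; the helper name `householder` and the test matrix are assumptions made for illustration. It shows that the norms of the remaining columns after one Householder step require the updated first row of the trailing matrix, which is only available once the reflector has been applied.

```python
import numpy as np

def householder(x):
    """Return (v, beta) so that (I - beta * v v^T) x is a multiple of e_1.

    Illustrative helper (hypothetical name), not a LAPACK routine.
    """
    v = x.astype(float).copy()
    normx = np.linalg.norm(x)
    if normx == 0.0:
        return v, 0.0
    # Choose the sign that avoids cancellation in v[0].
    v[0] += np.copysign(normx, x[0])
    beta = 2.0 / (v @ v)
    return v, beta

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 5))

# Squared 2-norms of all columns before the step.
norms_sq = np.sum(A**2, axis=0)

# One Householder step on column 0, applied to the whole matrix.
v, beta = householder(A[:, 0])
A -= beta * np.outer(v, v @ A)

# A reflector preserves each column's norm, so the squared norm of the part
# below row 0 equals the old squared norm minus the square of the new
# row-0 entry. That entry exists only after the update above.
downdated = norms_sq[1:] - A[0, 1:] ** 2
recomputed = np.sum(A[1:, 1:] ** 2, axis=0)
```

Here `downdated` and `recomputed` agree up to rounding. Production codes guard the subtraction against cancellation; this sketch does not.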

Householder QR Factorization With Randomization for Column Pivoting (HQRRP)

Martinsson, Ortí, Heavner, et al. 2017

SIAM J. Sci. Comput.

Abstract. A fundamental problem when adding column pivoting to the Householder QR factorization is that only about half of the computation can be cast in terms of high-performing matrix-matrix multiplications, which greatly limits the benefits that can be derived from so-called blocking of algorithms. This paper describes a technique for selecting groups of pivot vectors by means of randomized projections. It is demonstrated that the asymptotic flop count for the proposed method is $2mn^2 - (2/3)n^3$ for an $m \times n$ matrix, identical to that of the best classical unblocked Householder QR factorization algorithm (with or without pivoting). Experiments demonstrate acceleration in speed of close to an order of magnitude relative to the geqp3 function in LAPACK, when executed on a modern CPU with multiple cores. Further, experiments demonstrate that the quality of the randomized pivot selection strategy is roughly the same as that of classical column pivoting. The described algorithm is made available under Open Source license and can be used with LAPACK or libflame.

1. Introduction. The QR factorization is a staple of linear algebra, with applications ranging from Linear Least-Squares solution of overdetermined systems to the identification of low rank approximation via the determination of an approximate orthonormal basis for the column space. Standard algorithms for computing the QR factorization include Gram-Schmidt orthogonalization and those based on Householder transformations (reflectors). When it is desirable for the QR factorization to also reveal the approximate rank of the original matrix, it is important that the elements of the diagonal of R be ordered with larger elements in magnitude appearing earlier. In this case, column pivoting (swapping) is employed during the QR factorization, yielding QR factorization with column pivoting (QRP). It is well-known that the Householder QR factorization (HQR) yields columns of Q that are orthogonal to a high degree of precision, making these algorithms the weapon of choice in many situations. Pivoting can be added to HQR to yield HQR with column pivoting (HQRP). This topic is covered by standard texts on numerical linear algebra [13].

To achieve high performance for dense linear algebra algorithms, so-called blocked algorithms are employed that cast most computation in terms of matrix-matrix operations supported by the widely used level-3 Basic Linear Algebra Subprograms (BLAS) [7,8], because such operations can be implemented to achieve very high performance on modern processors via a combination of careful reuse of data in the caches and low-level implementation in terms of assembly code or intrinsic vector operations. Widely used current implementations of the level-3 BLAS are based on techniques exposed by Goto [15,14]. The fundamental problem with the classical approach to HQRP is that only half of the computation can be cast in terms of gemm, as described in the paper [31] that
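The abstract's idea of selecting a group of pivot columns through a randomized projection can be sketched as follows. This is a hedged illustration, not the authors' implementation: the Gaussian sketching matrix `G`, the `oversample` parameter, and the function names `greedy_pivots` and `randomized_pivots` are all assumptions, and a pivoted Gram-Schmidt loop stands in for a column-pivoted QR of the small sketch.

```python
import numpy as np

def greedy_pivots(Y, b):
    """Pick b column indices of Y by greedy max-norm selection.

    Stand-in (hypothetical helper) for column-pivoted QR of the sketch;
    assumes Y has numerical rank of at least b.
    """
    W = Y.astype(float).copy()
    picked = []
    for _ in range(b):
        j = int(np.argmax(np.sum(W**2, axis=0)))
        picked.append(j)
        q = W[:, j] / np.linalg.norm(W[:, j])
        # Project the chosen direction out of every column.
        W -= np.outer(q, q @ W)
    return picked

def randomized_pivots(A, b, oversample=5, seed=0):
    """Choose b pivot columns of A from a small random sketch Y = G A."""
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((b + oversample, A.shape[0]))
    Y = G @ A  # one matrix-matrix product; Y has only b + oversample rows
    return greedy_pivots(Y, b)

# Toy input: three columns are made dominant by construction.
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 10))
A[:, [1, 4, 6]] *= 1e3
picked = randomized_pivots(A, b=3)
```

The point of the design is that the pivot search runs on the short matrix `Y` rather than on `A`, so the expensive work on `A` reduces to matrix-matrix products.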

“…This factorization was introduced in [16], and the first algorithm to compute it was proposed in [6] and is based on the QR factorization with column pivoting (QRCP). A BLAS-3 version of this algorithm [25] is implemented in LAPACK [1], and its parallel version in ScaLAPACK [5].…”

mentioning

confidence: 99%

“…In classic QR factorization with column pivoting, at each step i of the factorization, the remaining unselected column of maximum norm is selected and exchanged with the i-th column, its subdiagonal elements are annihilated, using for example a Householder transformation, and then the trailing matrix is updated. A block version of this algorithm is described in [25]. The main difficulty in reducing communication in rank revealing QR factorization lies in identifying b pivot columns at each step of the block algorithm.…”

mentioning

confidence: 99%
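The classic step described in the statement above (select the remaining column of maximum norm, swap it into position i, annihilate its subdiagonal entries, update the trailing matrix) can be written as a short unblocked routine. This is a minimal NumPy sketch under stated assumptions: the function name `qrcp` is illustrative, column norms are recomputed at every step for clarity rather than downdated, and Q is not accumulated.

```python
import numpy as np

def qrcp(A):
    """Unblocked Householder QR with column pivoting (illustrative sketch).

    Returns (R, perm) such that A[:, perm] = Q R for some orthogonal Q.
    """
    R = A.astype(float).copy()
    m, n = R.shape
    k = min(m, n)
    perm = np.arange(n)
    for i in range(k):
        # Select the remaining column of maximum norm.
        j = i + int(np.argmax(np.sum(R[i:, i:] ** 2, axis=0)))
        # Exchange it with the i-th column.
        R[:, [i, j]] = R[:, [j, i]]
        perm[[i, j]] = perm[[j, i]]
        # Build the Householder vector that annihilates the subdiagonal.
        x = R[i:, i]
        normx = np.linalg.norm(x)
        if normx == 0.0:
            break  # the remaining block is exactly zero
        v = x.copy()
        v[0] += np.copysign(normx, x[0])
        beta = 2.0 / (v @ v)
        # Apply the reflector to column i and the trailing matrix.
        R[i:, i:] -= beta * np.outer(v, v @ R[i:, i:])
    return np.triu(R[:k, :]), perm

rng = np.random.default_rng(0)
A = rng.standard_normal((7, 4))
R, perm = qrcp(A)
d = np.abs(np.diag(R))
```

Because Q is orthogonal, `R.T @ R` should match the Gram matrix of `A[:, perm]`, and the magnitudes in `d` should be non-increasing, which is the ordering property that makes the factorization useful for rank estimation.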

Communication Avoiding Rank Revealing QR Factorization with Column Pivoting

Demmel, Grigori, Gu, et al. 2015

SIAM J. Matrix Anal. & Appl.

Abstract. In this paper we introduce CARRQR, a communication avoiding rank revealing QR factorization with tournament pivoting. We show that CARRQR reveals the numerical rank of a matrix in an analogous way to QR factorization with column pivoting (QRCP). Although the upper bound of a quantity involved in the characterization of a rank revealing factorization is worse for CARRQR than for QRCP, our numerical experiments on a set of challenging matrices show that this upper bound is very pessimistic, and CARRQR is an effective tool in revealing the rank in practical problems.

Our main motivation for introducing CARRQR is that it minimizes data transfer, modulo polylogarithmic factors, on both sequential and parallel machines, while previous factorizations such as QRCP are communication sub-optimal and require asymptotically more communication than CARRQR. Hence CARRQR is expected to have a better performance on current and future computers, where communication is a major bottleneck that highly impacts the performance of an algorithm.

Computing rank‐revealing factorizations of matrices stored out‐of‐core

Heavner, Martinsson, Quintana-Ortí 2023

Concurrency and Computation

Summary. This paper describes efficient algorithms for computing rank‐revealing factorizations of matrices that are too large to fit in main memory (RAM), and must instead be stored on slow external memory devices such as disks (out‐of‐core or out‐of‐memory). Traditional algorithms for computing rank‐revealing factorizations (such as the column pivoted QR factorization and the singular value decomposition) are very communication intensive as they require many vector‐vector and matrix‐vector operations, which become prohibitively expensive when data is not in RAM. Randomization makes it possible to formulate new methods so that large contiguous blocks of the matrix are processed in bulk. The paper describes two distinct methods. The first is a blocked version of column pivoted Householder QR, organized as a “left‐looking” method to minimize the number of the expensive write operations. The second method employs a UTV factorization. It is organized as an algorithm‐by‐blocks to overlap computations and I/O operations. As it incorporates power iterations, it is much better at revealing the numerical rank. Numerical experiments on several computers demonstrate that the new algorithms are almost as fast when processing data stored on slow memory devices as traditional algorithms are for data stored in RAM.
