2009
DOI: 10.1073/pnas.0803205106
CUR matrix decompositions for improved data analysis

Abstract: Principal components analysis and, more generally, the Singular Value Decomposition are fundamental data analysis tools that express a data matrix in terms of a sequence of orthogonal or uncorrelated vectors of decreasing importance. Unfortunately, being linear combinations of up to all the data points, these vectors are notoriously difficult to interpret in terms of the data and processes generating the data. In this article, we develop CUR matrix decompositions for improved data analysis. CUR decompositions …
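The abstract contrasts CUR with PCA/SVD: instead of orthogonal vectors that mix up to all data points, a CUR decomposition expresses the matrix through a small set of actual columns (C) and actual rows (R) plus a small mixing matrix (U), so the factors remain interpretable as data. As a rough illustration only (the paper's algorithm samples by statistical leverage scores, not by the squared-norm scheme used here), the following NumPy sketch selects columns and rows by norm-based importance sampling and solves for U in the least-squares sense; the helper name cur_decompose and the sampling choice are assumptions for illustration.

import numpy as np

def cur_decompose(A, c, r, rng=None):
    """Sketch of a CUR decomposition: approximate A ~ C @ U @ R, where
    C holds c actual columns of A and R holds r actual rows, so both
    factors stay directly interpretable in terms of the data."""
    rng = np.random.default_rng(rng)
    # Simple importance sampling: probabilities proportional to squared
    # Euclidean norms of columns/rows (easy to compute; the paper's
    # algorithm uses statistical leverage scores instead).
    col_p = (A ** 2).sum(axis=0); col_p /= col_p.sum()
    row_p = (A ** 2).sum(axis=1); row_p /= row_p.sum()
    cols = rng.choice(A.shape[1], size=c, replace=False, p=col_p)
    rows = rng.choice(A.shape[0], size=r, replace=False, p=row_p)
    C, R = A[:, cols], A[rows, :]
    # U minimizes ||A - C U R||_F in the least-squares sense: U = C^+ A R^+.
    U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)
    return C, U, R, cols, rows

# Toy usage: a rank-10 matrix, approximated from 20 real columns and rows.
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 10)) @ rng.standard_normal((10, 150))
C, U, R, cols, rows = cur_decompose(A, c=20, r=20, rng=0)
err = np.linalg.norm(A - C @ U @ R) / np.linalg.norm(A)
print(f"relative Frobenius error: {err:.3f}")

Because cols and rows index real data points, inspecting the selection is itself informative, which is the interpretability advantage the abstract emphasizes.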

Cited by 618 publications (578 citation statements)
References 29 publications
“…To provide a fair comparison, we incorporate several extensions into the importance-sampling-based methods: both CUR-L2 and CUR-SL use the extensions proposed for CMD [26] and, in both cases, we sample exactly the same number of unique rows and columns as in the case of LS-DCUR and Greedy (double selections do not count as a selected row or column). For methods requiring computation of the top-k singular vectors (CUR-SL, Greedy), we specify a reasonable k. As setting it to the actual number of sampled rows and columns is not advisable, we follow the suggestion of [22] and over-sample k; various experimental runs show that setting k to ≈ 4/5 of the number of row and column samples provides a convenient tradeoff between run-time performance and approximation accuracy; note that LS-DCUR does not require any additional parameters apart from the number of desired rows and columns. All tested methods were carefully implemented in Python/Numpy, taking advantage of standard Lapack/Blas routines and the Arpack library as a sparse-eigenvalue solver.…”
Section: Methods
confidence: 99%
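The quoted methods passage leans on two ideas worth making concrete: subspace-based sampling needs the top-k right singular vectors, and k is deliberately over-sampled to ≈ 4/5 of the number of drawn rows/columns rather than set equal to it. A minimal sketch, assuming plain NumPy and a hypothetical helper leverage_scores (the cited implementations, e.g. CUR-SL over Lapack/Arpack routines, differ in detail):

import numpy as np

def leverage_scores(A, k):
    """Sketch: normalized statistical leverage scores of A's columns
    relative to its best rank-k approximation. Subspace-sampling CUR
    methods draw columns with probabilities proportional to these."""
    # Top-k right singular vectors are the first k rows of Vt.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    lev = (Vt[:k, :] ** 2).sum(axis=0) / k  # sums to 1 across columns
    return lev

# Hypothetical use of the oversampling rule quoted above: when drawing
# s column samples, compute scores with k ~ 4/5 * s rather than k = s.
s = 25
k = max(1, int(round(0.8 * s)))
rng = np.random.default_rng(0)
A = rng.standard_normal((300, 120))
p = leverage_scores(A, k)
cols = rng.choice(A.shape[1], size=s, replace=False, p=p)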
“…As selections made by random and deterministic approaches always correspond to actual data samples, they are easier to interpret than, for example, SVD decompositions [22]. Additionally, for the deterministic case, the resulting selections are often polar opposites.…”
Section: Interpretability and Data-space Coverage
confidence: 99%