2009
DOI: 10.1073/pnas.0803205106
CUR matrix decompositions for improved data analysis

Abstract: Principal components analysis and, more generally, the Singular Value Decomposition are fundamental data analysis tools that express a data matrix in terms of a sequence of orthogonal or uncorrelated vectors of decreasing importance. Unfortunately, being linear combinations of up to all the data points, these vectors are notoriously difficult to interpret in terms of the data and processes generating the data. In this article, we develop CUR matrix decompositions for improved data analysis. CUR decompositions …
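The abstract contrasts CUR with PCA/SVD: instead of orthogonal vectors that mix up to all data points, a CUR decomposition expresses the matrix through a small set of actual columns (C) and actual rows (R) plus a small mixing matrix (U), so the factors remain interpretable as data. As a rough illustration only (the paper's algorithm samples by statistical leverage scores, not by the squared-norm scheme used here), the following NumPy sketch selects columns and rows by norm-based importance sampling and solves for U in the least-squares sense; the helper name cur_decompose and the sampling choice are assumptions for illustration.

import numpy as np

def cur_decompose(A, c, r, rng=None):
    """Sketch of a CUR decomposition: approximate A ~ C @ U @ R, where
    C holds c actual columns of A and R holds r actual rows, so both
    factors stay directly interpretable in terms of the data."""
    rng = np.random.default_rng(rng)
    # Simple importance sampling: probabilities proportional to squared
    # Euclidean norms of columns/rows (easy to compute; the paper's
    # algorithm uses statistical leverage scores instead).
    col_p = (A ** 2).sum(axis=0); col_p /= col_p.sum()
    row_p = (A ** 2).sum(axis=1); row_p /= row_p.sum()
    cols = rng.choice(A.shape[1], size=c, replace=False, p=col_p)
    rows = rng.choice(A.shape[0], size=r, replace=False, p=row_p)
    C, R = A[:, cols], A[rows, :]
    # U minimizes ||A - C U R||_F in the least-squares sense: U = C^+ A R^+.
    U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)
    return C, U, R, cols, rows

# Toy usage: a rank-10 matrix, approximated from 20 real columns and rows.
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 10)) @ rng.standard_normal((10, 150))
C, U, R, cols, rows = cur_decompose(A, c=20, r=20, rng=0)
err = np.linalg.norm(A - C @ U @ R) / np.linalg.norm(A)
print(f"relative Frobenius error: {err:.3f}")

Because cols and rows index real data points, inspecting the selection is itself informative, which is the interpretability advantage the abstract emphasizes.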

Cited by 618 publications (578 citation statements)
References 29 publications
“…To provide a fair comparison, we incorporate several extensions into the importance-sampling-based methods: both CUR-L2 and CUR-SL use the extensions proposed for CMD [26] and, in both cases, we sample exactly the same number of unique rows and columns as in the case of LS-DCUR and Greedy (double selections do not count as a selected row or column). For methods requiring computation of the top-k singular vectors (CUR-SL, Greedy), we specify a reasonable k. As setting it to the actual number of sampled rows and columns is not advisable, we follow the suggestion of [22] and over-sample k; various experimental runs show that setting k to ≈ 4/5 of the number of row and column samples provides a convenient tradeoff between run-time performance and approximation accuracy; note that LS-DCUR does not require any additional parameters apart from the number of desired rows and columns. All tested methods were carefully implemented in Python/Numpy, taking advantage of standard Lapack/Blas routines and the Arpack library as a sparse-eigenvalue solver.…”
Section: Methods
confidence: 99%
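The quoted methods passage leans on two ideas worth making concrete: subspace-based sampling needs the top-k right singular vectors, and k is deliberately over-sampled to ≈ 4/5 of the number of drawn rows/columns rather than set equal to it. A minimal sketch, assuming plain NumPy and a hypothetical helper leverage_scores (the cited implementations, e.g. CUR-SL over Lapack/Arpack routines, differ in detail):

import numpy as np

def leverage_scores(A, k):
    """Sketch: normalized statistical leverage scores of A's columns
    relative to its best rank-k approximation. Subspace-sampling CUR
    methods draw columns with probabilities proportional to these."""
    # Top-k right singular vectors are the first k rows of Vt.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    lev = (Vt[:k, :] ** 2).sum(axis=0) / k  # sums to 1 across columns
    return lev

# Hypothetical use of the oversampling rule quoted above: when drawing
# s column samples, compute scores with k ~ 4/5 * s rather than k = s.
s = 25
k = max(1, int(round(0.8 * s)))
rng = np.random.default_rng(0)
A = rng.standard_normal((300, 120))
p = leverage_scores(A, k)
cols = rng.choice(A.shape[1], size=s, replace=False, p=p)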
“…As selections made by random and deterministic approaches always correspond to actual data samples, they are easier to interpret than, for example, SVD decompositions [22]. Additionally, for the deterministic case, the resulting selections are often polar opposites.…”
Section: Interpretability and Data-space Coverage
confidence: 99%