2009
DOI: 10.1073/pnas.0810600105

Spectral methods in machine learning and new strategies for very large datasets

Abstract: Spectral methods are of fundamental importance in statistics and machine learning, because they underlie algorithms from classical principal components analysis to more recent approaches that exploit manifold structure. In most cases, the core technical problem can be reduced to computing a low-rank approximation to a positive-definite kernel. For the growing number of applications dealing with very large or high-dimensional datasets, however, the optimal approximation afforded by an exact spectral decompositi…
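The abstract's core computation, a low-rank approximation of a positive-definite kernel, is commonly obtained with a Nyström-type column-sampling construction. The NumPy sketch below uses uniform landmark sampling and an RBF kernel purely for illustration; both are assumptions, not the paper's specific sampling scheme.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    """Gaussian (RBF) kernel matrix between rows of X and Y."""
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def nystrom_approximation(X, m, gamma=0.5, seed=0):
    """Rank-m Nystrom approximation K ~ C W^+ C^T of the full kernel,
    using m uniformly sampled landmark points (a simplifying assumption)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    idx = rng.choice(n, size=m, replace=False)   # landmark indices
    C = rbf_kernel(X, X[idx], gamma)             # n x m block of K
    W = C[idx, :]                                # m x m landmark block
    return C @ np.linalg.pinv(W) @ C.T           # low-rank surrogate for K

# quick check of the relative approximation error on a small synthetic set
X = np.random.default_rng(1).normal(size=(300, 5))
K = rbf_kernel(X, X)
K_hat = nystrom_approximation(X, m=50)
print(np.linalg.norm(K - K_hat) / np.linalg.norm(K))
```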

Cited by 106 publications (117 citation statements) · References 14 publications
“…Observe that in principle, sample size coincides with the dimension of the similarity and, hence, transition matrices, which may make its eigenanalysis computationally unfeasible for very large data sets. We will not deal with the issue of the complexity of DM's eigenanalysis in this work but point out that the usual approach is to apply an adequate subsampling of the original data [14]. In turn, this requires a mechanism to extend the embedding computed on that subsample to the other data points not considered or, more generally, new, unseen patterns.…”
Section: Introduction (mentioning, confidence: 99%)
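The excerpt's last point, that an embedding computed on a subsample must be extended to points left out of it, is typically handled with a Nyström-style out-of-sample extension. A minimal sketch, assuming a plain RBF affinity and an unnormalized kernel eigenmap (diffusion-map normalization is omitted for brevity):

```python
import numpy as np

def rbf(A, B, gamma=1.0):
    """RBF affinities between rows of A and B (assumed similarity measure)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def subsample_embedding(landmarks, k, gamma=1.0):
    """Eigen-embedding computed only on the subsample (landmark points)."""
    W = rbf(landmarks, landmarks, gamma)
    vals, vecs = np.linalg.eigh(W)                        # ascending order
    vals, vecs = vals[::-1][:k], vecs[:, ::-1][:, :k]     # keep top-k pairs
    return vals, vecs

def nystrom_extend(x_new, landmarks, vals, vecs, gamma=1.0):
    """Extend the landmark embedding to unseen points via the Nystrom formula
    phi_k(x) = (1 / lambda_k) * sum_j k(x, l_j) * v_k(l_j)."""
    k_x = rbf(np.atleast_2d(x_new), landmarks, gamma)     # affinities to landmarks
    return (k_x @ vecs) / vals                            # per-component rescaling

# usage: embed 200 landmark points, then map a batch of unseen points
rng = np.random.default_rng(0)
L = rng.normal(size=(200, 4))
vals, vecs = subsample_embedding(L, k=3)
new_pts = rng.normal(size=(10, 4))
print(nystrom_extend(new_pts, L, vals, vecs).shape)       # (10, 3)
```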
“…For very high dimension the storage capacity seems very large for this method, but this can be overcome using a so-called two-pass strategy described in Frommer and Simoncini (2008). In the machine learning literature, Belabbas and Wolfe (2009) used iterative methods to approximate the eigenvalues, and capture the most important feature of the Gaussian process. Their approach seems to work well for moderate dimensions, but it is unclear what its properties are, and how to tune this method in high dimensions.…”
Section: Introduction (mentioning, confidence: 99%)
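The iterative eigensolvers this excerpt refers to are typically Lanczos-type, matrix-free methods that need only a matrix-vector product. Below is a minimal sketch using SciPy's eigsh on a linear operator; the RBF kernel and parameters are illustrative assumptions, not the cited authors' exact procedure.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, eigsh

def top_eigs_matrix_free(matvec, n, k=10):
    """Leading k eigenpairs of a symmetric n x n operator given only a
    matrix-vector product, via the Lanczos iteration (scipy's eigsh)."""
    op = LinearOperator((n, n), matvec=matvec, dtype=np.float64)
    vals, vecs = eigsh(op, k=k, which='LM')        # largest-magnitude eigenvalues
    return vals[::-1], vecs[:, ::-1]               # return in descending order

# illustrative use: the operator is a dense RBF kernel here, but matvec could
# just as well stream over the data without ever forming the full matrix
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
K = np.exp(-0.5 * ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
vals, vecs = top_eigs_matrix_free(lambda v: K @ v, n=K.shape[0], k=5)
print(vals)
```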
“…This is a well-known fact which has been continually emphasized over the past decade [19,6]. Approaches dealing with this paradigm range from exploiting the sparsity, subsampling of an image or similarity matrix to the low-rank approximation methods such as Nyström algorithm [21,23]. We infer that this is still an open problem from the most recent work by Chen et al in [25] where the team of researchers presents a parallel HPC implementation of SC.…”
Section: Introduction (mentioning, confidence: 99%)
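For the scaling problem raised here, a common remedy is to approximate the leading eigenvectors of the affinity matrix with the Nyström method before the k-means step of spectral clustering. A simplified sketch under assumed choices (uniform landmarks, RBF affinities, no Laplacian normalization):

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def nystrom_spectral_clustering(X, n_clusters, m=100, gamma=1.0, seed=0):
    """Spectral clustering where the leading affinity eigenvectors are
    approximated from m landmark columns (Nystrom), avoiding the full
    n x n eigenproblem. Graph-Laplacian normalization is omitted here."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    idx = rng.choice(n, size=m, replace=False)
    d2 = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(-1)
    C = np.exp(-gamma * d2)                        # n x m affinity block
    W = C[idx, :]                                  # m x m landmark block
    # eigen-decompose the small block and lift to approximate eigenvectors
    vals, vecs = np.linalg.eigh(W)
    vals = vals[::-1][:n_clusters]
    vecs = vecs[:, ::-1][:, :n_clusters]
    U = C @ vecs / vals                            # approximate leading eigenvectors
    U /= np.linalg.norm(U, axis=1, keepdims=True) + 1e-12   # row-normalize
    _, labels = kmeans2(U, n_clusters, minit='++')
    return labels

# usage on a small synthetic set
X = np.random.default_rng(1).normal(size=(400, 2))
print(np.bincount(nystrom_spectral_clustering(X, n_clusters=3, m=60)))
```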