Improved Guarantees and a Multiple-descent Curve for Column Subset Selection and the Nystrom Method (Extended Abstract)

Dereziński, Michał; Khanna, Rajiv; Mahoney, Michael W.

doi:10.24963/ijcai.2021/647

Cited by 12 publications (21 citation statements); references 32 publications

“…Similarly, the recent advances in using and understanding the Nyström method (Williams and Seeger, 2001), which is one of the most popular sparse approximations in kernel methods, have been made independently to those of sparse GP approximations. The majority of these advances focus on an efficient approximation of the kernel matrix (e.g., Drineas and Mahoney, 2005; Belabbas and Wolfe, 2009; Gittens and Mahoney, 2016; Derezinski et al, 2020) or empirical risk minimization in the RKHS with a reduced basis (e.g., Bach, 2013; El Alaoui and Mahoney, 2015; Rudi et al, 2015, 2017; Meanti et al, 2020). This separation of two lines of research are arguably due to the difference in the notations and modeling philosophies of GPs and kernel methods.…”

Section: Introduction (mentioning; confidence: 99%)

Connections and Equivalences between the Nyström Method and Sparse Variational Gaussian Processes

Wild, Kanagawa, Sejdinović

2021

Preprint

We investigate the connections between sparse approximation methods for making kernel methods and Gaussian processes (GPs) scalable to massive data, focusing on the Nyström method and the Sparse Variational Gaussian Processes (SVGP). While sparse approximation methods for GPs and kernel methods share some algebraic similarities, the literature lacks a deep understanding of how and why they are related. This is a possible obstacle to communication between the GP and kernel communities, making it difficult to transfer results from one side to the other. Our motivation is to remove this possible obstacle, by clarifying the connections between the sparse approximations for GPs and kernel methods. In this work, we study the two popular approaches, the Nyström and SVGP approximations, in the context of a regression problem, and establish various connections and equivalences between them. In particular, we provide an RKHS interpretation of the SVGP approximation, and show that the Evidence Lower Bound of the SVGP contains the objective function of the Nyström approximation, revealing the origin of the algebraic equivalence between the two approaches. We also study recently established convergence results for the SVGP and how they are related to the approximation quality of the Nyström method.
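One concrete instance of the algebraic equivalence mentioned in this abstract can be checked numerically for the posterior mean. The sketch below is not taken from the paper; it assumes a Gaussian (RBF) kernel, synthetic 1-D data, inducing points drawn from the inputs, and hypothetical helper names. It compares (i) Nyström kernel ridge regression, whose coefficients solve (K_mn K_nm + nλ K_mm) α = K_mn y, with (ii) the SVGP predictive mean under the optimal variational distribution of Titsias (2009), K_*m (σ² K_mm + K_mn K_nm)⁻¹ K_mn y. The two linear systems coincide when nλ = σ².

```python
import numpy as np


def rbf_kernel(A, B, lengthscale=1.0):
    """Gaussian (RBF) kernel between rows of A and rows of B (kernel choice is an assumption)."""
    sq_dists = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * sq_dists / lengthscale**2)


def nystrom_krr_mean(X, y, Z, X_test, lam):
    """Hypothetical helper: Nyström KRR, minimising (1/n)||y - K_nm a||^2 + lam * a^T K_mm a."""
    n = X.shape[0]
    K_mn = rbf_kernel(Z, X)
    K_mm = rbf_kernel(Z, Z)
    alpha = np.linalg.solve(K_mn @ K_mn.T + n * lam * K_mm, K_mn @ y)
    return rbf_kernel(X_test, Z) @ alpha


def svgp_mean(X, y, Z, X_test, noise_var):
    """Hypothetical helper: SVGP predictive mean with the optimal (Titsias, 2009) q(u)."""
    K_mn = rbf_kernel(Z, X)
    K_mm = rbf_kernel(Z, Z)
    return rbf_kernel(X_test, Z) @ np.linalg.solve(noise_var * K_mm + K_mn @ K_mn.T, K_mn @ y)


rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)   # synthetic data (illustrative only)
Z = X[rng.choice(200, size=10, replace=False)]          # inducing points = Nyström landmarks
X_test = np.linspace(-3.0, 3.0, 50)[:, None]

noise_var = 0.01
mean_nystrom = nystrom_krr_mean(X, y, Z, X_test, lam=noise_var / len(X))  # n * lam = noise_var
mean_svgp = svgp_mean(X, y, Z, X_test, noise_var)
print(np.allclose(mean_nystrom, mean_svgp))  # True
```

The match is exact up to rounding because, with nλ = σ², both helpers solve the same linear system; the paper's ELBO-based argument explains why this identity holds, which this numerical check does not.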

“…The initial Nyström samples we considered were drawn uniformly at random without replacement; while our experiments suggest that the local minima of the radial SKD often induce approximations of comparable quality, the use of more efficient initialisation strategies may be investigated (see e.g. [3,4,11,13,18]).…”

Section: Discussion (mentioning; confidence: 99%)

“…For a Nyström sample S of size n ∈ ℕ, the matrix K̂(S) is of rank at most n. Following [4,10], to further assess the efficiency of the approximation of K induced by S, we introduce the approximation factors…”

Section: Numerical Experiments (mentioning; confidence: 99%)

“…In Data Science, the Nyström method refers to a specific technique for the low-rank approximation of symmetric positive-semidefinite (SPSD) matrices; see e.g. [4,5,10,11,18]. Given an N × N SPSD matrix K, with N ∈ ℕ, the Nyström method consists of selecting a sample of n ∈ ℕ columns of K, generally with n ≪ N, and next defining a low-rank approximation K̂ of K based on this sample of columns.…”

Section: Introduction (mentioning; confidence: 99%)

Local optimisation of Nyström samples through stochastic gradient descent

Hutchings, Gauthier

2022

Preprint

We study a relaxed version of the column-sampling problem for the Nyström approximation of kernel matrices, where approximations are defined from multisets of landmark points in the ambient space; such multisets are referred to as Nyström samples. We consider an unweighted variation of the radial squared-kernel discrepancy (SKD) criterion as a surrogate for the classical criteria used to assess the Nyström approximation accuracy; in this setting, we discuss how Nyström samples can be efficiently optimised through stochastic gradient descent. We perform numerical experiments which demonstrate that the local minimisation of the radial SKD yields Nyström samples with improved Nyström approximation accuracy.
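The excerpts from this paper describe the Nyström method as selecting n columns of an N × N SPSD matrix K, drawing the initial samples uniformly at random without replacement, and assessing accuracy through approximation factors whose definition the excerpt truncates. The NumPy sketch below works under stated assumptions: it uses the standard form K̂ = C W⁺ Cᵀ (C the sampled columns, W their n × n intersection block), a synthetic Gaussian-kernel matrix, and an approximation factor assumed to be tr(K − K̂) divided by the best rank-n trace-norm error, i.e. the sum of the N − n smallest eigenvalues of K. This is one common definition, not necessarily the one used in [4,10], and the helper names are hypothetical. It does not implement the SKD optimisation itself.

```python
import numpy as np


def nystrom_approximation(K, idx):
    """Hypothetical helper: Nyström approximation K_hat = C W^+ C^T from columns idx of K."""
    C = K[:, idx]                   # N x n block of sampled columns
    W = K[np.ix_(idx, idx)]         # n x n intersection block
    return C @ np.linalg.pinv(W) @ C.T


def trace_approximation_factor(K, K_hat, k):
    """Assumed definition: tr(K - K_hat) / (sum of the N - k smallest eigenvalues of K)."""
    eigvals = np.linalg.eigvalsh(K)            # eigenvalues in ascending order
    best_rank_k_error = eigvals[: K.shape[0] - k].sum()
    return np.trace(K - K_hat) / best_rank_k_error


rng = np.random.default_rng(0)
X = rng.standard_normal((300, 2))
sq_dists = np.sum(X**2, axis=1)[:, None] + np.sum(X**2, axis=1)[None, :] - 2.0 * X @ X.T
K = np.exp(-0.5 * sq_dists)                    # synthetic SPSD Gaussian-kernel matrix

n = 20
idx = rng.choice(K.shape[0], size=n, replace=False)   # uniform sampling without replacement
K_hat = nystrom_approximation(K, idx)
print(trace_approximation_factor(K, K_hat, n))
```

Since K − K̂ is positive semidefinite and K̂ has rank at most n, the factor is at least 1 in exact arithmetic; values close to 1 indicate a sample that is nearly as good as the best rank-n approximation.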

“…Kumar et al (2012) explored the sampling approach for the column subset selection problem by the Nyström method. Derezinski et al (2020) recently provided an improved theoretical guarantee for low-rank approximations of large datasets. Another popular idea in machine learning is coreset, which constructs estimators based on sub-data.…”

Section: Introduction (mentioning; confidence: 99%)
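Column subset selection, mentioned in this excerpt and in the title of the cited paper, asks how well the span of k chosen columns of a matrix A approximates A. A minimal sketch, assuming the error is measured as ‖A − P_S A‖²_F and compared with the best rank-k error OPT_k from the singular values. The columns are drawn uniformly here purely for illustration, not with any sampling scheme analysed in the cited works, and the helper names and synthetic matrix are hypothetical.

```python
import numpy as np


def css_error(A, cols):
    """Hypothetical helper: ||A - P_S A||_F^2, P_S projecting onto span of A[:, cols]."""
    Q, _ = np.linalg.qr(A[:, cols])          # orthonormal basis of the selected columns
    residual = A - Q @ (Q.T @ A)
    return np.linalg.norm(residual, "fro") ** 2


def best_rank_k_error(A, k):
    """OPT_k: sum of squared singular values beyond the k-th (Eckart-Young)."""
    s = np.linalg.svd(A, compute_uv=False)   # singular values in descending order
    return np.sum(s[k:] ** 2)


rng = np.random.default_rng(1)
# Synthetic matrix: rank-8 signal plus small noise (illustrative only)
A = rng.standard_normal((100, 8)) @ rng.standard_normal((8, 60)) + 0.1 * rng.standard_normal((100, 60))
k = 8
cols = rng.choice(A.shape[1], size=k, replace=False)   # uniform choice, for illustration
ratio = css_error(A, cols) / best_rank_k_error(A, k)
print(ratio)  # at least 1, since P_S A has rank at most k
```

Guarantees for column subset selection are typically stated as bounds on this ratio (often in expectation over a randomized choice of columns).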

Optimal Subsampling for Large Sample Ridge Regression

Chen, Zhang

2022

Preprint

Subsampling is a popular approach to alleviating the computational burden of analyzing massive datasets. Recent efforts have been devoted to various statistical models without explicit regularization. In this paper, we develop an efficient subsampling procedure for large-sample linear ridge regression. In contrast to the ordinary least squares estimator, the introduction of the ridge penalty leads to a subtle trade-off between bias and variance. We first investigate the asymptotic properties of the subsampling estimator and then propose to minimize the asymptotic-mean-squared-error criterion for optimality. The resulting subsampling probability involves both the ridge leverage score and the ℓ2 norm of the predictor. To further reduce the computational cost of calculating the ridge leverage scores, we propose an algorithm with an efficient approximation. We show on synthetic and real datasets that the algorithm is both statistically accurate and computationally efficient compared with existing subsampling-based methods.
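The abstract does not give the exact form of the optimal subsampling probabilities, so the sketch below does not reproduce them. It only illustrates one ingredient the abstract names, the ridge leverage scores h_i(λ) = x_iᵀ(XᵀX + λI)⁻¹x_i. As a stand-in assumption, rows are sampled with probabilities proportional to h_i alone (ignoring the ℓ2-norm term), and the ridge estimator is refit with inverse-probability weights. The helper names and synthetic data are hypothetical.

```python
import numpy as np


def ridge_leverage_scores(X, lam):
    """h_i(lam) = x_i^T (X^T X + lam I)^{-1} x_i for every row x_i of X."""
    G = X.T @ X + lam * np.eye(X.shape[1])
    return np.einsum("ij,ji->i", X, np.linalg.solve(G, X.T))


def ridge(X, y, lam, weights=None):
    """Weighted ridge: argmin_b sum_i w_i (y_i - x_i^T b)^2 + lam * ||b||^2."""
    w = np.ones(len(y)) if weights is None else weights
    A = X.T @ (w[:, None] * X) + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ (w * y))


def leverage_subsampled_ridge(X, y, lam, r, rng):
    """Hypothetical helper: sample r rows with prob. proportional to h_i (an assumption,
    not the paper's optimal probabilities), then refit with inverse-probability weights."""
    h = ridge_leverage_scores(X, lam)
    probs = h / h.sum()
    idx = rng.choice(len(y), size=r, replace=True, p=probs)
    weights = 1.0 / (r * probs[idx])
    return ridge(X[idx], y[idx], lam, weights)


rng = np.random.default_rng(0)
X = rng.standard_normal((10_000, 5))
y = X @ np.arange(1.0, 6.0) + 0.5 * rng.standard_normal(10_000)   # synthetic data
lam = 1.0
beta_full = ridge(X, y, lam)
beta_sub = leverage_subsampled_ridge(X, y, lam, r=500, rng=rng)
print(beta_full)
print(beta_sub)
```

The inverse-probability weights make the subsampled squared loss an unbiased estimate of the full-data loss, which is why the same λ is used in both fits. Computing exact scores costs a p × p solve here; the abstract's efficient approximation targets that step.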
