2019
DOI: 10.1609/aaai.v33i01.33014229
Learning Adaptive Random Features

Abstract: Distributed learning and random projections are the most common techniques in large-scale nonparametric statistical learning. In this paper, we study the generalization properties of kernel ridge regression using both distributed methods and random features. Theoretical analysis shows that the combination substantially reduces the computational cost while preserving the optimal generalization accuracy under standard assumptions. In a benign case, O(√N) partitions and O(√N) random features are sufficient to achieve O…
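The combination the abstract describes, divide-and-conquer plus random features for kernel ridge regression, can be sketched as follows: split the N training points into roughly √N partitions, fit a ridge regression in a shared random Fourier feature space on each partition, and average the local solutions. The sketch below is a minimal illustration under these assumptions; the function names (rff_map, distributed_rf_krr), the Gaussian kernel, and the particular regularization and feature counts are illustrative choices, not the paper's actual algorithm or settings.

```python
import numpy as np

def rff_map(X, W, b):
    """Random Fourier features approximating a Gaussian kernel (illustrative)."""
    return np.sqrt(2.0 / W.shape[1]) * np.cos(X @ W + b)

def distributed_rf_krr(X, y, n_partitions, n_features, lam=1e-3, gamma=1.0, seed=0):
    """Fit ridge regression with random features on each partition, then average the local solutions."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Shared random features so all local models live in the same feature space.
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, n_features))
    b = rng.uniform(0, 2 * np.pi, size=n_features)
    coefs = []
    for idx in np.array_split(rng.permutation(len(X)), n_partitions):
        Z = rff_map(X[idx], W, b)                              # n_i x M feature matrix
        A = Z.T @ Z + lam * len(idx) * np.eye(n_features)      # local regularized normal equations
        coefs.append(np.linalg.solve(A, Z.T @ y[idx]))
    w_bar = np.mean(coefs, axis=0)                             # divide-and-conquer averaging
    return lambda X_new: rff_map(X_new, W, b) @ w_bar

# Toy usage: N = 1024 points, so sqrt(N) = 32 partitions and 32 random features.
X = np.random.randn(1024, 5)
y = np.sin(X[:, 0]) + 0.1 * np.random.randn(1024)
predict = distributed_rf_krr(X, y, n_partitions=32, n_features=32)
print(predict(X[:5]))
```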

Cited by 8 publications (6 citation statements)
References 16 publications (51 reference statements)
“…To overcome the computational and memory bottleneck of KRLS, practical algorithms have been developed, including the Nyström approach (Rudi, Carratino, and Rosasco 2017; Camoriano et al. 2016) and divide-and-conquer (Zhang, Duchi, and Wainwright 2013; Li, Liu, and Wang 2019), whose statistical properties are well studied. The Nyström method (Rudi, Camoriano, and Rosasco 2015; Camoriano et al. 2016) constructs small-scale matrices, by sampling the dataset, that approximate the full kernel matrix, so that the time and space complexity drop sharply.…”
Section: Related Work
confidence: 99%
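For readers unfamiliar with the Nyström approach mentioned in the statement above, a minimal sketch follows: sample m landmark points, build only the n×m and m×m kernel blocks, and solve the reduced ridge system. The kernel choice, landmark count, and helper names below (gaussian_kernel, nystrom_krr) are hypothetical illustrations, not the exact estimators of the cited papers.

```python
import numpy as np

def gaussian_kernel(A, B, gamma=1.0):
    """Pairwise Gaussian kernel k(a, b) = exp(-gamma * ||a - b||^2)."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)

def nystrom_krr(X, y, n_landmarks=50, lam=1e-3, gamma=1.0, seed=0):
    """Nyström-approximated kernel ridge regression: only n x m and m x m kernel blocks are formed."""
    rng = np.random.default_rng(seed)
    landmarks = X[rng.choice(len(X), n_landmarks, replace=False)]
    K_nm = gaussian_kernel(X, landmarks, gamma)           # n x m cross-kernel block
    K_mm = gaussian_kernel(landmarks, landmarks, gamma)   # m x m landmark kernel block
    # Reduced ridge system: (K_nm^T K_nm + lam * n * K_mm) alpha = K_nm^T y.
    A = K_nm.T @ K_nm + lam * len(X) * K_mm
    alpha = np.linalg.solve(A, K_nm.T @ y)
    return lambda X_new: gaussian_kernel(X_new, landmarks, gamma) @ alpha

# Toy usage: 2000 points approximated with 100 landmarks.
X = np.random.randn(2000, 5)
y = np.sin(X[:, 0]) + 0.1 * np.random.randn(2000)
predict = nystrom_krr(X, y, n_landmarks=100)
print(predict(X[:5]))
```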
“…In this section, to explore the generalization ability, we first introduce four standard assumptions that are widely used in statistical learning with the squared loss (Smale and Zhou 2007; Caponnetto and Vito 2007; Rudi, Carratino, and Rosasco 2017; Li, Liu, and Wang 2019). Under these assumptions, we provide a theoretical bound for the proposed algorithm that matches that of exact Kernel Regularized Least Squares (KRLS).…”
Section: Theoretical Assessments
confidence: 99%