2017
DOI: 10.48550/arxiv.1710.10230
Preprint

Not-So-Random Features

Abstract: We propose a principled method for kernel learning, which relies on a Fourier-analytic characterization of translation-invariant or rotation-invariant kernels. Our method produces a sequence of feature maps, iteratively refining the SVM margin. We provide rigorous guarantees for optimality and generalization, interpreting our algorithm as online equilibrium-finding dynamics in a certain two-player min-max game. Evaluations on synthetic and real-world datasets demonstrate scalability and consistent improvements…
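To make the abstract's min-max picture concrete, here is a toy sketch, not the authors' algorithm: every name, the random candidate pool, and the multiplicative-weights step below are illustrative assumptions. A "feature player" greedily picks Fourier frequencies that align with the labels under the current example weights, while an "example player" up-weights poorly fit points, in the boosting-like spirit the abstract invokes.

```python
import numpy as np

def fourier_feature(X, omega):
    """Map X through a single (cos, sin) Fourier feature at frequency omega."""
    proj = X @ omega                                    # (n,)
    return np.stack([np.cos(proj), np.sin(proj)], axis=1)  # (n, 2)

def kernel_learning_sketch(X, y, T=50, n_candidates=500, seed=0):
    """Toy min-max loop: the feature player picks, from a random candidate
    pool, the frequency best aligned with the labels under the current
    example weights p; the example player then up-weights poorly fit points.
    Assumes labels y in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    p = np.full(n, 1.0 / n)                 # distribution over examples
    frequencies = []
    for _ in range(T):
        candidates = rng.standard_normal((n_candidates, d))
        scores = [np.abs((p * y) @ fourier_feature(X, w)).sum()
                  for w in candidates]
        omega = candidates[int(np.argmax(scores))]
        frequencies.append(omega)
        margin = y * fourier_feature(X, omega).sum(axis=1)  # crude per-point fit
        p *= np.exp(-0.5 * margin)          # multiplicative-weights update
        p /= p.sum()
    return np.array(frequencies)            # learned (not-so-random) frequencies
```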

Cited by 3 publications (6 citation statements) | References 16 publications
“…It is not necessary to sample from the distribution $\rho$ defined by $K$ (Yang, Sindhwani, and Mahoney 2014; Avron et al. 2016; Chang et al. 2017; Bullins, Zhang, and Zhang 2017; Liu et al. 2020; Li et al. 2019b). Given a strictly positive, even reference density $\bar\rho : \mathbb{R}^d \to \mathbb{R}_+$ (which is associated with a reference kernel $\bar{K}(x, x') = \bar{k}(x - x')$ via Bochner's theorem), and defining $\mu(\omega) = \rho(\omega)/\bar\rho(\omega)$, (2) and (3) become:…”
Section: Background: Random Fourier Features (mentioning)
confidence: 99%
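The reweighting in this excerpt is easy to state in code. Below is a minimal sketch assuming both the target spectral density $\rho$ and the reference $\bar\rho$ are isotropic Gaussians (so both pdfs are available in closed form); the name reweighted_rff and the specific bandwidths are assumptions, not taken from any of the cited papers. Each frequency is drawn from $\bar\rho$ and its feature is scaled by $\sqrt{\mu(\omega)/D}$, so the expected inner product of features still recovers the target kernel.

```python
import numpy as np
from scipy.stats import multivariate_normal

def reweighted_rff(X, D, sigma_target=1.0, sigma_ref=2.0, seed=0):
    """Draw frequencies from a reference density rho_bar instead of the
    kernel's own spectral density rho, and correct each feature by the
    importance weight mu(omega) = rho(omega) / rho_bar(omega)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Reference density: a wider Gaussian than the target kernel's spectrum.
    omegas = rng.normal(scale=sigma_ref, size=(D, d))
    rho = multivariate_normal(np.zeros(d), sigma_target**2 * np.eye(d)).pdf(omegas)
    rho_bar = multivariate_normal(np.zeros(d), sigma_ref**2 * np.eye(d)).pdf(omegas)
    mu = rho / rho_bar                               # importance weights
    proj = X @ omegas.T                              # (n, D)
    Z = np.hstack([np.cos(proj), np.sin(proj)])      # (n, 2D)
    return Z * np.sqrt(np.tile(mu, 2) / D)           # E[z(x) . z(x')] = k(x - x')
```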
“…Random Fourier features (RFF) (Rahimi and Recht 2006; Liu et al. 2020) were originally developed to tackle the problem of computational complexity. RFF-based methods work by approximating the feature map underlying a kernel using a finite ($D$-dimensional) map obtained by sampling from a density function, typically, but not necessarily (Chang et al. 2017; Liu et al. 2020; Bullins, Zhang, and Zhang 2017), the spectral density corresponding to the kernel by Bochner's theorem. Using this map, the problem is re-cast in approximate feature space and solved in primal form, reducing the typical complexity to $O(ND^3)$.…”
Section: Introduction (mentioning)
confidence: 99%
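As an illustration of the primal re-casting this excerpt describes, here is a minimal RFF ridge-regression sketch (ridge is chosen over SVM for brevity; the names rff_map and rff_ridge are hypothetical). In this sketch the dominant costs are forming the feature Gram matrix in $O(ND^2)$ and solving it in $O(D^3)$, rather than the $O(N^3)$ of an exact kernel solve.

```python
import numpy as np

def rff_map(X, omegas):
    """Finite D-dimensional approximation of the RBF feature map."""
    proj = X @ omegas.T                              # (N, D)
    return np.hstack([np.cos(proj), np.sin(proj)]) / np.sqrt(omegas.shape[0])

def rff_ridge(X, y, D=200, gamma=1.0, lam=1e-3, seed=0):
    """Kernel ridge regression re-cast in primal form via RFF. For the RBF
    kernel exp(-gamma * ||x - x'||^2), Bochner's theorem gives a Gaussian
    spectral density with covariance 2 * gamma * I."""
    rng = np.random.default_rng(seed)
    omegas = rng.normal(scale=np.sqrt(2 * gamma), size=(D, X.shape[1]))
    Z = rff_map(X, omegas)                           # (N, 2D) features
    A = Z.T @ Z + lam * np.eye(Z.shape[1])           # O(N D^2) to form
    w = np.linalg.solve(A, Z.T @ y)                  # O(D^3) to solve
    return lambda X_new: rff_map(X_new, omegas) @ w  # predictor
```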
“…For threshold functions of general SVM classifiers, a "weighted" version of the alignment is optimized [29, 30]. The intuition here is that only the data points that constitute the support vectors should determine the kernel parameters.…”
Section: Kernel Alignment for the Fiducial State (mentioning)
confidence: 99%
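A compact version of the alignment quantity under discussion, with an optional per-example weighting; the specific weighting form below (rescaling entries by $w_i w_j$) is an illustrative assumption, not necessarily the exact scheme of [29, 30].

```python
import numpy as np

def alignment(K, y, w=None):
    """Kernel-target alignment <K, y y^T>_F / (||K||_F ||y y^T||_F).
    Passing weights w (e.g., concentrated on the support vectors) rescales
    entries of both matrices as K_ij -> w_i * w_j * K_ij before comparing."""
    Y = np.outer(y, y)
    if w is not None:
        W = np.outer(w, w)
        K, Y = K * W, Y * W
    return (K * Y).sum() / (np.linalg.norm(K) * np.linalg.norm(Y))
```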
“…(6) with a maximum error of $\epsilon$ by drawing $O(\epsilon^{-2})$ samples from $p(\omega)$ (see also [33]). This argument applies to rotationally invariant kernels as well [34].…”
Section: Introduction (mentioning)
confidence: 96%
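The $O(\epsilon^{-2})$ rate quoted here follows from a standard pointwise Monte Carlo argument; the sketch below uses Hoeffding's inequality for a fixed pair of inputs, which is the elementary version rather than the uniform bound of [33].

```latex
% For a fixed pair (x, x'), each frequency omega_j ~ p(omega) contributes
% z_j = cos(omega_j^T (x - x')) in [-1, 1], with E[z_j] = k(x - x') by
% Bochner's theorem. Hoeffding's inequality then gives
\[
  \Pr\left[\,\left|\frac{1}{D}\sum_{j=1}^{D} z_j - k(x - x')\right| \ge \epsilon\right]
  \le 2\exp\left(-\frac{D\epsilon^2}{2}\right),
\]
% so D = O(epsilon^{-2} * log(1/delta)) samples suffice for error epsilon
% with probability at least 1 - delta at any fixed pair of points.
```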