Neural Kernels Without Tangents
Preprint, 2020
DOI: 10.48550/arxiv.2003.02237

Cited by 8 publications (24 citation statements)
References 0 publications

“…We use batch size 125, train for 140 epochs, and decay the learning rate thrice at epochs 80, 100 and 120 each by a factor 0.2. We use standard random crop, random flip, normalization, and cutout augmentation [77] for the training data.…”
Section: A Experiments Details (mentioning)
confidence: 99%
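
The schedule quoted above (batch size 125, 140 epochs, learning-rate decay by a factor of 0.2 at epochs 80, 100, and 120, with random crop, flip, normalization, and cutout) maps onto a standard training loop. Below is a minimal sketch, assuming a CIFAR-10 setup in PyTorch; the architecture, the SGD optimizer and its base learning rate, and the use of torchvision's RandomErasing as a cutout-like stand-in are illustrative assumptions, not details taken from the citing paper.

# Sketch only: hyperparameters other than those quoted above are assumed.
import torch
import torchvision
import torchvision.transforms as T

train_tf = T.Compose([
    T.RandomCrop(32, padding=4),        # standard random crop
    T.RandomHorizontalFlip(),           # random flip
    T.ToTensor(),
    T.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
    T.RandomErasing(p=1.0, scale=(0.25, 0.25), ratio=(1.0, 1.0), value=0),  # cutout-like
])

train_set = torchvision.datasets.CIFAR10("./data", train=True, download=True, transform=train_tf)
loader = torch.utils.data.DataLoader(train_set, batch_size=125, shuffle=True, num_workers=2)

model = torchvision.models.resnet18(num_classes=10)   # assumed architecture
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
# Decay the learning rate three times, by 0.2, at epochs 80, 100, and 120.
sched = torch.optim.lr_scheduler.MultiStepLR(opt, milestones=[80, 100, 120], gamma=0.2)
loss_fn = torch.nn.CrossEntropyLoss()

for epoch in range(140):
    for x, y in loader:
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    sched.step()
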
“…We use batch size 50, train for 200 epochs, and decay the learning rate twice at epochs 140 and 170 each by a factor 0.2. We use ZCA data preprocessing, which has been reported to be very helpful for improving neural kernel methods' performance together with cutout augmentation [77].…”
Section: A Experiments Details (mentioning)
confidence: 99%
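
The ZCA preprocessing mentioned in this quote can be written in a few lines of NumPy. The sketch below assumes flattened images as rows of a matrix and a small regularization constant eps; both the constant and the convention of fitting the transform on the training set only are assumptions for illustration.

import numpy as np

def fit_zca(X, eps=1e-2):
    # X: (n_samples, n_features) flattened images. Returns the mean and ZCA matrix.
    mean = X.mean(axis=0)
    Xc = X - mean
    cov = Xc.T @ Xc / Xc.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigendecomposition of the covariance
    W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return mean, W

def apply_zca(X, mean, W):
    return (X - mean) @ W

# Usage: fit on the training images, then apply the same transform to test images.
# mean, W = fit_zca(train_images.reshape(len(train_images), -1))
# test_white = apply_zca(test_images.reshape(len(test_images), -1), mean, W)
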
“…[5] defined a family of "arc-cosine" kernels to imitate the computations performed by infinitely wide networks in expectation. [4] proposed kernels that are equivalent to expectations of finite-width random networks. [7] presented exact computations of some kernels, using which the kernel regression models can be shown to be the limit (in width and training time) of fully-trainable, infinitely wide fully-connected networks trained with gradient descent.…”
Section: Related Work, A. Connecting Neural Network With Kernel Methods (mentioning)
confidence: 99%
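
The "arc-cosine" kernels referenced in this statement have a simple closed form. For the order-1 case (in the sense of Cho and Saul, the expected inner product of ReLU features under random Gaussian weights), the kernel depends only on the input norms and the angle between inputs. A minimal NumPy sketch, for illustration only and assuming nonzero inputs:

import numpy as np

def arccos_kernel_order1(X, Y):
    # k(x, y) = (1/pi) * ||x|| * ||y|| * (sin(theta) + (pi - theta) * cos(theta)),
    # where theta is the angle between x and y.
    nx = np.linalg.norm(X, axis=1)            # shape (n,)
    ny = np.linalg.norm(Y, axis=1)            # shape (m,)
    cos = (X @ Y.T) / np.outer(nx, ny)        # assumes no zero-norm rows
    cos = np.clip(cos, -1.0, 1.0)             # guard against round-off
    theta = np.arccos(cos)
    return np.outer(nx, ny) * (np.sin(theta) + (np.pi - theta) * np.cos(theta)) / np.pi

# The resulting Gram matrix can be plugged into kernel ridge regression or an SVM,
# e.g. K = arccos_kernel_order1(X_train, X_train).
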
“…In fact, if we ensure that there is at least one example from each class in the training data, the modular approach needed as few as 10 randomly chosen examples to achieve 94.88% accuracy, that is, a single randomly chosen example per class. These observations suggest that our modular training method can almost completely rely on weak pairwise labels, which suggests new paradigms for obtaining labeled data that can potentially be less costly than the existing ones.…”
Section: Label Efficiency Of Modular Deep Learning (mentioning)
confidence: 99%
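
The constraint described in this quote (at least one randomly chosen example per class) amounts to a stratified draw over labels. A short sketch, with all names chosen for illustration:

import numpy as np

def one_example_per_class(labels, num_classes, seed=0):
    # Return the index of one randomly chosen example for each class.
    rng = np.random.default_rng(seed)
    picks = []
    for c in range(num_classes):
        idx = np.flatnonzero(labels == c)
        picks.append(rng.choice(idx))
    return np.array(picks)

# Usage: indices = one_example_per_class(train_labels, num_classes=10)
# gives 10 labeled examples, one per class, as in the quoted experiment.
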
“…Despite decades of intense mathematical progress, the rigorous analysis of the generalization of kernel methods remains a very active and challenging area of research. In recent years, many new kernels have been introduced for both regression and classification tasks; notably, a large number of kernels have been discovered in the context of deep learning, in particular through the so-called Scattering Transform [20], and in close connection with deep neural networks [7,15], yielding ever-improving performance for various practical tasks [1,10,16,25]. Currently, theoretical tools to select the relevant kernel for a given task, i.e.…”
Section: Introduction (mentioning)
confidence: 99%