2020
DOI: 10.48550/arxiv.2012.11140
Preprint

LQF: Linear Quadratic Fine-Tuning

Abstract: Classifiers that are linear in their parameters, and trained by optimizing a convex loss function, have predictable behavior with respect to changes in the training data, initial conditions, and optimization. Such desirable properties are absent in deep neural networks (DNNs), typically trained by non-linear fine-tuning of a pre-trained model. Previous attempts to linearize DNNs have led to interesting theoretical insights, but have not impacted the practice due to the substantial performance gap compared to s…
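The abstract's opening claim, that a model linear in its parameters trained with a convex loss behaves predictably, can be made concrete with a small sketch. The following is an illustrative example, not code from the paper: a ridge-regularized least-squares classifier on fixed features, whose optimum has a closed form, so the effect of changing the training data is explicit. All names, shapes, and the regularizer value are made up for the illustration.

```python
# Illustrative sketch (not from the paper): a classifier that is linear in its
# parameters W, trained with a convex least-squares (ridge) loss.
import jax.numpy as jnp
from jax import random

key = random.PRNGKey(0)
k1, k2 = random.split(key)
n, d, c = 100, 16, 3                              # examples, feature dim, classes
X = random.normal(k1, (n, d))                     # fixed features, e.g. from a frozen backbone
Y = jnp.eye(c)[random.randint(k2, (n,), 0, c)]    # one-hot targets
lam = 1e-2                                        # ridge regularizer (arbitrary)

# Closed-form minimizer of ||X W - Y||^2 + lam ||W||^2: the solution is an
# explicit linear function of the training data, hence its behavior under
# changes to the data is predictable.
W = jnp.linalg.solve(X.T @ X + lam * jnp.eye(d), X.T @ Y)
pred = jnp.argmax(X @ W, axis=-1)                 # linear classifier predictions
```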

Cited by 2 publications (10 citation statements). References 19 publications.
“…Instead, we seek analytical expressions of the Wellington Posterior that do not require multiple inference runs. Recent developments in network linearization suggest that it is possible to perturb the weights of a trained network locally to perform novel tasks essentially as well as non-linear optimization/fine-tuning [1]. Such linearization, called LQF, is with respect to perturbations of the weights of the model.…”
Section: Analytical Posterior Through Linearization (mentioning)
confidence: 99%
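The quoted passage refers to LQF's linearization with respect to perturbations of the weights around a pre-trained point. As a rough sketch of what such a linearization looks like, the toy JAX snippet below expands a small network to first order in a weight perturbation dw; the architecture, shapes, and perturbation size are invented for illustration and are not the paper's setup.

```python
# Sketch (toy model): linearize a "pre-trained" network f(x; w0) with respect
# to a weight perturbation dw, so that
#   f_lin(x; w0 + dw) = f(x; w0) + J_w f(x; w0) @ dw
# is linear in dw while remaining non-linear in the input x.
import jax
import jax.numpy as jnp

def net(params, x):
    w1, b1, w2, b2 = params
    h = jax.nn.leaky_relu(x @ w1 + b1)     # leaky ReLU, as in the LQF recipe
    return h @ w2 + b2

key = jax.random.PRNGKey(0)
k1, k2 = jax.random.split(key)
params0 = (jax.random.normal(k1, (8, 32)), jnp.zeros(32),
           jax.random.normal(k2, (32, 3)), jnp.zeros(3))   # stand-in "pre-trained" weights
x = jnp.ones((5, 8))                                        # a batch of inputs

def f_of_params(p):
    return net(p, x)

# Jacobian-vector product: output of the linearized model for perturbation dw.
dw = jax.tree_util.tree_map(lambda p: 0.01 * jnp.ones_like(p), params0)
f0, jvp_out = jax.jvp(f_of_params, (params0,), (dw,))
f_lin = f0 + jvp_out                       # first-order (linear-in-dw) prediction
```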
“…The viability of the method we propose to impute uncertainty to a deterministic classifier hinges on the ability to produce an estimate without having to sample multiple inference runs at test time. Recent work on model linearization around a pre-trained point [1] has shown that it is possible to obtain performance comparable to that of full-network non-linear fine-tuning. In this sense, LQF can be used as a baseline classifier instead of the pre-trained network.…”
Section: (B) (mentioning)
confidence: 99%
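To illustrate how a linearized model can serve as a trainable baseline classifier, the sketch below fine-tunes only the weight perturbation of a toy linearized network with a least-squares loss, which is convex because the output is linear in the perturbation. The model, data, learning rate, and step count are all assumptions made for the example, not the paper's configuration.

```python
# Sketch (toy setup, illustrative names): fine-tune only the weight
# perturbation dw of a linearized "pre-trained" model with a least-squares loss.
import jax
import jax.numpy as jnp

def net(params, x):
    w, b = params
    return jax.nn.leaky_relu(x @ w) @ b          # toy stand-in for the backbone

key = jax.random.PRNGKey(0)
k1, k2, k3 = jax.random.split(key, 3)
params0 = (jax.random.normal(k1, (8, 32)), jax.random.normal(k2, (32, 3)))
X = jax.random.normal(k3, (64, 8))
Y = jnp.eye(3)[jnp.arange(64) % 3]               # one-hot targets for the new task

# jax.linearize returns f(params0) and the linear map dw -> J @ dw.
f0, f_jvp = jax.linearize(lambda p: net(p, X), params0)

def loss(dw):
    return jnp.mean((f0 + f_jvp(dw) - Y) ** 2)   # least-squares on the linearized output

dw = jax.tree_util.tree_map(jnp.zeros_like, params0)
for _ in range(100):                             # plain gradient descent on a convex objective
    grads = jax.grad(loss)(dw)
    dw = jax.tree_util.tree_map(lambda d, g: d - 0.1 * g, dw, grads)
```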
“…The intermediate block in the diagram (which finds the optimal weights w_α for the training loss on D_α) is usually non-differentiable with respect to the dataset, or the derivative is prohibitively expensive to compute. DIVA leverages recent progress in deep learning linearization [1] to derive a closed-form expression for the derivative of the final loss (validation error) with respect to the dataset weights. In particular, [1] have shown that, by replacing cross-entropy with least-squares, replacing ReLU with leaky ReLU, and performing suitable pre-conditioning, the linearized model performs on par with full non-linear fine-tuning.…”
Section: Introduction (mentioning)
confidence: 99%
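Two of the three ingredients quoted here, least-squares in place of cross-entropy and leaky ReLU in place of ReLU, are easy to show in isolation; the sketch below does so, while the pre-conditioning step is only noted in a comment. This is an illustrative reading of the quote, not the paper's implementation.

```python
# Sketch of two LQF ingredients quoted above: a least-squares loss on one-hot
# labels instead of cross-entropy, and leaky ReLU instead of ReLU.
# The pre-conditioning step is omitted; values below are arbitrary.
import jax
import jax.numpy as jnp

def cross_entropy(logits, labels):
    return -jnp.mean(jnp.sum(jax.nn.log_softmax(logits) * labels, axis=-1))

def least_squares(logits, labels):
    # Quadratic in the logits, hence quadratic in the parameters of a
    # linearized model: the overall objective becomes convex.
    return jnp.mean(jnp.sum((logits - labels) ** 2, axis=-1))

relu_act = jax.nn.relu          # activation of the original backbone
lqf_act = jax.nn.leaky_relu     # replacement with a non-zero gradient everywhere

logits = jnp.array([[2.0, -1.0, 0.5]])
labels = jnp.eye(3)[jnp.array([0])]
print(cross_entropy(logits, labels), least_squares(logits, labels))
```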
“…DIVA leverages recent progress in deep learning linearization [1] to derive a closed-form expression for the derivative of the final loss (validation error) with respect to the dataset weights. In particular, [1] have shown that, by replacing cross-entropy with least-squares, replacing ReLU with leaky ReLU, and performing suitable pre-conditioning, the linearized model performs on par with full non-linear fine-tuning. We also leverage a classical result to compute the leave-one-out loss of a linear model in closed-form [61, 21].…”
Section: Introduction (mentioning)
confidence: 99%
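The classical closed-form leave-one-out result mentioned in this statement can be written for a ridge (linear) model as e_i^loo = e_i / (1 - H_ii), with H the hat matrix. The snippet below is an illustrative computation on synthetic data, not the DIVA code; the regularizer and shapes are arbitrary.

```python
# Sketch of the classical closed-form leave-one-out (LOO) residual for a
# linear (ridge) model:
#   e_i^loo = (y_i - yhat_i) / (1 - H_ii),  H = X (X^T X + lam I)^{-1} X^T
import jax.numpy as jnp
from jax import random

key = random.PRNGKey(0)
k1, k2 = random.split(key)
X = random.normal(k1, (50, 8))      # synthetic design matrix
y = random.normal(k2, (50,))        # synthetic targets
lam = 1e-2                          # ridge regularizer (arbitrary)

A = jnp.linalg.solve(X.T @ X + lam * jnp.eye(8), X.T)   # (X^T X + lam I)^{-1} X^T
H = X @ A                                                # hat matrix
resid = y - H @ y                                        # in-sample residuals
loo_resid = resid / (1.0 - jnp.diag(H))                  # LOO residuals without refitting
loo_mse = jnp.mean(loo_resid ** 2)                       # leave-one-out error in closed form
```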