2020
DOI: 10.48550/arxiv.2012.11140
Preprint

LQF: Linear Quadratic Fine-Tuning

Abstract: Classifiers that are linear in their parameters, and trained by optimizing a convex loss function, have predictable behavior with respect to changes in the training data, initial conditions, and optimization. Such desirable properties are absent in deep neural networks (DNNs), typically trained by non-linear fine-tuning of a pre-trained model. Previous attempts to linearize DNNs have led to interesting theoretical insights, but have not impacted the practice due to the substantial performance gap compared to s…
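The abstract's opening claim, that a model linear in its parameters trained with a convex loss behaves predictably, can be made concrete with a small sketch. The following is an illustrative example, not code from the paper: a ridge-regularized least-squares classifier on fixed features, whose optimum has a closed form, so the effect of changing the training data is explicit. All names, shapes, and the regularizer value are made up for the illustration.

```python
# Illustrative sketch (not from the paper): a classifier that is linear in its
# parameters W, trained with a convex least-squares (ridge) loss.
import jax.numpy as jnp
from jax import random

key = random.PRNGKey(0)
k1, k2 = random.split(key)
n, d, c = 100, 16, 3                              # examples, feature dim, classes
X = random.normal(k1, (n, d))                     # fixed features, e.g. from a frozen backbone
Y = jnp.eye(c)[random.randint(k2, (n,), 0, c)]    # one-hot targets
lam = 1e-2                                        # ridge regularizer (arbitrary)

# Closed-form minimizer of ||X W - Y||^2 + lam ||W||^2: the solution is an
# explicit linear function of the training data, hence its behavior under
# changes to the data is predictable.
W = jnp.linalg.solve(X.T @ X + lam * jnp.eye(d), X.T @ Y)
pred = jnp.argmax(X @ W, axis=-1)                 # linear classifier predictions
```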

Cited by 2 publications (10 citation statements). References 19 publications.
“…Instead, we seek analytical expressions of the Wellington Posterior that do not require multiple inference runs. Recent developments in network linearization suggest that it is possible to perturb the weights of a trained network locally to perform novel tasks essentially as well as non-linear optimization/fine-tuning [1]. Such linearization, called LQF, is with respect to perturbations of the weights of the model.…”
Section: Analytical Posterior Through Linearization (mentioning)
confidence: 99%
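The quoted passage refers to LQF's linearization with respect to perturbations of the weights around a pre-trained point. As a rough sketch of what such a linearization looks like, the toy JAX snippet below expands a small network to first order in a weight perturbation dw; the architecture, shapes, and perturbation size are invented for illustration and are not the paper's setup.

```python
# Sketch (toy model): linearize a "pre-trained" network f(x; w0) with respect
# to a weight perturbation dw, so that
#   f_lin(x; w0 + dw) = f(x; w0) + J_w f(x; w0) @ dw
# is linear in dw while remaining non-linear in the input x.
import jax
import jax.numpy as jnp

def net(params, x):
    w1, b1, w2, b2 = params
    h = jax.nn.leaky_relu(x @ w1 + b1)     # leaky ReLU, as in the LQF recipe
    return h @ w2 + b2

key = jax.random.PRNGKey(0)
k1, k2 = jax.random.split(key)
params0 = (jax.random.normal(k1, (8, 32)), jnp.zeros(32),
           jax.random.normal(k2, (32, 3)), jnp.zeros(3))   # stand-in "pre-trained" weights
x = jnp.ones((5, 8))                                        # a batch of inputs

def f_of_params(p):
    return net(p, x)

# Jacobian-vector product: output of the linearized model for perturbation dw.
dw = jax.tree_util.tree_map(lambda p: 0.01 * jnp.ones_like(p), params0)
f0, jvp_out = jax.jvp(f_of_params, (params0,), (dw,))
f_lin = f0 + jvp_out                       # first-order (linear-in-dw) prediction
```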
“…The viability of the method we propose to impute uncertainty to a deterministic classifier hinges on the ability to produce an estimate without having to sample multiple inference runs at test time. Recent work on model linearization around a pre-trained point [1] has shown that it is possible to obtain performance comparable to that of full-network non-linear fine-tuning. In this sense, LQF can be used as a baseline classifier instead of the pre-trained network.…”
Section: (B) (mentioning)
confidence: 99%
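To illustrate how a linearized model can serve as a trainable baseline classifier, the sketch below fine-tunes only the weight perturbation of a toy linearized network with a least-squares loss, which is convex because the output is linear in the perturbation. The model, data, learning rate, and step count are all assumptions made for the example, not the paper's configuration.

```python
# Sketch (toy setup, illustrative names): fine-tune only the weight
# perturbation dw of a linearized "pre-trained" model with a least-squares loss.
import jax
import jax.numpy as jnp

def net(params, x):
    w, b = params
    return jax.nn.leaky_relu(x @ w) @ b          # toy stand-in for the backbone

key = jax.random.PRNGKey(0)
k1, k2, k3 = jax.random.split(key, 3)
params0 = (jax.random.normal(k1, (8, 32)), jax.random.normal(k2, (32, 3)))
X = jax.random.normal(k3, (64, 8))
Y = jnp.eye(3)[jnp.arange(64) % 3]               # one-hot targets for the new task

# jax.linearize returns f(params0) and the linear map dw -> J @ dw.
f0, f_jvp = jax.linearize(lambda p: net(p, X), params0)

def loss(dw):
    return jnp.mean((f0 + f_jvp(dw) - Y) ** 2)   # least-squares on the linearized output

dw = jax.tree_util.tree_map(jnp.zeros_like, params0)
for _ in range(100):                             # plain gradient descent on a convex objective
    grads = jax.grad(loss)(dw)
    dw = jax.tree_util.tree_map(lambda d, g: d - 0.1 * g, dw, grads)
```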
“…The intermediate block in the diagram (which finds the optimal weights w_α for the training loss on D_α) is usually non-differentiable with respect to the dataset, or the derivative is prohibitively expensive to compute. DIVA leverages recent progress in deep learning linearization [1] to derive a closed-form expression for the derivative of the final loss (validation error) with respect to the dataset weights. In particular, [1] have shown that, by replacing cross-entropy with least-squares, replacing ReLU with leaky ReLU, and performing suitable pre-conditioning, the linearized model performs on par with full non-linear fine-tuning.…”
Section: Introduction (mentioning)
confidence: 99%
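Two of the three ingredients quoted here, least-squares in place of cross-entropy and leaky ReLU in place of ReLU, are easy to show in isolation; the sketch below does so, while the pre-conditioning step is only noted in a comment. This is an illustrative reading of the quote, not the paper's implementation.

```python
# Sketch of two LQF ingredients quoted above: a least-squares loss on one-hot
# labels instead of cross-entropy, and leaky ReLU instead of ReLU.
# The pre-conditioning step is omitted; values below are arbitrary.
import jax
import jax.numpy as jnp

def cross_entropy(logits, labels):
    return -jnp.mean(jnp.sum(jax.nn.log_softmax(logits) * labels, axis=-1))

def least_squares(logits, labels):
    # Quadratic in the logits, hence quadratic in the parameters of a
    # linearized model: the overall objective becomes convex.
    return jnp.mean(jnp.sum((logits - labels) ** 2, axis=-1))

relu_act = jax.nn.relu          # activation of the original backbone
lqf_act = jax.nn.leaky_relu     # replacement with a non-zero gradient everywhere

logits = jnp.array([[2.0, -1.0, 0.5]])
labels = jnp.eye(3)[jnp.array([0])]
print(cross_entropy(logits, labels), least_squares(logits, labels))
```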
“…DIVA leverages recent progress in deep learning linearization [1] to derive a closed-form expression for the derivative of the final loss (validation error) with respect to the dataset weights. In particular, [1] have shown that, by replacing cross-entropy with least-squares, replacing ReLU with leaky ReLU, and performing suitable pre-conditioning, the linearized model performs on par with full non-linear fine-tuning. We also leverage a classical result to compute the leave-one-out loss of a linear model in closed-form [61, 21].…”
Section: Introduction (mentioning)
confidence: 99%
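The classical closed-form leave-one-out result mentioned in this statement can be written for a ridge (linear) model as e_i^loo = e_i / (1 - H_ii), with H the hat matrix. The snippet below is an illustrative computation on synthetic data, not the DIVA code; the regularizer and shapes are arbitrary.

```python
# Sketch of the classical closed-form leave-one-out (LOO) residual for a
# linear (ridge) model:
#   e_i^loo = (y_i - yhat_i) / (1 - H_ii),  H = X (X^T X + lam I)^{-1} X^T
import jax.numpy as jnp
from jax import random

key = random.PRNGKey(0)
k1, k2 = random.split(key)
X = random.normal(k1, (50, 8))      # synthetic design matrix
y = random.normal(k2, (50,))        # synthetic targets
lam = 1e-2                          # ridge regularizer (arbitrary)

A = jnp.linalg.solve(X.T @ X + lam * jnp.eye(8), X.T)   # (X^T X + lam I)^{-1} X^T
H = X @ A                                                # hat matrix
resid = y - H @ y                                        # in-sample residuals
loo_resid = resid / (1.0 - jnp.diag(H))                  # LOO residuals without refitting
loo_mse = jnp.mean(loo_resid ** 2)                       # leave-one-out error in closed form
```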