2022
DOI: 10.48550/arxiv.2203.08124
Preprint

Can Neural Nets Learn the Same Model Twice? Investigating Reproducibility and Double Descent from the Decision Boundary Perspective

Abstract: We discuss methods for visualizing neural network decision boundaries and decision regions. We use these visualizations to investigate issues related to reproducibility and generalization in neural network training. We observe that changes in model architecture (and its associated inductive bias) cause visible changes in decision boundaries, while multiple runs with the same architecture yield results with strong similarities, especially in the case of wide architectures. We also use decision boundary methods…

Cited by 3 publications (5 citation statements), 2022–2023.
References 29 publications (42 reference statements).
“…Transfer learning priors range from the high-entropy distribution provided by training from scratch to those pretrained models which seem to consistently select a single basin [35]. Possible influences on basin selection, and therefore on generalization strategies, may include length of pretraining [49], data scheduling, and architecture selection [43]. The strength of a prior towards particular basins may not only be linked to the training procedure, but also strongly related to the availability of features in the pretrained representations [29,16,42].…”
Section: Discussion and Future Work
confidence: 99%
“…However, variation in performance on diagnostic sets is even more substantial, from social biases [40] to unusual paraphrases [31,56]. Benton et al. [1] found that a wide variety of decision boundaries were expressed within a low-loss volume, and Somepalli et al. [43] further found that there is diversity in boundaries during OOD generalization, far off the data manifold. Our work contributes to diversity in generalization by linking sets of models that share low-dimensional subspaces to particular OOD generalization behavior.…”
Section: Related Work
confidence: 99%
“…Using three sample images as a base “triplet”, the region of the decision space that lies between the triplet samples can be visualized. Vectors representing the positions of two of the triplet samples relative to the third are used to construct a vector space, which facilitates interpolation among the triplet samples, creating a vicinal distribution [8]. A “virtual” data set composed of images is sampled from the vicinal distribution.…”
Section: Decision Regions and Boundaries
confidence: 99%
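
To make that construction concrete, here is a minimal sketch in PyTorch of the triplet-plane visualization the statement describes. It assumes a trained classifier model and three image tensors x0, x1, x2 of identical shape; all names and the sampling range are illustrative assumptions, not the cited papers' exact implementation.

import torch

def decision_region_plane(model, x0, x1, x2, resolution=50):
    # Two basis vectors spanning the plane through the triplet (x0, x1, x2).
    v1 = (x1 - x0).flatten()
    v2 = (x2 - x0).flatten()
    # Sample slightly beyond the triangle so the regions around it are visible
    # (the [-0.5, 1.5] range is an assumed choice, not taken from the paper).
    coords = torch.linspace(-0.5, 1.5, resolution)
    points = [x0.flatten() + a * v1 + b * v2 for a in coords for b in coords]
    batch = torch.stack(points).reshape(-1, *x0.shape)  # restore image shape
    model.eval()                                        # inference mode
    with torch.no_grad():
        preds = model(batch).argmax(dim=1)              # predicted class per grid point
    return preds.reshape(resolution, resolution)        # 2-D map of decision regions

The returned grid plays the role of the labels of the “virtual” data set: rendering it (for example with matplotlib's imshow) colors the plane by predicted class and exposes the decision regions lying between the three samples.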
“…The width of the T can be interpreted as the inductive bias of the learning algorithm. The decision boundaries of neural networks usually lie on the data manifold [25], and the network behaves more smoothly off the data manifold. A natural consequence of this is that the heads of the Ts will be large.…”
Section: Inductive Bias Can Hurt Robustness Even Further
confidence: 99%