2021
DOI: 10.48550/arxiv.2110.06296
Preprint

The Role of Permutation Invariance in Linear Mode Connectivity of Neural Networks

Abstract: In this paper, we conjecture that if the permutation invariance of neural networks is taken into account, SGD solutions will likely have no barrier in the linear interpolation between them. Although it is a bold conjecture, we show how extensive empirical attempts fall short of refuting it. We further provide a preliminary theoretical result to support our conjecture. Our conjecture has implications for the lottery ticket hypothesis, distributed training, and ensemble methods.
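The conjecture rests on the permutation symmetry of hidden units: reordering the units of a layer, and reordering the following layer's weights to match, leaves the computed function unchanged. A minimal NumPy sketch of this invariance for a two-layer MLP (illustrative only, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, W1, b1, W2, b2):
    """Two-layer MLP with ReLU hidden units."""
    h = np.maximum(0.0, x @ W1 + b1)   # (batch, hidden)
    return h @ W2 + b2                 # (batch, out)

d_in, d_hidden, d_out = 4, 8, 3
W1 = rng.normal(size=(d_in, d_hidden))
b1 = rng.normal(size=d_hidden)
W2 = rng.normal(size=(d_hidden, d_out))
b2 = rng.normal(size=d_out)

# Random permutation of the hidden units.
perm = rng.permutation(d_hidden)

# Permute the columns of W1 and entries of b1, and the rows of W2 to match.
W1_p, b1_p, W2_p = W1[:, perm], b1[perm], W2[perm, :]

x = rng.normal(size=(5, d_in))
assert np.allclose(mlp(x, W1, b1, W2, b2), mlp(x, W1_p, b1_p, W2_p, b2))
print("Permuted network computes the same function.")
```

The conjecture is that, after choosing such a permutation appropriately for one of two SGD solutions, the straight line between them in weight space incurs (almost) no increase in loss.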

Cited by 10 publications (17 citation statements)
References 17 publications
“…This result suggests that generalization, or at least high performance, is closely tied to the linear mode connectivity of the models in question. This implied result is further supported by Entezari et al. [7], who found a larger barrier between models when they exhibited higher test error. Neyshabur et al. [35] even described linear mode connectivity as a crucial component of transfer learning, finding that finetuned models initialized from the same pretrained model will be in the same linearly connected basin, in contrast to models trained from scratch, which exhibit barriers even when initialized from the same random weights.…”
Section: Linear Mode Connectivity (supporting)
confidence: 75%
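To make the "barrier" discussed in these statements concrete: it is usually taken as the largest amount by which the loss along the straight path between two solutions exceeds the linear interpolation of the endpoint losses. A hedged NumPy sketch with a toy loss chosen for illustration (not the experimental setup of any cited work):

```python
import numpy as np

def barrier(loss_fn, theta_a, theta_b, num_points=25):
    """max over alpha of L(alpha*theta_a + (1-alpha)*theta_b)
    minus [alpha*L(theta_a) + (1-alpha)*L(theta_b)]."""
    alphas = np.linspace(0.0, 1.0, num_points)
    la, lb = loss_fn(theta_a), loss_fn(theta_b)
    gaps = [
        loss_fn(a * theta_a + (1 - a) * theta_b) - (a * la + (1 - a) * lb)
        for a in alphas
    ]
    return max(gaps)

# Toy non-convex loss with symmetric minima; the straight path between
# them passes over a bump, so the barrier is positive.
loss = lambda theta: float(np.sum((theta**2 - 1.0) ** 2))

theta_a = np.array([1.0, 1.0])
theta_b = np.array([-1.0, 1.0])
print("barrier:", barrier(loss, theta_a, theta_b))  # > 0 for this toy loss
```

Two solutions are said to lie in the same linearly connected basin when this quantity is (approximately) zero.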
“…This behavior characterizes the two basins: the larger basin is syntax-aware (tending to acquire heuristics that require awareness of constituent structure), while the smaller basin is syntax-unaware (acquiring heuristics that rely only on unordered sets of words). Distributions of the clusters: We find (Fig. 5) that CG-based cluster membership accounts for some of the heavy tail of performance on HANS-LO, supporting the claim that the convex basins on the loss surface differentiate generalization strategies.…”
Section: Clustering (supporting)
confidence: 67%