2021
DOI: 10.48550/arxiv.2110.06296
Preprint

The Role of Permutation Invariance in Linear Mode Connectivity of Neural Networks

Abstract: In this paper, we conjecture that if the permutation invariance of neural networks is taken into account, SGD solutions will likely have no barrier in the linear interpolation between them. Although it is a bold conjecture, we show how extensive empirical attempts fall short of refuting it. We further provide a preliminary theoretical result to support our conjecture. Our conjecture has implications for the lottery ticket hypothesis, distributed training, and ensemble methods.
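The conjecture rests on the permutation symmetry of hidden units: reordering the units of a layer, and reordering the following layer's weights to match, leaves the computed function unchanged. A minimal NumPy sketch of this invariance for a two-layer MLP (illustrative only, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, W1, b1, W2, b2):
    """Two-layer MLP with ReLU hidden units."""
    h = np.maximum(0.0, x @ W1 + b1)   # (batch, hidden)
    return h @ W2 + b2                 # (batch, out)

d_in, d_hidden, d_out = 4, 8, 3
W1 = rng.normal(size=(d_in, d_hidden))
b1 = rng.normal(size=d_hidden)
W2 = rng.normal(size=(d_hidden, d_out))
b2 = rng.normal(size=d_out)

# Random permutation of the hidden units.
perm = rng.permutation(d_hidden)

# Permute the columns of W1 and entries of b1, and the rows of W2 to match.
W1_p, b1_p, W2_p = W1[:, perm], b1[perm], W2[perm, :]

x = rng.normal(size=(5, d_in))
assert np.allclose(mlp(x, W1, b1, W2, b2), mlp(x, W1_p, b1_p, W2_p, b2))
print("Permuted network computes the same function.")
```

The conjecture is that, after choosing such a permutation appropriately for one of two SGD solutions, the straight line between them in weight space incurs (almost) no increase in loss.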

Cited by 10 publications (17 citation statements)
References 17 publications
“…This result suggests that generalization, or at least high performance, is closely tied to the linear mode connectivity of the models in question. This implied result is further supported by Entezari et al. [7], who found a larger barrier between models when they exhibited higher test error. Neyshabur et al. [35] even described linear mode connectivity as a crucial component of transfer learning, finding that finetuned models initialized from the same pretrained model will be in the same linearly connected basin, in contrast to models trained from scratch, which exhibit barriers even when initialized from the same random weights.…”
Section: Linear Mode Connectivity (supporting)
confidence: 75%
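To make the "barrier" discussed in these statements concrete: it is usually taken as the largest amount by which the loss along the straight path between two solutions exceeds the linear interpolation of the endpoint losses. A hedged NumPy sketch with a toy loss chosen for illustration (not the experimental setup of any cited work):

```python
import numpy as np

def barrier(loss_fn, theta_a, theta_b, num_points=25):
    """max over alpha of L(alpha*theta_a + (1-alpha)*theta_b)
    minus [alpha*L(theta_a) + (1-alpha)*L(theta_b)]."""
    alphas = np.linspace(0.0, 1.0, num_points)
    la, lb = loss_fn(theta_a), loss_fn(theta_b)
    gaps = [
        loss_fn(a * theta_a + (1 - a) * theta_b) - (a * la + (1 - a) * lb)
        for a in alphas
    ]
    return max(gaps)

# Toy non-convex loss with symmetric minima; the straight path between
# them passes over a bump, so the barrier is positive.
loss = lambda theta: float(np.sum((theta**2 - 1.0) ** 2))

theta_a = np.array([1.0, 1.0])
theta_b = np.array([-1.0, 1.0])
print("barrier:", barrier(loss, theta_a, theta_b))  # > 0 for this toy loss
```

Two solutions are said to lie in the same linearly connected basin when this quantity is (approximately) zero.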
“…This behavior characterizes the two basins: the larger basin is syntax-aware (tending to acquire heuristics that require awareness of constituent structure), while the smaller basin is syntax-unaware (acquiring heuristics that rely only on unordered sets of words). Distributions of the clusters: We find (Fig. 5) that CG-based cluster membership accounts for some of the heavy tail of performance on HANS-LO, supporting the claim that the convex basins on the loss surface differentiate generalization strategies.…”
Section: Clustering (supporting)
confidence: 67%