2019
DOI: 10.48550/arxiv.1912.05671
Preprint

Linear Mode Connectivity and the Lottery Ticket Hypothesis

Cited by 15 publications (24 citation statements). References 0 publications.
“…In other words, we could have trained smaller networks from the start if only we had known which subnetworks to choose. Unfortunately, LTH requires these intriguing subnetworks to be found empirically by iterative pruning [17,18], which does not avoid the expense of post-training pruning. In view of this, follow-up works reveal that sparsity patterns might emerge at initialization [19,20], at the early stage of training [21,22], or in dynamic forms throughout training [23][24][25] by updating model parameters and architecture topologies simultaneously.…”
Section: Introduction
confidence: 99%
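The statement above centers on magnitude-based pruning as the way sparse subnetworks are identified. As a minimal sketch of that basic primitive, assuming a PyTorch toy model, the snippet below builds a global magnitude mask and zeroes out the pruned weights; the model, the 80% sparsity level, and the helper names are illustrative, not the cited papers' exact procedure.

```python
# Minimal sketch of the basic pruning primitive: a global magnitude mask that
# removes a given fraction of the weights. The toy model and sparsity level are
# illustrative assumptions, not the setup of the cited papers.
import torch
import torch.nn as nn

def global_magnitude_mask(model: nn.Module, sparsity: float) -> dict:
    """Return {name: 0/1 tensor} masks removing `sparsity` of all weight entries."""
    weights = {n: p.detach().abs() for n, p in model.named_parameters() if p.dim() > 1}
    scores = torch.cat([w.flatten() for w in weights.values()])
    k = int(sparsity * scores.numel())
    threshold = torch.kthvalue(scores, k).values if k > 0 else scores.new_tensor(-1.0)
    return {n: (w > threshold).float() for n, w in weights.items()}

def apply_mask(model: nn.Module, masks: dict) -> None:
    """Zero out the pruned weights in place."""
    with torch.no_grad():
        for n, p in model.named_parameters():
            if n in masks:
                p.mul_(masks[n])

model = nn.Sequential(nn.Linear(784, 300), nn.ReLU(), nn.Linear(300, 10))
masks = global_magnitude_mask(model, sparsity=0.8)  # remove 80% of the weights
apply_mask(model, masks)
```

Iterative approaches apply this primitive over several rounds, removing only a small fraction of the surviving weights each time (see the IMP sketch further below).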
“…However, Draxler et al (2018); Garipov et al (2018) show that local minima found by stochastic gradient descent (SGD) can be connected via piecewise linear paths. Frankle et al (2019) further show that linearly connected solutions may be found if networks share the same initialization. demonstrate the connection between linear connectivity and the advantage nonlinear networks enjoy over their linearized version.…”
Section: Related Work
confidence: 83%
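The quoted passage hinges on checking whether two solutions are linearly connected, i.e. whether the loss stays low along the straight line between their weight vectors. The sketch below assumes a PyTorch model, a `loss_fn(model, batch)` callable, and two trained state dicts, none of which are defined in the source; it records the loss along the linear path and reports a simple barrier value.

```python
# Minimal sketch of a linear-connectivity check: interpolate two trained weight
# sets of the same architecture and record the loss along the straight path.
# `loss_fn(model, batch)` and the two state dicts are assumed to exist.
import copy
import torch

def interpolate_state(state_a: dict, state_b: dict, alpha: float) -> dict:
    """(1 - alpha) * theta_a + alpha * theta_b, skipping integer buffers."""
    return {k: ((1 - alpha) * state_a[k] + alpha * state_b[k])
               if state_a[k].is_floating_point() else state_a[k]
            for k in state_a}

@torch.no_grad()
def loss_barrier(model, state_a, state_b, loss_fn, batch, steps: int = 11):
    """Maximum rise of the loss on the path above the mean of the endpoint losses."""
    losses = []
    for i in range(steps):
        alpha = i / (steps - 1)
        probe = copy.deepcopy(model)
        probe.load_state_dict(interpolate_state(state_a, state_b, alpha))
        probe.eval()
        losses.append(float(loss_fn(probe, batch)))
    return max(losses) - 0.5 * (losses[0] + losses[-1]), losses
```

A barrier close to zero indicates linear mode connectivity; Frankle et al. (2019) study a closely related instability measure based on error along the same linear path.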
“…Magnitude pruning avoids layer-collapse with conservation and iteration: Having demonstrated and investigated the cause of layer-collapse in single-shot pruning methods at initialization, we now explore an iterative pruning method that appears to avoid the issue entirely. Iterative Magnitude Pruning (IMP) is a recently proposed pruning algorithm that has proven to be successful in finding extremely sparse trainable neural networks at initialization (winning lottery tickets) [10,11,12,41,42,43,44]. The algorithm follows three simple steps.…”
Section: SynFlow Random GraSP
confidence: 99%
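The three steps the quote refers to are, in the usual formulation of IMP, to train the network, prune a fraction of the smallest-magnitude surviving weights, and rewind the remaining weights to their values at (or near) initialization before repeating. The sketch below assumes a PyTorch model and a user-supplied `train(model, mask)` routine that keeps masked weights at zero; the per-layer pruning rate and rewind point are illustrative choices, not the cited papers' exact settings.

```python
# Minimal sketch of Iterative Magnitude Pruning (IMP) with rewinding, under the
# assumptions stated above. `train(model, mask)` is a placeholder that must keep
# masked weights at zero (e.g., by re-applying the mask after each optimizer step).
import copy
import torch
import torch.nn as nn

def imp(model: nn.Module, train, rounds: int = 5, prune_frac: float = 0.2) -> dict:
    rewind_state = copy.deepcopy(model.state_dict())  # theta_0, or theta_k for late rewinding
    mask = {n: torch.ones_like(p) for n, p in model.named_parameters() if p.dim() > 1}
    for _ in range(rounds):
        train(model, mask)                            # step 1: train the masked network
        for n, p in model.named_parameters():         # step 2: prune by magnitude, per layer
            if n not in mask:
                continue
            alive = p.detach().abs()[mask[n].bool()]  # magnitudes of surviving weights
            k = int(prune_frac * alive.numel())
            if k > 0:
                threshold = torch.kthvalue(alive, k).values
                mask[n] = (mask[n].bool() & (p.detach().abs() > threshold)).float()
        model.load_state_dict(rewind_state)           # step 3: rewind survivors and repeat
        with torch.no_grad():
            for n, p in model.named_parameters():
                if n in mask:
                    p.mul_(mask[n])
    return mask
```

Rewinding to an early training iteration rather than to iteration 0, as studied in the preprint above, amounts to snapshotting `rewind_state` after a few training steps instead of before training.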