2021
DOI: 10.48550/arxiv.2104.11044
Preprint

Analyzing Monotonic Linear Interpolation in Neural Network Loss Landscapes

James Lucas,
Juhan Bae,
Michael R. Zhang
et al.

Abstract: Linear interpolation between initial neural network parameters and converged parameters after training with stochastic gradient descent (SGD) typically leads to a monotonic decrease in the training objective. This Monotonic Linear Interpolation (MLI) property, first observed by Goodfellow et al. (2014), persists in spite of the non-convex objectives and highly non-linear training dynamics of neural networks. Extending this work, we evaluate several hypotheses for this property that, to our knowledge, have not …
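The MLI property described above is straightforward to probe empirically. Below is a minimal sketch (assuming a PyTorch model, loss function, and data loader; the function and variable names are illustrative and not the authors' code) that evaluates the training loss along the straight line between the initial and final parameters:

import copy
import torch

def interpolation_losses(model_init, model_final, loss_fn, data_loader, steps=25):
    """Training loss along theta(alpha) = (1 - alpha) * theta_init + alpha * theta_final."""
    init_params = [p.detach().clone() for p in model_init.parameters()]
    final_params = [p.detach().clone() for p in model_final.parameters()]
    probe = copy.deepcopy(model_final)  # reusable container whose weights we overwrite

    losses = []
    for alpha in torch.linspace(0.0, 1.0, steps):
        with torch.no_grad():
            # Set the probe's parameters to the interpolated weights for this alpha.
            for p, p0, p1 in zip(probe.parameters(), init_params, final_params):
                p.copy_((1.0 - alpha) * p0 + alpha * p1)
            # Average the training loss over the dataset at this alpha.
            total, count = 0.0, 0
            for x, y in data_loader:
                total += loss_fn(probe(x), y).item() * y.shape[0]
                count += y.shape[0]
        losses.append(total / count)
    return losses

# MLI holds (up to numerical noise) if the losses are non-increasing in alpha:
# is_monotone = all(b <= a + 1e-6 for a, b in zip(losses, losses[1:]))

The counterexamples to MLI discussed in the paper and in the citing works correspond to settings where this loss sequence is not monotone along the interpolation path.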

Cited by 2 publications (2 citation statements)
References 32 publications
“…These results received considerable attention in the field, and the matter is likely to be more complicated. Lucas et al. 27 are able to create counterexamples to the MLI property, and others have observed different results to Goodfellow et al. 26 when revisiting the work on more modern architectures and data sets. 28 Further doubt that 'Machine learning may just be simple' is cast in ref.…”
Section: Motivation: El4ml (mentioning)
confidence: 99%
“…In [7] it was observed that the loss decreases monotonically over the line between the initialization and the final convergence points. This observation was shown not to hold when using larger learning rates [21]. Swirszcz et al [31] also show that it is possible to create datasets which lead to a landscape containing local minima.…”
Section: Related Work (mentioning)
confidence: 99%