2021
DOI: 10.48550/arxiv.2104.11044
Preprint

Analyzing Monotonic Linear Interpolation in Neural Network Loss Landscapes

James Lucas,
Juhan Bae,
Michael R. Zhang
et al.

Abstract: Linear interpolation between initial neural network parameters and converged parameters after training with stochastic gradient descent (SGD) typically leads to a monotonic decrease in the training objective. This Monotonic Linear Interpolation (MLI) property, first observed by Goodfellow et al. (2014), persists in spite of the non-convex objectives and highly non-linear training dynamics of neural networks. Extending this work, we evaluate several hypotheses for this property that, to our knowledge, have not …
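The MLI property described above is straightforward to probe empirically. Below is a minimal sketch (assuming a PyTorch model, loss function, and data loader; the function and variable names are illustrative and not the authors' code) that evaluates the training loss along the straight line between the initial and final parameters:

import copy
import torch

def interpolation_losses(model_init, model_final, loss_fn, data_loader, steps=25):
    """Training loss along theta(alpha) = (1 - alpha) * theta_init + alpha * theta_final."""
    init_params = [p.detach().clone() for p in model_init.parameters()]
    final_params = [p.detach().clone() for p in model_final.parameters()]
    probe = copy.deepcopy(model_final)  # reusable container whose weights we overwrite

    losses = []
    for alpha in torch.linspace(0.0, 1.0, steps):
        with torch.no_grad():
            # Set the probe's parameters to the interpolated weights for this alpha.
            for p, p0, p1 in zip(probe.parameters(), init_params, final_params):
                p.copy_((1.0 - alpha) * p0 + alpha * p1)
            # Average the training loss over the dataset at this alpha.
            total, count = 0.0, 0
            for x, y in data_loader:
                total += loss_fn(probe(x), y).item() * y.shape[0]
                count += y.shape[0]
        losses.append(total / count)
    return losses

# MLI holds (up to numerical noise) if the losses are non-increasing in alpha:
# is_monotone = all(b <= a + 1e-6 for a, b in zip(losses, losses[1:]))

The counterexamples to MLI discussed in the paper and in the citing works correspond to settings where this loss sequence is not monotone along the interpolation path.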

Cited by 2 publications (2 citation statements)
References 32 publications
“…These results received considerable attention in the field, and the matter is likely to be more complicated. Lucas et al. 27 are able to create counterexamples to the MLI property, and others have observed different results to Goodfellow et al. 26 when revisiting the work on more modern architectures and data sets. 28 Further doubt that 'Machine learning may just be simple' is cast in ref.…”
Section: Motivation: El4ml (mentioning)
confidence: 99%
“…In [7] it was observed that the loss decreases monotonically over the line between the initialization and the final convergence points. This observation was shown not to hold when using larger learning rates [21]. Swirszcz et al [31] also show that it is possible to create datasets which lead to a landscape containing local minima.…”
Section: Related Work (mentioning)
confidence: 99%