Unique Properties of Flat Minima in Deep Networks
Preprint, 2020
DOI: 10.48550/arxiv.2002.04710
Abstract: It is well known that (stochastic) gradient descent has an implicit bias towards wide minima. In deep neural network training, this mechanism serves to screen out minima. However, the precise effect that this has on the trained network is not yet fully understood. In this paper, we characterize the wide minima in linear neural networks trained with a quadratic loss. First, we show that linear ResNets with zero initialization necessarily converge to the widest of all minima. We then prove that these minima corr…
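The abstract's two claims (zero-initialized linear ResNets reach the widest/flattest minimum, and such minima have balanced per-layer gains) can be illustrated numerically. The sketch below is not code from the paper; it is a scalar simplification of the paper's matrix setting, and the depth, learning rate, data, and the perturbation-based sharpness proxy are all illustrative assumptions.

# Minimal sketch (assumed, not the paper's code): gradient descent on a scalar
# linear ResNet f(x) = (1 + w_L) * ... * (1 + w_1) * x with quadratic loss,
# starting from zero initialization, on a toy regression task.
import numpy as np

rng = np.random.default_rng(0)
L, n = 4, 200                      # depth and number of samples (assumed)
x = rng.normal(size=n)
y = 3.0 * x                        # target linear map with gain 3

w = np.zeros(L)                    # zero initialization, as in the abstract

def predict(w, x):
    return np.prod(1.0 + w) * x

def loss(w):
    return 0.5 * np.mean((predict(w, x) - y) ** 2)

lr = 1e-2
for _ in range(5000):
    g = np.prod(1.0 + w)
    resid = g * x - y
    # d loss / d w_l = mean(resid * x) * prod_{k != l} (1 + w_k)
    grad = np.mean(resid * x) * g / (1.0 + w)
    w -= lr * grad

print("per-layer gains 1 + w_l:", 1.0 + w)   # equal gains: balanced network
print("end-to-end gain:", np.prod(1.0 + w))  # close to the target gain 3

# Crude flatness check (an assumption here, not the paper's definition):
# 2 * (loss(w + eps*d) - loss(w)) / eps^2 for unit d lower-bounds the top
# Hessian eigenvalue at the found minimum.
eps = 1e-3
def rayleigh(d):
    d = d / np.linalg.norm(d)
    return 2.0 * (loss(w + eps * d) - loss(w)) / eps**2
print("sharpness proxy:", max(rayleigh(rng.normal(size=L)) for _ in range(100)))

In this symmetric scalar setting all layers receive identical gradients from the zero initialization, so the printed gains stay exactly equal, which mirrors the balancedness property the abstract describes.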

Cited by 3 publications (2 citation statements)
References 11 publications
“…Our work extends the bulk of literature concerning mathematical characterization of the implicit regularization induced by gradient-based optimization. Existing characterizations focus on different aspects of learning, for example: dynamics of optimization ([1,25,45,6,28,41,26]); curvature ("flatness") of obtained minima ([52]); frequency spectrum of learned input-output mappings ([61]); invariant quantities throughout training ([24]); and statistical properties imported from data ([10]). A ubiquitous approach, arguably more prevalent than the aforementioned, is to demonstrate that learned input-output mappings minimize some notion of norm, or analogously, maximize some notion of margin.…”
Section: Related Work
confidence: 99%
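A standard illustration of the norm-minimization viewpoint mentioned in this excerpt (a textbook fact, included here only as an example, not a result from the cited works): for underdetermined least squares with loss L(w) = \tfrac{1}{2}\|Xw - y\|_2^2, gradient descent initialized at w_0 = 0 keeps every iterate in the row space of X and therefore converges to the minimum-norm interpolant

    w_\infty = \arg\min_{w \,:\, Xw = y} \|w\|_2 .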
“…For the trained NNs on such synthetic data to generalize to real (application) data, the synthetic training set has to include the features embedded in the real data as much as possible (Kouw, 2018). For one, the training dataset (inputs and labels) should be represented by distributions that include the input and expected labels that lead the training to non-sharp (flat) local minima for the real data (Mulayoff and Michaeli, 2020). However, this requirement, especially with respect to the input data to the network, is hard to achieve considering the simplified assumptions we use in modeling and simulation.…”
Section: Introduction
confidence: 99%
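One concrete (hypothetical) way to probe the requirement described in this excerpt is to measure how much a loss computed on real data increases when the parameters found on synthetic data are slightly perturbed. The sketch below is an illustration under stated assumptions: the name flatness_gap, the perturbation radius, and the use of a mean gap are choices made here, not part of the cited works.

# Illustrative sketch (assumed): perturbation-based flatness probe of trained
# parameters theta against a real-data loss function.
import numpy as np

def flatness_gap(loss_real, theta, eps=1e-3, trials=50, seed=0):
    """Mean increase of a real-data loss under random perturbations of
    radius eps around trained parameters theta (small value => flatter)."""
    rng = np.random.default_rng(seed)
    base = loss_real(theta)
    gaps = []
    for _ in range(trials):
        d = rng.normal(size=theta.shape)
        d *= eps / np.linalg.norm(d)      # fixed-radius random direction
        gaps.append(loss_real(theta + d) - base)
    return float(np.mean(gaps))

Such a perturbation test is only a crude stand-in for Hessian-based flatness measures, but it makes the requirement operational: the minimum should be flat with respect to the real data, not only the synthetic training data.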