2019
DOI: 10.48550/arxiv.1901.04653
Preprint

Normalized Flat Minima: Exploring Scale Invariant Definition of Flat Minima for Neural Networks using PAC-Bayesian Analysis

Abstract: The notion of flat minima has played a key role in the generalization studies of deep learning models. However, existing definitions of the flatness are known to be sensitive to the rescaling of parameters. The issue suggests that the previous definitions of the flatness might not be a good measure of generalization, because generalization is invariant to such rescalings. In this paper, from the PAC-Bayesian perspective, we scrutinize the discussion concerning the flat minima and introduce the notion of normal…

Cited by 9 publications (10 citation statements)
References 10 publications
“…The Hessian's lack of reparameterisation invariance [7] has subjected its use for predicting generalisation to criticism [29,38,32]. Normalized definitions of flatness have been introduced by Tsuzuku et al. [38] and Rangamani et al. [32] in a PAC-Bayesian framework, although Rangamani et al. [32] note that empirically Hessian-based sharpness measures correlate with generalisation. Smith and Le [37] argue that although sharpness can be manipulated, the Bayesian log evidence, which they approximate via the log determinant of the Hessian as $\sum_i \log(\lambda_i / \gamma)$, is invariant to such reparameterisation, where $\gamma$ is the L2 regularisation coefficient.…”
Section: Related Work (mentioning)
confidence: 99%
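To make the invariance issue in the statement above concrete, here is a minimal sketch under an assumed two-layer ReLU setup (an illustrative assumption, not taken from [38], [32], or [37]); it shows why the loss value survives a layer-wise rescaling while Hessian-based sharpness does not.

```latex
% Sketch (illustrative assumption): two-layer ReLU network
% f_\theta(x) = W_2\,\mathrm{relu}(W_1 x), parameters \theta = (W_1, W_2).
% The rescaling T_\alpha(W_1, W_2) = (\alpha W_1, W_2/\alpha), \alpha > 0,
% leaves the function unchanged because relu is positively homogeneous:
\[
\tfrac{1}{\alpha} W_2\,\mathrm{relu}(\alpha W_1 x) = W_2\,\mathrm{relu}(W_1 x)
\quad\Longrightarrow\quad L(T_\alpha\theta) = L(\theta)\ \ \forall\,\theta .
\]
% Differentiating this identity twice (chain rule with the linear map T_\alpha):
\[
\nabla^2 L(T_\alpha\theta) \;=\; D_\alpha\,\nabla^2 L(\theta)\,D_\alpha,
\qquad D_\alpha = \mathrm{diag}\!\left(\alpha^{-1} I,\; \alpha I\right),
\]
% so the trace and the largest eigenvalue of the Hessian can be driven
% arbitrarily high or low by the choice of \alpha, even though the minimum
% and the function it represents are unchanged.
```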
“…Some researchers specifically study flatness itself. They try to measure flatness (Hochreiter and Schmidhuber, 1997; Keskar et al., 2017; Sagun et al., 2017; Yao et al., 2018), rescale flatness (Tsuzuku et al., 2019), and find flatter minima (Hoffer et al., 2017; Chaudhari et al., 2017; He et al., 2019b). But we still lack a quantitative theory that explains why deep learning finds flat minima with such a high probability.…”
Section: Introduction (mentioning)
confidence: 99%
“…The most prominent critique of the sharp-minima hypothesis comes from Dinh et al. [2017], who prove that one can increase the sharpness of any given minimum by reparametrizing the network. Similar criticism can be found in theoretical PAC-Bayes work [Tsuzuku et al., 2019, Rangamani et al., 2019, Yi et al., 2019, Neyshabur et al., 2017], which only provides experiments for large-vs-small batch sizes, where standard sharpness metrics work well in practice. Van Laarhoven [2017] noted how WD would increase the relative size of gradient updates.…”
Section: Discussion (mentioning)
confidence: 61%
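As a hedged numerical companion to the Dinh et al. critique quoted above, the sketch below uses a toy one-hidden-unit ReLU model and a finite-difference Hessian trace; all names, data, and values are illustrative assumptions, not taken from the cited papers. It shows the loss staying fixed under the opposite rescaling of the two weights while the measured sharpness changes by orders of magnitude.

```python
import numpy as np

# Toy ReLU model: f(x) = w2 * relu(w1 * x), squared loss on synthetic data.
# Illustrative sketch of the Dinh et al. point: the rescaling
# (w1, w2) -> (a * w1, w2 / a), a > 0, leaves the function (and loss)
# unchanged, but Hessian-based sharpness (here: the Hessian trace) changes.

relu = lambda z: np.maximum(z, 0.0)

x = np.array([0.5, 1.0, 2.0])
y = np.array([1.0, 2.0, 4.0])

def loss(theta):
    w1, w2 = theta
    pred = w2 * relu(w1 * x)
    return 0.5 * np.mean((pred - y) ** 2)

def hessian_trace(theta, eps=1e-4):
    """Finite-difference estimate of tr(H) = sum_i d^2 L / d theta_i^2."""
    tr = 0.0
    for i in range(len(theta)):
        e = np.zeros_like(theta)
        e[i] = eps
        tr += (loss(theta + e) - 2 * loss(theta) + loss(theta - e)) / eps**2
    return tr

theta = np.array([1.0, 2.0])  # a global minimum of this toy loss (loss = 0)
for a in [1.0, 0.1, 10.0]:
    t = np.array([a * theta[0], theta[1] / a])
    print(f"alpha={a:5.1f}  loss={loss(t):.6f}  tr(H)~{hessian_trace(t):.4f}")
# The loss is identical for every alpha; the Hessian trace is not.
```

Running the loop prints the same (zero) loss for each rescaling but widely different Hessian traces, which is exactly the manipulation the critique refers to.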