2021
DOI: 10.48550/arxiv.2108.12006
Preprint

When and how epochwise double descent happens

Cory Stephenson, Tyler Lee

Abstract: Deep neural networks are known to exhibit a 'double descent' behavior as the number of parameters increases. Recently, it has also been shown that an 'epochwise double descent' effect exists in which the generalization error initially drops, then rises, and finally drops again with increasing training time. This presents a practical problem in that the amount of time required for training is long, and early stopping based on validation performance may result in suboptimal generalization. In this work we develo…
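To make the early-stopping concern concrete, here is a minimal sketch, not taken from the paper: the validation-error curve below is synthetic and its shape, constants, and the patience value are invented purely to illustrate how a patience-based rule can halt at the first minimum of an epochwise double-descent curve and miss the lower error reached later in training.

```python
# Illustrative only: a made-up validation-error curve with an epochwise
# double-descent shape (drop, rise, second drop) and a naive patience-based
# early-stopping rule that stops at the first minimum.
import numpy as np

epochs = np.arange(1, 301)
val_error = (
    0.6 * np.exp(-epochs / 15)                     # fast initial descent
    + 0.25 * np.exp(-((epochs - 120) / 60) ** 2)   # intermediate rise (the "bump")
    + 0.3 * np.exp(-epochs / 200)                  # slow final descent
)

def early_stop_epoch(errors, patience=20):
    """Return the epoch (1-indexed) a patience-based rule would report as best."""
    best, best_epoch, wait = np.inf, 0, 0
    for i, e in enumerate(errors, start=1):
        if e < best:
            best, best_epoch, wait = e, i, 0
        else:
            wait += 1
            if wait >= patience:
                return best_epoch
    return best_epoch

stop = early_stop_epoch(val_error)
print(f"early stopping picks epoch {stop}, error {val_error[stop - 1]:.3f}")
print(f"true minimum at epoch {np.argmin(val_error) + 1}, error {val_error.min():.3f}")
```

Under these made-up numbers the rule stops near the first local minimum (around epoch 50), even though the error eventually falls well below that value by the end of training, which is the practical issue the abstract points to.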

Cited by 1 publication (2 citation statements)
References 15 publications
“…Our findings and those of Heckel & Yilmaz (2020) and Stephenson & Lee (2021) reinforce one another with a common central finding that the epoch-wise double descent results from different features/layers being learned at different time-scales. However, we also highlight that both Heckel & Yilmaz (2020) and Stephenson & Lee (2021) use tools from random matrix theory to study distinct data models from our teacher-student setup. We study a similar phenomenon by leveraging the replica method from statistical physics to characterize the generalization behavior using a set of informative macroscopic parameters.…”
Section: Related Work and Discussion (supporting)
confidence: 89%
“…In recent years, there has been an interest in studying the non-asymptotic (finite training time) performance (e.g. Saxe et al., 2013; Advani & Saxe, 2017; Nakkiran et al., 2019b; Pezeshki et al., 2020a; Stephenson & Lee, 2021). Among the limited work studying the particular epoch-wise double descent, Nakkiran et al. (2019a) introduces the notion of effective model complexity and hypothesizes that it increases with training time and hence unifies both model-wise and epoch-wise double descent.…”
Section: Introduction (mentioning)
confidence: 99%