2021
DOI: 10.48550/arxiv.2112.03215
Preprint

Multi-scale Feature Learning Dynamics: Insights for Double Descent

Abstract: A key challenge in building theoretical foundations for deep learning is the complex optimization dynamics of neural networks, resulting from the high-dimensional interactions between the large number of network parameters. Such non-trivial dynamics lead to intriguing behaviors such as the phenomenon of "double descent" of the generalization error. The more commonly studied aspect of this phenomenon corresponds to model-wise double descent, where the test error exhibits a second descent with increasing model com…
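The model-wise double descent described in the abstract can be illustrated in a few lines. Below is a minimal, hypothetical sketch (not code from the paper) using min-norm least-squares regression on random ReLU features: as the number of random features grows past the number of training samples, the test error typically peaks near the interpolation threshold and then descends again. All dimensions, the noise level, and the feature model are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d = 100, 2000, 20

# Synthetic linear teacher with label noise on the training set.
w_star = rng.normal(size=d)
X_train = rng.normal(size=(n_train, d))
X_test = rng.normal(size=(n_test, d))
y_train = X_train @ w_star + 0.5 * rng.normal(size=n_train)
y_test = X_test @ w_star

def test_mse(n_features):
    # Fixed random ReLU features; min-norm least squares via pseudoinverse.
    W = rng.normal(size=(d, n_features)) / np.sqrt(d)
    F_train = np.maximum(X_train @ W, 0.0)
    F_test = np.maximum(X_test @ W, 0.0)
    coef = np.linalg.pinv(F_train) @ y_train  # interpolates once n_features >= n_train
    return np.mean((F_test @ coef - y_test) ** 2)

# Sweep model size across the interpolation threshold (n_features ~ n_train).
for p in [10, 50, 90, 100, 110, 150, 300, 1000]:
    print(f"features={p:5d}  test MSE={test_mse(p):8.3f}")
```

Running the sweep typically shows the classical U-shape in the underparametrized regime, a spike near 100 features (the interpolation threshold), and a second descent beyond it; exact numbers vary with the random seed.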

Cited by 1 publication (2 citation statements)
References 25 publications (51 reference statements)
“…Limitations. This is an exploratory work that does not investigate all possible setups which may affect or lead to DD, such as regularization (see [37,34]) and epoch and sample-wise DD (see [36,4,22,42]). Moreover, we focus on the under- and overparametrized regime without providing quantitative results about the interpolation threshold itself [13,14,34].…”
Section: Discussion (classification: mentioning, confidence: 99%)
“…Over the past few years, significant strides have been made to understand how neural networks generalize in the presence of noise in classification problems (e.g., [31,19,1,20,47]). Remarkably, the DD phenomenon enabled a closer examination of NN behavior as the number of trainable parameters, the evolution time, and the size of the dataset vary [36,4,22,42]. Subsequently, other works have produced analytical studies of some of these phenomena…”
Section: Related Work (classification: mentioning, confidence: 99%)