2022
DOI: 10.48550/arxiv.2201.11729
Preprint

Implicit Regularization in Hierarchical Tensor Factorization and Deep Convolutional Neural Networks

Cited by 1 publication (1 citation statement)
References 0 publications
“…The sharp contrast between the so-called kernel and rich regimes (Woodworth et al., 2020) reflects the importance of the initialization scale: a large initialization often leads to the kernel regime, with features barely changing during training (Jacot et al., 2018; Chizat et al., 2018; Du et al., 2018, 2019; Allen-Zhu et al., 2019b,a; Zou et al., 2020; Arora et al., 2019b; Yang, 2019; Jacot et al., 2021), while with a small initialization the solution exhibits richer behavior, with the resulting model having lower complexity (Gunasekar et al., 2018b,c; Li et al., 2018; Razin and Cohen, 2020; Arora et al., 2019a; Chizat and Bach, 2020; Li et al., 2020; Lyu and Li, 2019; Lyu et al., 2021; Razin et al., 2022; Stöger and Soltanolkotabi, 2021; Ge et al., 2021). Recently, Yang and Hu (2021) gave a complete characterization of the relationship among initialization scale, parametrization, and learning rate needed to avoid the kernel regime.…”
Section: Related Work
confidence: 99%
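
As a minimal, self-contained sketch of the contrast the passage describes (not code from any of the cited papers), the NumPy experiment below trains an overparameterized matrix factorization on partially observed entries of a rank-1 matrix. The setup, dimensions, learning rate, and helper names (train, effective_rank) are all illustrative assumptions chosen for this sketch.

    import numpy as np

    rng = np.random.default_rng(0)

    # Ground truth: a 10x10 rank-1 matrix; we observe roughly half of its entries.
    n = 10
    u = rng.standard_normal((n, 1))
    v = rng.standard_normal((n, 1))
    W_star = u @ v.T
    mask = rng.random((n, n)) < 0.5  # True where an entry is observed

    def train(init_scale, steps=20000, lr=0.01):
        # Gradient descent on the overparameterized model W = U @ V.T,
        # fitting only the observed entries of W_star.
        U = init_scale * rng.standard_normal((n, n))
        V = init_scale * rng.standard_normal((n, n))
        for _ in range(steps):
            R = mask * (U @ V.T - W_star)  # residual on observed entries
            U, V = U - lr * (R @ V), V - lr * (R.T @ U)
        return U @ V.T

    def effective_rank(W, tol=1e-2):
        # Count singular values above tol times the largest one.
        s = np.linalg.svd(W, compute_uv=False)
        return int((s > tol * s[0]).sum())

    for scale in (1e-3, 1.0):
        W = train(scale)
        test_err = np.abs((W - W_star)[~mask]).mean()  # error on unobserved entries
        print(f"init scale {scale:g}: effective rank {effective_rank(W)}, "
              f"unobserved-entry error {test_err:.3f}")

With a small initialization, gradient descent typically converges to a near-rank-1 matrix that also fits the unobserved entries (the rich regime); with a large initialization, the learned matrix tends to stay close to its full-rank starting point and fits only the observed entries (kernel-like behavior), mirroring the small- versus large-initialization contrast in the quoted passage.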