2019
DOI: 10.48550/arxiv.1906.03593
Preprint
Quadratic Suffices for Over-parametrization via Matrix Chernoff Bound

Abstract: We improve the over-parametrization size over two beautiful results [Li and Liang, 2018] and [Du, Zhai, Poczos and Singh, 2019] in deep learning theory.
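For context, the over-parametrization bounds in this line of work are stated in terms of the infinite-width NTK Gram matrix of a two-layer ReLU network and its least eigenvalue. The sketch below records the standard definitions used in [Du, Zhai, Poczos and Singh, 2019] and in the citing excerpts further down; the exact constants, logarithmic factors, and the precise dependence on the least eigenvalue are not reproduced here and should be read as assumptions about the general shape of the bound, not as the paper's stated theorem.

```latex
% Standard NTK Gram matrix for a two-layer ReLU network on unit-norm inputs x_1,...,x_n.
% (The poly(1/lambda) and log factors below are placeholders, not taken from the paper.)
\[
  H^{\infty}_{ij} \;=\; \mathbb{E}_{w \sim \mathcal{N}(0, I_d)}
  \Big[ x_i^{\top} x_j \,
        \mathbf{1}\{w^{\top} x_i \ge 0\}\,
        \mathbf{1}\{w^{\top} x_j \ge 0\} \Big],
  \qquad
  \lambda \;:=\; \lambda_{\min}(H^{\infty}) \;>\; 0 .
\]
% "Quadratic suffices": the required hidden width scales quadratically in the
% number of training samples n,
\[
  m \;=\; \widetilde{\Omega}\!\big( n^{2} \cdot \mathrm{poly}(1/\lambda) \big),
\]
% improving the higher polynomial dependence on n in the earlier analyses of
% [Li and Liang, 2018] and [Du, Zhai, Poczos and Singh, 2019].
```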

Cited by 36 publications (56 citation statements). References 17 publications.
“…We can relate the training and generalization behavior of dense and sparse models through their NTK. The standard result [Song and Yang, 2019] implies the following.…”
Section: F Neural Tangent Kernel Convergence and Generalization (mentioning)
confidence: 84%
“…We build on the great literature of NTK [Li and Liang, 2018, Du et al., 2019, Allen-Zhu et al., 2019b]. The standard result [Song and Yang, 2019] implies the following: if the NTK of the sparse model is close to the NTK of the dense model, then (i) their training convergence speed is similar, and (ii) their generalization bounds are similar. For completeness, we state the formal result in Appendix F.…”
Section: Convergence and Generalization of Sparse Network (mentioning)
confidence: 99%
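The closeness statement above compares the finite-width NTK Gram matrices of the two models. Below is a minimal NumPy sketch of that comparison, assuming the standard two-layer ReLU setting used throughout this line of work (only the first layer trained, second-layer signs in {±1} so they drop out of the kernel). The function name, the toy sparsity mask, and all problem sizes are illustrative assumptions, not quantities from the cited papers.

```python
import numpy as np

def empirical_ntk_gram(X, W):
    """Finite-width NTK Gram matrix of a two-layer ReLU net
    f(x) = (1/sqrt(m)) * sum_r a_r * relu(w_r . x) with a_r in {+1, -1},
    when only the first layer W is trained:

        H[i, j] = (1/m) * (x_i . x_j) * #{r : w_r . x_i >= 0 and w_r . x_j >= 0}

    X : (n, d) data matrix, W : (m, d) first-layer weights.
    """
    m = W.shape[0]
    pre = X @ W.T                      # (n, m) pre-activations w_r . x_i
    act = (pre >= 0).astype(X.dtype)   # (n, m) ReLU activation-pattern indicators
    # (x_i . x_j) times the fraction of neurons active on both x_i and x_j
    return (X @ X.T) * (act @ act.T) / m

# Hypothetical usage: compare the NTK of a dense net with a pruned ("sparse") copy.
rng = np.random.default_rng(0)
n, d, m = 50, 10, 4096
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # unit-norm inputs, as usual in these analyses
W_dense = rng.standard_normal((m, d))
mask = rng.random((m, d)) > 0.5                 # toy sparsity pattern (assumption)
W_sparse = W_dense * mask

H_dense = empirical_ntk_gram(X, W_dense)
H_sparse = empirical_ntk_gram(X, W_sparse)
print("||H_dense - H_sparse||_2 =", np.linalg.norm(H_dense - H_sparse, 2))
```

If this spectral-norm gap is small relative to the least eigenvalue of the dense kernel, the quoted argument transfers the dense model's convergence and generalization guarantees to the sparse one.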
“…There has been a deluge of works on the Neural Tangent Kernel since it was introduced by Jacot et al. (2018), and thus we do our best to provide a partial list. Global convergence guarantees for the optimization, and to a lesser extent generalization, for networks polynomially wide in the number of training samples n and other parameters have been addressed in several works (Du et al., 2019b; Oymak & Soltanolkotabi, 2020; Du et al., 2019a; Allen-Zhu et al., 2019a,b; Zou et al., 2020; Zou & Gu, 2019; Song & Yang, 2020; Arora et al., 2019). To our knowledge, for the regression problem with arbitrary labels, quadratic overparameterization m ≳ n^2 is state of the art (Oymak & Soltanolkotabi, 2020; Song & Yang, 2020; Nguyen & Mondelli, 2020).…”
Section: Related Work (mentioning)
confidence: 99%
“…Global convergence guarantees for the optimization, and to a lesser extent generalization, for networks polynomially wide in the number of training samples n and other parameters have been addressed in several works (Du et al., 2019b; Oymak & Soltanolkotabi, 2020; Du et al., 2019a; Allen-Zhu et al., 2019a,b; Zou et al., 2020; Zou & Gu, 2019; Song & Yang, 2020; Arora et al., 2019). To our knowledge, for the regression problem with arbitrary labels, quadratic overparameterization m ≳ n^2 is state of the art (Oymak & Soltanolkotabi, 2020; Song & Yang, 2020; Nguyen & Mondelli, 2020). E et al. (2020) gave a fairly comprehensive study of optimization and generalization of shallow networks trained under the standard parameterization.…”
Section: Related Work (mentioning)
confidence: 99%
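To give a rough sense of the scaling behind the "quadratic overparameterization m ≳ n^2" quoted above, the arithmetic below drops all constants and the poly(1/λ), log n factors; it is an illustration only, not a figure from the cited works.

```latex
% Hidden width required by a quadratic bound m >~ n^2
% (constants and poly(1/lambda), log n factors omitted):
%   n = 10^3 training samples  =>  m >~ 10^6 hidden units
%   n = 10^4 training samples  =>  m >~ 10^8 hidden units
\[
  m \;\gtrsim\; n^{2}:
  \qquad
  n = 10^{3} \Rightarrow m \gtrsim 10^{6},
  \qquad
  n = 10^{4} \Rightarrow m \gtrsim 10^{8}.
\]
```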