2020
DOI: 10.48550/arxiv.2011.13772
Preprint

Gradient Descent for Deep Matrix Factorization: Dynamics and Implicit Bias towards Low Rank

Abstract: We provide an explicit analysis of the dynamics of vanilla gradient descent for deep matrix factorization in a setting where the minimizer of the loss function is unique. We show that the recovery rate of ground-truth eigenvectors is proportional to the magnitude of the corresponding eigenvalues and that the differences among the rates are amplified as the depth of the factorization increases. For exactly characterized time intervals, the effective rank of gradient descent iterates is provably close to the eff…
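The behaviour described in the abstract can be simulated in a few lines. The script below is a minimal sketch, not the paper's code: it runs vanilla gradient descent on a depth-N factorization W_N⋯W_1 of a fixed matrix M with well-separated singular values and prints the loss, the leading singular values of the product, and an entropy-based effective rank as training proceeds. All hyperparameters (n, N, step, init_scale, iters) are illustrative assumptions, not values from the paper.

```python
# Minimal sketch (not the paper's code): vanilla gradient descent on a depth-N
# matrix factorization W_N ... W_1 fitted to a fixed matrix M, tracking how the
# singular values of the product and its effective rank evolve over training.
import numpy as np

rng = np.random.default_rng(0)
n, N, step, init_scale, iters = 10, 3, 0.05, 0.1, 10000

# Ground truth with well-separated singular values 1.0, 0.5, 0.1 (rank 3).
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
M = U[:, :3] @ np.diag([1.0, 0.5, 0.1]) @ V[:, :3].T

# N factors with a small random initialization.
Ws = [init_scale * rng.standard_normal((n, n)) for _ in range(N)]

def product(factors):
    P = np.eye(n)
    for W in factors:
        P = W @ P                      # builds W_N ... W_1 for factors = [W_1, ..., W_N]
    return P

def effective_rank(A):
    s = np.linalg.svd(A, compute_uv=False)
    p = s / s.sum()
    return float(np.exp(-(p * np.log(p + 1e-16)).sum()))   # entropy-based effective rank

for t in range(iters):
    P = product(Ws)
    G = P - M                          # gradient of 0.5 * ||P - M||_F^2 with respect to the product
    # Gradient w.r.t. W_i is (W_N ... W_{i+1})^T G (W_{i-1} ... W_1)^T; compute all
    # gradients first, then update, so this is a plain (simultaneous) GD step.
    grads = [product(Ws[i + 1:]).T @ G @ product(Ws[:i]).T for i in range(N)]
    for W, g in zip(Ws, grads):
        W -= step * g
    if t % 2000 == 0:
        s = np.linalg.svd(P, compute_uv=False)[:3]
        print(f"iter {t:5d}  loss {0.5 * np.linalg.norm(G)**2:.5f}  "
              f"top singular values {np.round(s, 3)}  eff. rank {effective_rank(P):.2f}")
```

With singular values this well separated, the larger ones are typically fitted noticeably earlier than the smaller ones, which is the kind of depth-amplified gap in recovery speed, together with a low effective rank of the iterates, that the abstract refers to.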

Cited by 4 publications (9 citation statements) | References 18 publications (55 reference statements)
“…For t = t⋆, we first note that inequalities (54), (55), and (56) follow directly from our assumptions. In order to prove inequality (53) we note that…”
Section: Analysis of the Spectral Phase
confidence: 93%
“…Linear neural networks: In [51,52,53,54,55] the convergence of gradient flow and gradient descent is studied for (deep) linear neural networks of the form min…”
Section: Related Work
confidence: 99%
“…Gradient descent training takes linear fully-connected networks to max-margin solutions (Soudry et al., 2018), while it takes linear convolutional networks to solutions with a different implicit penalty in the frequency domain (Gunasekar et al., 2018a). Deep matrix factorization by deep linear networks with gradient descent induces nuclear norm minimization of the learned matrix, leading to an implicit low-rank regularization (Gunasekar et al., 2018b; Arora et al., 2019; Chou et al., 2020).…”
Section: Related Work
confidence: 99%
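As a concrete, hypothetical illustration of the implicit low-rank regularization mentioned in this excerpt (not the code of any of the cited papers): gradient descent on a factorized matrix-completion loss X = W2·W1, started from a small initialization, tends to settle on a solution of small nuclear norm among the many matrices that fit the observed entries. Matrix size, rank, sampling rate, step size and iteration count below are assumptions.

```python
# Hypothetical matrix-completion sketch: GD on a depth-2 factorization from a
# small initialization, trained only on observed entries of a rank-2 matrix.
import numpy as np

rng = np.random.default_rng(1)
n, r = 30, 2
A = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
M = A / np.linalg.norm(A, 2)                 # rank-2 target, spectral norm 1
mask = rng.random((n, n)) < 0.5              # observe roughly half of the entries

W1 = 1e-3 * rng.standard_normal((n, n))      # X = W2 @ W1, small initialization
W2 = 1e-3 * rng.standard_normal((n, n))
step = 0.1

for _ in range(20000):
    X = W2 @ W1
    G = mask * (X - M)                       # the loss only sees observed entries
    # Simultaneous GD step on 0.5 * ||mask * (W2 W1 - M)||_F^2.
    W1, W2 = W1 - step * W2.T @ G, W2 - step * G @ W1.T

X = W2 @ W1

def nuclear_norm(B):
    return np.linalg.svd(B, compute_uv=False).sum()

print("residual on observed entries :", np.linalg.norm(mask * (X - M)))
print("error on unobserved entries  :", np.linalg.norm(~mask * (X - M)))
print("nuclear norm, learned vs. ground truth:", nuclear_norm(X), nuclear_norm(M))
```

The loss constrains only the observed entries, so many zero-loss solutions exist; the point of the sketch is that the one gradient descent finds from small initialization is typically close in nuclear norm to the low-rank ground truth.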
“…Since we observe convergence in plot 1c, this suggests that the bound of Theorem 2.4 may not be entirely sharp. But increasing the step size beyond a certain value leads to divergence, as suggested by plot 2d, so that some bound on the step size is necessary (see also [7, Lemma A.1] for a necessary condition in a special case). In our second set of experiments we use a sequence of step sizes η_k that converges to zero at various speeds.…”
Section: Numerical Experiments
confidence: 99%
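The step-size threshold mentioned in this excerpt can be illustrated with a small sweep. The sketch below uses an assumed depth-2 factorization and assumed step sizes rather than the quoted paper's experimental setup: small enough step sizes drive the loss down, while sufficiently large ones make the iterates blow up.

```python
# Step-size sweep on a plain depth-2 factorization loss 0.5*||W2 W1 - M||_F^2.
import numpy as np

rng = np.random.default_rng(2)
n = 10
A = rng.standard_normal((n, 3)) @ rng.standard_normal((3, n))
M = A / np.linalg.norm(A, 2)                 # rank-3 target, spectral norm 1

for step in [0.01, 0.05, 0.2, 1.0, 5.0]:
    W1 = 0.1 * rng.standard_normal((n, n))
    W2 = 0.1 * rng.standard_normal((n, n))
    loss = np.inf
    with np.errstate(all="ignore"):          # silence overflow warnings once iterates diverge
        for _ in range(5000):
            G = W2 @ W1 - M                  # gradient w.r.t. the product
            loss = 0.5 * np.linalg.norm(G) ** 2
            if not np.isfinite(loss):        # the iterates blew up for this step size
                break
            W1, W2 = W1 - step * W2.T @ G, W2 - step * G @ W1.T
    print(f"step size {step:4.2f} -> final loss {loss:.3e}")
```

The exact threshold depends on the problem (target norm, depth, initialization), which is why the quoted statement only says that some bound on the step size is necessary.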
“…As a result, one possible explanation for the good generalization of overparameterized trained neural networks is that the implicit bias of (stochastic) gradient descent is towards solutions of low complexity in a suitable sense, resulting in good generalization. While a theoretical analysis of this phenomenon seems difficult for nonlinear networks, first works for linear networks indicate that gradient descent leads to linear networks (factorized matrices) of low rank [2,7,10,11,16], although many open questions remain. Another important role seems to be played by the random initialization, see e.g.…”
Section: Introduction
confidence: 99%