2021
DOI: 10.48550/arxiv.2106.15013
Preprint

Small random initialization is akin to spectral learning: Optimization and generalization guarantees for overparameterized low-rank matrix reconstruction

Abstract: Recently there has been significant theoretical progress on understanding the convergence and generalization of gradient-based methods on nonconvex losses with overparameterized models. Nevertheless, many aspects of optimization and generalization and in particular the critical role of small random initialization are not fully understood. In this paper, we take a step towards demystifying this role by proving that small random initialization followed by a few iterations of gradient descent behaves akin to popu…
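For intuition, here is a minimal numerical sketch of the mechanism described in the abstract, under simplifying assumptions that are not the paper's exact setting: gradient descent on a fully observed symmetric factorization loss ||U U^T − M||_F^2 with an overparameterized factor U and a tiny random initialization, rather than the paper's general measurement operator. All dimensions, the step size, and the initialization scale are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, k = 50, 3, 10                 # ambient dim, true rank, overparameterized width (k > r)

Xstar = rng.standard_normal((n, r))
M = Xstar @ Xstar.T                 # ground-truth PSD rank-r matrix

alpha = 1e-6                        # small random initialization scale
U = alpha * rng.standard_normal((n, k))

eta, T = 1e-3, 50                   # step size and number of early iterations
for _ in range(T):
    U = U - eta * 4.0 * (U @ U.T - M) @ U      # gradient of ||U U^T - M||_F^2

# Compare the dominant directions of the early iterate with the top-r
# eigenvectors of M (what a spectral method would return).
V_top = np.linalg.eigh(M)[1][:, -r:]
U_top = np.linalg.svd(U, full_matrices=False)[0][:, :r]
cosines = np.clip(np.linalg.svd(V_top.T @ U_top, compute_uv=False), -1.0, 1.0)
print("principal angles (deg):", np.degrees(np.arccos(cosines)))
```

In the small-initialization regime U U^T ≈ 0, so each step is approximately U ← (I + 4ηM)U, i.e. a few rounds of power iteration on M; this is the sense in which the early phase mimics a spectral method, and the printed principal angles should come out small.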

Cited by 2 publications (8 citation statements)
References 33 publications
“…Similar phenomena have been empirically observed in many other nonconvex problems, where vanilla gradient descent, when coupled with small random initialization (SRI) and early stopping (ES), has good generalization performance even with overparametrization due to the algorithmic regularization effect of SRI and ES [Woodworth et al, 2020, Ghorbani et al, 2020, Prechelt, 1998, Wang et al, 2021, Li et al, 2018, Stöger and Soltanolkotabi, 2021]. This motivates us to study the following question: What is the general behavior of the gradient descent dynamic (GD-M) coupled with SRI and ES?…”
Section: Introduction (supporting)
confidence: 57%
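As a purely illustrative toy version of the SRI + ES phenomenon quoted above (not taken from the cited works): gradient descent on an overparameterized factorization of a noisy low-rank matrix, started from a small random initialization, first fits the rank-r signal and only much later starts fitting the noise, so stopping early yields a better estimate. The observation model, dimensions, noise level, and step size below are assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
n, r, k = 50, 3, 10
Xstar = rng.standard_normal((n, r))
M = Xstar @ Xstar.T                                  # rank-r signal
N = rng.standard_normal((n, n))
M_obs = M + 0.5 * (N + N.T) / np.sqrt(2)             # noisy symmetric observation

U = 1e-6 * rng.standard_normal((n, k))               # small random initialization (SRI)
eta = 1e-3
for t in range(1, 1501):
    U = U - eta * 4.0 * (U @ U.T - M_obs) @ U        # GD on ||U U^T - M_obs||_F^2
    if t % 150 == 0:
        err = np.linalg.norm(U @ U.T - M, "fro") / np.linalg.norm(M, "fro")
        print(f"iter {t:4d}  relative error to the noiseless M: {err:.3f}")
```

The error against the noiseless M typically bottoms out within the first few hundred iterations and then creeps back up as the k − r redundant directions start fitting noise, which is the early-stopping effect the quote refers to.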
“…• Model-free setting. Most existing analyses consider the setting where X is (exactly or approximately) low-rank with a sufficiently large singular value gap δ [Li et al, 2018, Zhuo et al, 2021, Ye and Du, 2021, Fan et al, 2020, Stöger and Soltanolkotabi, 2021].…”
Section: Iteration Complexity and Stepsize (mentioning)
confidence: 99%
“…Based on our simulations, we observed that SubGM with small random initialization behaves almost the same as SubGM with spectral initialization. Therefore, we conjecture that small random initialization followed by a few iterations of SubGM is in fact equivalent to spectral initialization; a similar result has been recently proven by Stöger and Soltanolkotabi [31] for gradient descent on the ℓ2-loss. We consider a rigorous verification of this conjecture as an enticing challenge for future research.…”
Section: Discussion (supporting)
confidence: 77%
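A rough simulation in the spirit of the comparison described above (an illustrative sketch, not the authors' experiments): the subgradient method (SubGM) on the ℓ1 matrix-sensing loss f(U) = (1/m) Σ_i |⟨A_i, U U^T⟩ − y_i|, run once from a small random initialization and once from a spectral initialization built from the top eigenvectors of (1/m) Σ_i y_i A_i. The problem sizes, the Gaussian measurement model, and the geometrically decaying step size are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n, r, k, m = 20, 2, 5, 1500          # dim, true rank, overparameterized width, measurements

Xstar = rng.standard_normal((n, r))
M = Xstar @ Xstar.T
A = rng.standard_normal((m, n, n))
A = (A + A.transpose(0, 2, 1)) / 2                   # symmetric sensing matrices
y = np.einsum("mij,ij->m", A, M)                     # noiseless measurements

def rel_err(U):
    return np.linalg.norm(U @ U.T - M, "fro") / np.linalg.norm(M, "fro")

def subgm(U0, steps=2000, step0=0.1, decay=0.9975, tag=""):
    U, step = U0.copy(), step0
    for t in range(1, steps + 1):
        resid = np.einsum("mij,ij->m", A, U @ U.T) - y
        G = np.einsum("m,mij->ij", np.sign(resid), A) / m   # subgradient wrt U U^T
        U = U - step * 2.0 * G @ U                           # subgradient step in U
        step *= decay
        if t % 500 == 0:
            print(f"{tag}  iter {t:4d}  relative error: {rel_err(U):.4f}")
    return U

U_small = 1e-6 * rng.standard_normal((n, k))          # (a) small random initialization
S = np.einsum("m,mij->ij", y, A) / m                  # (b) spectral initialization
w, V = np.linalg.eigh(S)
U_spec = V[:, -k:] * np.sqrt(np.clip(w[-k:], 0.0, None))

subgm(U_small, tag="small random init")
subgm(U_spec,  tag="spectral init    ")
```

In this sketch the two runs typically end up with errors of the same order, with the small-initialization run spending its first few dozen iterations in an alignment phase that plays the role of the spectral step.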
“…Therefore, the desirable performance of SubGM with small initialization can be attributed to its implicit regularization property. In particular, we show that small initialization of SubGM is akin to implicitly regularizing the redundant rank of the over-parameterized model, thereby avoiding overfitting; a recent work [31] has shown a similar property for the gradient descent algorithm on the noiseless matrix recovery with ℓ2-loss.…”
Section: Power of Small Initialization (mentioning)
confidence: 70%
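To see the implicit rank regularization concretely, here is a small sketch (again with illustrative assumptions rather than the cited papers' setting): gradient descent on the overparameterized factorization loss ||U U^T − M||_F^2 for a noiseless rank-r target, started from a tiny random initialization. Only r of the k singular values of U grow to fit the signal; the redundant k − r singular values stay near the initialization scale, so the extra rank is effectively never used.

```python
import numpy as np

rng = np.random.default_rng(3)
n, r, k = 50, 3, 10
Xstar = rng.standard_normal((n, r))
M = Xstar @ Xstar.T

U = 1e-6 * rng.standard_normal((n, k))    # small random initialization
eta = 1e-3
for t in range(1, 401):
    U = U - eta * 4.0 * (U @ U.T - M) @ U
    if t % 100 == 0:
        s = np.linalg.svd(U, compute_uv=False)
        print(f"iter {t:3d}  top-{r} singular values: {np.round(s[:r], 3)}  "
              f"remaining: {np.round(s[r:], 8)}")
```

The printed trajectory typically shows the top-r singular values saturating at roughly the square roots of the eigenvalues of M while the remaining ones stay many orders of magnitude smaller, which is the rank-regularization effect described in the quote.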