Hung-Hsu Chou scite author profile

Hung-Hsu Chou

5 Publications

32 Citation Statements Received

122 Citation Statements Given

Affiliations

Ludwig-Maximilians-Universität München, RWTH Aachen University

Publications

More is Less: Inducing Sparsity via Overparameterization

Chou, Maly, Rauhut

2021, Preprint

In deep learning it is common to overparameterize the neural networks, that is, to use more parameters than training samples. Quite surprisingly training the neural network via (stochastic) gradient descent leads to models that generalize very well, while classical statistics would suggest overfitting. In order to gain understanding of this implicit bias phenomenon we study the special case of sparse recovery (compressive sensing) which is of interest on its own. More precisely, in order to reconstruct a vector from underdetermined linear measurements, we introduce a corresponding overparameterized square loss functional, where the vector to be reconstructed is deeply factorized into several vectors. We show that, under a very mild assumption on the measurement matrix, vanilla gradient flow for the overparameterized loss functional converges to a solution of minimal 1-norm. The latter is well-known to promote sparse solutions. As a by-product, our results significantly improve the sample complexity for compressive sensing in previous works. The theory accurately predicts the recovery rate in numerical experiments. For the proofs, we introduce the concept of solution entropy, which bypasses the obstacles caused by non-convexity and should be of independent interest.
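
The mechanism described in this abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: a depth-2 factorization x = u*u - v*v (elementwise), plain gradient descent with a small step size standing in for gradient flow, a random Gaussian measurement matrix, and a small identical initialization `alpha`. The function names `overparam_sparse_recovery` and `make_problem` and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def overparam_sparse_recovery(A, y, alpha=1e-3, lr=0.01, steps=5000):
    """Gradient descent on 0.5 * ||A (u*u - v*v) - y||^2 (illustrative only).

    The vector x is overparameterized as x = u*u - v*v (elementwise), and
    both factors start at the same small value alpha (an assumption).
    """
    n = A.shape[1]
    u = np.full(n, alpha)
    v = np.full(n, alpha)
    for _ in range(steps):
        # Gradient of the square loss with respect to x = u*u - v*v.
        g = A.T @ (A @ (u * u - v * v) - y)
        # Chain rule: d/du = 2*u*g and d/dv = -2*v*g.
        u, v = u - lr * 2 * u * g, v + lr * 2 * v * g
    return u * u - v * v


def make_problem(m=40, n=60, seed=0):
    """Hypothetical test problem: 3-sparse vector, Gaussian measurements."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((m, n)) / np.sqrt(m)
    x_true = np.zeros(n)
    x_true[[3, 17, 42]] = [1.0, -0.8, 0.6]
    return A, x_true, A @ x_true
```

Calling `overparam_sparse_recovery(A, y)` on the output of `make_problem()` returns an estimate that can be compared with `x_true`. With fewer measurements than unknowns, the small initialization is what should steer the iterates toward a sparse solution in this sketch, rather than any explicit penalty term.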

Gradient Descent for Deep Matrix Factorization: Dynamics and Implicit Bias towards Low Rank

Chou, Gieshoff, Maly, et al.

2020, Preprint

We provide an explicit analysis of the dynamics of vanilla gradient descent for deep matrix factorization in a setting where the minimizer of the loss function is unique. We show that the recovery rate of ground-truth eigenvectors is proportional to the magnitude of the corresponding eigenvalues and that the differences among the rates are amplified as the depth of the factorization increases. For exactly characterized time intervals, the effective rank of gradient descent iterates is provably close to the effective rank of a low-rank projection of the ground-truth matrix, such that early stopping of gradient descent produces regularized solutions that may be used for denoising, for instance. In particular, apart from a few initial steps of the iterations, the effective rank of our matrix is monotonically increasing, suggesting that "matrix factorization implicitly enforces gradient descent to take a route in which the effective rank is monotone". Since empirical observations in more general scenarios such as matrix sensing show a similar phenomenon, we believe that our theoretical results shed some light on the still mysterious "implicit bias" of gradient descent in deep learning.
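
The behaviour described in this abstract can be reproduced on a toy problem. The following is a minimal sketch under stated assumptions, not the authors' setup: the ground truth is a small symmetric positive definite matrix, all factors share one symmetric matrix `V` initialized at `alpha` times the identity (so the product is `V` raised to the power `depth`), and the effective rank is taken to be the nuclear norm divided by the spectral norm. The names `deep_factorization_gd`, `make_ground_truth`, and `record_at`, and all numerical values, are hypothetical.

```python
import numpy as np


def effective_rank(W):
    """Nuclear norm over spectral norm (one common definition, assumed here)."""
    return np.linalg.norm(W, "nuc") / np.linalg.norm(W, 2)


def deep_factorization_gd(M, depth=3, alpha=0.1, lr=0.01, steps=5000,
                          record_at=500):
    """Gradient descent on 0.5 * ||V^depth - M||_F^2 with a shared factor V.

    Assumes M is symmetric and every factor starts at alpha * I, in which
    case all factors stay equal and each one receives the same update
    V^(depth-1) @ (V^depth - M).
    """
    V = alpha * np.eye(M.shape[0])
    snapshot = None
    for k in range(steps):
        if k == record_at:
            # Product after record_at updates, for comparing early vs. late.
            snapshot = np.linalg.matrix_power(V, depth)
        residual = np.linalg.matrix_power(V, depth) - M
        V = V - lr * np.linalg.matrix_power(V, depth - 1) @ residual
    return np.linalg.matrix_power(V, depth), snapshot


def make_ground_truth(eigenvalues=(3.0, 1.0, 0.5), seed=0):
    """Hypothetical symmetric ground truth with prescribed eigenvalues."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((len(eigenvalues),) * 2))
    return Q @ np.diag(eigenvalues) @ Q.T
```

Comparing `effective_rank` of the early snapshot with that of the final product gives a rough picture of the early-stopping effect: in this sketch the direction with the largest eigenvalue is fitted first, so the early iterate is close to a rank-one matrix while the final iterate matches the full ground truth.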

Gradient Descent for Deep Matrix Factorization: Dynamics and Implicit Bias Towards Low Rank

Chou, Gieshoff, Maly, et al.

2022, SSRN Journal

Overparameterization and generalization error: weighted trigonometric interpolation

Xie, Chou, Rauhut, et al.

2020, Preprint

Overparameterization and Generalization Error: Weighted Trigonometric Interpolation

Xie, Chou, Rauhut, et al.

2022, SIAM Journal on Mathematics of Data Science
