2019
DOI: 10.48550/arxiv.1909.05122
Preprint

Implicit Regularization for Optimal Sparse Recovery

Abstract: We investigate implicit regularization schemes for gradient descent methods applied to unpenalized least squares regression to solve the problem of reconstructing a sparse signal from an underdetermined system of linear measurements under the restricted isometry assumption. For a given parametrization yielding a non-convex optimization problem, we show that prescribed choices of initialization, step size and stopping time yield a statistically and computationally optimal algorithm that achieves the minimax rat…
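The parametrization referred to in the abstract is the Hadamard-product (quadratic) reparametrization of the regression vector. Below is a minimal sketch of the scheme, assuming β = u⊙u − v⊙v, a Gaussian design, and plain gradient descent on the unpenalized least-squares loss; the initialization scale alpha, step size eta, and stopping time T are illustrative stand-ins, not the prescribed constants from the paper.

```python
import numpy as np

# Minimal sketch: sparse recovery via gradient descent on the unpenalized
# least-squares loss under the Hadamard-product parametrization
# beta = u * u - v * v. Constants (alpha, eta, T) are illustrative choices,
# not the prescribed values from the paper.

rng = np.random.default_rng(0)
n, d, k = 100, 500, 5                        # measurements, ambient dimension, sparsity
X = rng.standard_normal((n, d)) / np.sqrt(n)
beta_star = np.zeros(d)
beta_star[:k] = 1.0                          # k-sparse ground truth
y = X @ beta_star + 0.01 * rng.standard_normal(n)

alpha, eta, T = 1e-6, 0.1, 3000              # small init, step size, early-stopping time
u = alpha * np.ones(d)
v = alpha * np.ones(d)

for _ in range(T):
    beta = u * u - v * v
    r = X.T @ (X @ beta - y)                 # gradient of 0.5 * ||X beta - y||^2 w.r.t. beta
    u = u - eta * 2.0 * r * u                # chain rule: d beta_i / d u_i =  2 u_i
    v = v + eta * 2.0 * r * v                # chain rule: d beta_i / d v_i = -2 v_i

beta_hat = u * u - v * v
print("ell_2 recovery error:", np.linalg.norm(beta_hat - beta_star))
```

The small initialization keeps all coordinates near zero until the signal coordinates, whose gradients are large, grow multiplicatively; stopping before the noise-driven coordinates have time to grow is what plays the role of an explicit sparsity penalty.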

Cited by 2 publications (3 citation statements)
References 30 publications
“…The linear diagonal neural networks we consider have been studied in the case of gradient descent [33] and stochastic gradient descent with label noise [15]. In both cases the authors show that this model has the ability to implicitly bias the training procedure to help retrieve a sparse predictor.…”
Section: Related Work
“…For the sake of completeness, the study of diagonal linear networks of arbitrary depth p ≥ 3 is done in Appendix E.2. Also note that, in addition to being a toy neural model, it has received recent attention on its own for its practical ability to induce sparsity [33,34,15] or to solve phase retrieval problems [37].…”
Section: Notations
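The depth-p diagonal linear network mentioned in the excerpt above generalizes the quadratic parametrization of the cited paper. A sketch of one common symmetric form (the exact variant studied in the referenced Appendix E.2 may differ) is

f_{u,v}(x) = \langle u^{\odot p} - v^{\odot p},\; x \rangle, \qquad u^{\odot p} := (u_1^{p}, \dots, u_d^{p}),

so that p = 2 recovers the Hadamard parametrization above, and gradient descent on the least-squares loss over (u, v) again induces a sparsity-promoting implicit bias.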
“…The quadratic parametrisations which we consider have become popular lately (Vaškevičius et al., 2019) since, despite their simplicity, they already make it possible to grasp the complexity of more general networks. Indeed, they highlight important aspects of the theoretical concerns of modern machine learning: the neural tangent kernel regime, the roles of overparametrisation and of the initialisation (Woodworth et al., 2020).…”
Section: Introduction