2022
DOI: 10.2139/ssrn.4139166
Gradient Descent for Deep Matrix Factorization: Dynamics and Implicit Bias Towards Low Rank

Cited by 3 publications (4 citation statements)
References 23 publications
“…In fact, overparametrized gradient descent provides a tuning-free alternative to established algorithms like LASSO and Basis Pursuit. In a similar manner, our present contribution stems from the insight that in most of the above papers [1,2,17,18] the signs of components do not change over time when gradient flow is applied. Instead of viewing this feature as an obstacle, cf.…”
Section: Related Work - Overparameterization and Implicit Regularization (mentioning)
confidence: 61%
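To make the excerpt above concrete, the following is a minimal, self-contained sketch (not taken from the cited works; all names and parameters are illustrative assumptions) of overparameterized gradient descent for an underdetermined linear system. The unknown is reparameterized as x = u*u - v*v, and plain gradient descent from a small initialization drifts toward a sparse solution without any explicit penalty, which is the sense in which it can act as a tuning-free alternative to LASSO or Basis Pursuit; with a small enough step, the factors u and v also stay entrywise positive, mirroring the sign-preservation observation in the excerpt.

import numpy as np

# Illustrative sketch only: overparameterized gradient descent on
# an underdetermined linear system A x = y, with x = u*u - v*v.
rng = np.random.default_rng(0)
m, n, s = 30, 100, 3                      # measurements, dimension, sparsity
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[rng.choice(n, s, replace=False)] = rng.standard_normal(s)
y = A @ x_true

alpha = 1e-3                              # small initialization scale
u = np.full(n, alpha)
v = np.full(n, alpha)
step = 0.05

for _ in range(20000):
    x = u * u - v * v                     # overparameterized reparametrization
    g = A.T @ (A @ x - y)                 # gradient of 0.5*||A x - y||^2 w.r.t. x
    u -= step * 2 * u * g                 # chain rule: dL/du = 2*u*g
    v += step * 2 * v * g                 # chain rule: dL/dv = -2*v*g
    # With a small step, u and v remain entrywise positive, so the sign
    # of each component of x is fixed by which factor grows -- the
    # "signs do not change over gradient flow" observation above.

x_hat = u * u - v * v
print("relative recovery error:",
      np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))

In analyses of this kind, the initialization scale alpha controls how strongly the iterates are biased toward sparse solutions; no regularization parameter has to be tuned.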
“…Because ∂_t D_F(z_+, x(t)) is identical for all z_+ ∈ S_+ (the second line of (17) does not depend on the choice of z_+), the difference…”
Section: Discussion (mentioning)
confidence: 99%
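For context (an assumption on notation, not stated in the excerpt): D_F typically denotes the Bregman divergence induced by a differentiable convex function F,

D_F(z, x) = F(z) - F(x) - ⟨∇F(x), z - x⟩,

so ∂_t D_F(z_+, x(t)) tracks how the divergence between a fixed comparison point z_+ and the gradient-flow trajectory x(t) evolves over time.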
“…Also, the results show that the neural networks were robust against noise even when they had many parameters. This robustness against noise is probably due to implicit bias, whereby over‐parametrized neural networks tend to learn a simple structure from the data (Chou et al., 2024). This property is not trivial, because traditional approaches (e.g., using linear interpolation functions to represent the constitutive relations) would overfit the noisy data and require some regularization techniques.…”
Section: Inverse Modeling (mentioning)
confidence: 99%