2018
DOI: 10.1093/imaiai/iay002

Gradient descent with non-convex constraints: local concavity determines convergence

Abstract: Many problems in high-dimensional statistics and optimization involve minimization over non-convex constraints—for instance, a rank constraint for a matrix estimation problem—but little is known about the theoretical properties of such optimization problems for a general non-convex constraint set. In this paper we study the interplay between the geometric properties of the constraint set and the convergence behavior of gradient descent for minimization over this set. We develop the notion of local concavity co…
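The abstract's running example is a rank constraint in matrix estimation, where a gradient step is followed by projection onto a non-convex set. The sketch below is a minimal illustration of that setup (not the paper's algorithm or analysis): projected gradient descent with the rank-r projection computed by truncated SVD. The denoising objective and all function names are assumptions introduced for illustration.

```python
# Minimal sketch (illustrative, not the paper's method): projected gradient descent
# for min_Theta f(Theta) subject to rank(Theta) <= r, with a simple denoising loss
# f(Theta) = 0.5 * ||Theta - Y||_F^2 as a stand-in objective. The projection onto
# the non-convex rank-r set is the truncated SVD (Eckart-Young).
import numpy as np

def project_rank_r(Theta, r):
    """Euclidean projection onto {rank <= r} via truncated SVD."""
    U, s, Vt = np.linalg.svd(Theta, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]

def projected_gradient_descent(grad_f, Theta0, r, step=0.1, n_iters=200):
    """Each iteration: gradient step, then projection onto the rank-r set."""
    Theta = project_rank_r(Theta0, r)
    for _ in range(n_iters):
        Theta = project_rank_r(Theta - step * grad_f(Theta), r)
    return Theta

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m1, m2, r = 30, 20, 3
    # Ground-truth low-rank matrix and noisy observations (illustrative only).
    Theta_star = rng.standard_normal((m1, r)) @ rng.standard_normal((r, m2))
    Y = Theta_star + 0.01 * rng.standard_normal((m1, m2))
    grad_f = lambda Theta: Theta - Y  # gradient of 0.5 * ||Theta - Y||_F^2
    Theta_hat = projected_gradient_descent(grad_f, np.zeros((m1, m2)), r)
    print("relative error:",
          np.linalg.norm(Theta_hat - Theta_star) / np.linalg.norm(Theta_star))
```

The projection step is the only place where the non-convexity of the constraint set enters; the geometry of that set is what the paper's convergence analysis is about.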

Cited by 16 publications (29 citation statements); references 22 publications. Selected citation statements:

“…Compared to the classical nonconvex optimization theory, which only shows sublinear convergence to a local optimum, the focus of the recent literature is on establishing linear rates of convergence or characterizing that the objective does not have spurious local minima. In addition to the methods that work on the factorized form, [63,76,64,13] consider projected gradient-type methods which optimize over the matrix variable Θ ∈ ℝ^{m1×m2}. These methods involve calculating the top r singular vectors of an m1 × m2 matrix at each iteration.…”
Section: Related Work
confidence: 99%
“…Taking the average and plugging in the main result (13) and the statistical error (19), we obtain our desired result.…”
Section: Application to Multi-task Reinforcement Learning
confidence: 99%
“…Researchers therefore often compare results from multiple algorithms and hyperparameters [7, 24–28]. Typically, the effect of hyperparameter choice on the quality of clustering results cannot be described with a convex function, meaning that hyperparameters should be chosen through exhaustive grid search [29], a slow and cumbersome process. Software packages for automatic hyperparameter tuning and model selection for regression and classification exist, notably auto-sklearn from AutoML [30], and some groups have made excellent tools for distributing a single clustering calculation for huge datasets [31, 32], but to the best of our knowledge, there is no package for comparing several clustering algorithms and hyperparameters.…”
Section: Introduction
confidence: 99%
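The statement above describes choosing clustering hyperparameters by exhaustive grid search. The sketch below illustrates that idea under stated assumptions: the algorithms, parameter grids, and silhouette scoring are choices made for illustration, not the interface of any of the cited packages.

```python
# Illustrative sketch of exhaustive grid search over clustering hyperparameters:
# score every algorithm/hyperparameter combination and keep the best one.
from itertools import product

from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data stands in for a real dataset.
X, _ = make_blobs(n_samples=500, centers=4, random_state=0)

# Hypothetical search space: two algorithms, each with its own grid.
grid = {
    "kmeans": {"n_clusters": [2, 3, 4, 5, 6]},
    "agglomerative": {"n_clusters": [2, 3, 4, 5, 6], "linkage": ["ward", "average"]},
}

def make_model(name, params):
    if name == "kmeans":
        return KMeans(n_init=10, random_state=0, **params)
    return AgglomerativeClustering(**params)

best = None
for name, param_grid in grid.items():
    keys = list(param_grid)
    for values in product(*(param_grid[k] for k in keys)):
        params = dict(zip(keys, values))
        labels = make_model(name, params).fit_predict(X)
        score = silhouette_score(X, labels)  # higher is better
        if best is None or score > best[0]:
            best = (score, name, params)

print("best configuration:", best)
```

Because every combination is fit and scored, the cost grows multiplicatively with each added hyperparameter, which is why the quoted statement calls the process slow and cumbersome.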