2021
DOI: 10.1007/s10107-021-01636-z

Sparse optimization on measures with over-parameterized gradient descent

Abstract: Minimizing a convex function of a measure with a sparsity-inducing penalty is a typical problem arising, e.g., in sparse spikes deconvolution or two-layer neural networks training. We show that this problem can be solved by discretizing the measure and running nonconvex gradient descent on the positions and weights of the particles. For measures on a d-dimensional manifold and under some non-degeneracy assumptions, this leads to a global optimization algorithm with a complexity scaling as log(1/ε) in the desired accuracy ε.
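
The approach sketched in the abstract, namely representing the unknown measure by a cloud of weighted particles and running gradient descent jointly on the weights and positions, can be illustrated on a toy one-dimensional deconvolution problem. The snippet below is only a minimal sketch of that idea, not the paper's algorithm: the Gaussian kernel, penalty weight, step sizes, initialization and the nonnegativity projection on the weights are all assumptions made for illustration.

```python
# Toy sketch (not the paper's algorithm): recover a sparse measure on [0, 1]
# from Gaussian-blurred observations by over-parameterizing it with N weighted
# particles and running plain gradient descent on their weights and positions.
import numpy as np

rng = np.random.default_rng(0)

# Ground-truth sparse measure: three spikes with positive amplitudes.
true_pos = np.array([0.2, 0.5, 0.8])
true_amp = np.array([1.0, 0.7, 1.3])

grid = np.linspace(0.0, 1.0, 100)    # sample points of the forward operator
sigma = 0.05                         # assumed kernel width

def kernel(pos):
    """Feature matrix Phi[k, i] = exp(-(grid_k - pos_i)^2 / (2 sigma^2))."""
    return np.exp(-0.5 * ((grid[:, None] - pos[None, :]) / sigma) ** 2)

y = kernel(true_pos) @ true_amp      # noiseless observations, for simplicity
lam = 0.01                           # sparsity-inducing penalty weight (assumed)

# Over-parameterization: many more particles than true spikes.
N = 50
pos = rng.uniform(0.0, 1.0, N)       # particle positions
w = np.full(N, 0.1)                  # particle weights

lr_w, lr_pos = 0.01, 0.001           # assumed step sizes
for _ in range(20000):
    Phi = kernel(pos)
    residual = Phi @ w - y
    # Gradients of 0.5*||Phi w - y||^2 + lam*sum(w) in the weights and positions.
    grad_w = Phi.T @ residual + lam
    dPhi = Phi * (grid[:, None] - pos[None, :]) / sigma ** 2   # d Phi / d pos
    grad_pos = w * (dPhi.T @ residual)
    w = np.maximum(w - lr_w * grad_w, 0.0)   # keep weights nonnegative
    pos -= lr_pos * grad_pos

active = w > 1e-3                    # particles that kept significant mass
print("recovered positions:", np.round(np.sort(pos[active]), 3))
print("true positions:     ", true_pos)
```

With enough particles and small enough steps, the particles that keep significant weight tend to cluster near the true spikes while the penalty drives spurious weights to zero; the paper's non-degeneracy assumptions are what turn this empirical behaviour into a global convergence guarantee.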

Cited by 57 publications (94 citation statements)
References 41 publications (66 reference statements)
“…where F_N is defined in Equation (47). As already emphasized in [11], Equation (15) is a convex program in µ, whereas the parametrization given in H translates this minimization into a nonconvex differentiable problem in terms of r and t. This function H can be seen as an instance of the BLASSO Equation (15) for the measure µ_{a,t}, namely a convex program that does not depend on the number of particles N. Moreover, it is possible to run a gradient descent on the positions t_i ∈ R^d and weights r_i > 0 of the N-particle system.…”
Section: A3 Conic Particle Gradient Descent (CPGD)
Citation type: mentioning, confidence: 99%
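
To make the quoted reparametrization concrete, the sketch below evaluates a BLASSO-type objective on µ_{r,t} = Σ_i r_i δ_{t_i}, which turns the convex program in µ into a nonconvex but differentiable function H(r, t) of the weights and positions, and runs gradient descent on both. The random cosine features, the penalty, the step sizes and the multiplicative weight update (one simple way to keep r_i > 0) are assumptions for illustration, not the exact retraction or step-size rule of [11].

```python
# Compact sketch of the (r, t) reparametrization described in the quote above.
# The random cosine features, penalty and multiplicative weight update are
# illustrative assumptions, not the exact CPGD scheme of [11].
import numpy as np

m, d, N = 30, 2, 15                      # observations, dimension, particles
rng = np.random.default_rng(1)
W = rng.standard_normal((m, d))          # fixed frequencies of a smooth test map
y = rng.standard_normal(m)               # synthetic observations

def phi(t_i):
    """Features of one position t_i in R^d and their Jacobian in t_i."""
    z = W @ t_i
    return np.cos(z), -np.sin(z)[:, None] * W        # shapes (m,) and (m, d)

def H(r, t, lam=0.05):
    """H(r, t) = F(mu_{r,t}) = 0.5*||sum_i r_i phi(t_i) - y||^2 + lam*sum_i r_i,
    together with its gradients in the weights and the positions."""
    feats = [phi(t_i) for t_i in t]
    Phi = np.stack([f[0] for f in feats], axis=1)     # (m, N)
    residual = Phi @ r - y
    grad_r = Phi.T @ residual + lam
    grad_t = np.stack([r[i] * (feats[i][1].T @ residual) for i in range(N)])
    return 0.5 * residual @ residual + lam * r.sum(), grad_r, grad_t

r = np.full(N, 0.1)                      # weights, kept positive throughout
t = rng.standard_normal((N, d))          # positions
for _ in range(2000):
    val, grad_r, grad_t = H(r, t)
    r = r * np.exp(-np.clip(0.01 * grad_r, -5.0, 5.0))  # multiplicative step keeps r_i > 0
    t = t - 0.01 * grad_t                                # plain gradient step on positions
print("H(r, t) after descent:", round(val, 4))
```

The multiplicative step on r is what makes the iteration respect the conic constraint r_i > 0 without any projection; the positions, by contrast, live in an unconstrained space and take ordinary gradient steps.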
“…Hence, for a large enough number of particles N, this implies convergence towards the global minimizer of µ_{a,t} ↦ F(µ_{a,t}) itself, despite the lack of convexity of the function (r, t) ↦ F_N(r, t). We refer to Theorem 3.9 of [11], which establishes the convergence of the particle gradient descent with a constant step size under some non-degeneracy assumptions, i.e. the convergence of the CPGD toward µ_n (15) in the Hellinger-Kantorovich metric, and hence in the weak sense.…”
Section: A3 Conic Particle Gradient Descent (CPGD)
Citation type: mentioning, confidence: 99%
“…The following classes of algorithms better account for the infinite-dimensional nature of M(X). We present in detail the three methods with the most established results in the literature [13,18,23]. Before describing these methods, let us remark that there also exist some promising avenues, such as the projected gradient descent [24,25].…”
Section: Numerical Strategies To Tackle the BLASSO
Citation type: mentioning, confidence: 99%