2022
DOI: 10.48550/arxiv.2201.10469
Preprint

Convex Analysis of the Mean Field Langevin Dynamics

Abstract: As an example of the nonlinear Fokker-Planck equation, the mean field Langevin dynamics has recently attracted attention due to its connection to (noisy) gradient descent on infinitely wide neural networks in the mean field regime, and hence the convergence properties of the dynamics are of great theoretical interest. In this work, we give a simple and self-contained convergence rate analysis of the mean field Langevin dynamics with respect to the (regularized) objective function in both continuous and discrete time s…
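
For orientation, here is a minimal sketch of the standard formulation of the dynamics discussed in the abstract (the notation below is assumed, not taken from the paper): the mean field Langevin dynamics evolves the law μ_t = Law(X_t) of a particle via

\[
\mathrm{d}X_t = -\nabla \frac{\delta F}{\delta \mu}(\mu_t)(X_t)\,\mathrm{d}t + \sqrt{2\lambda}\,\mathrm{d}W_t,
\]

and can be viewed as the Wasserstein gradient flow of the entropy-regularized objective

\[
\mathcal{F}(\mu) = F(\mu) + \lambda \int \mu \log \mu \,\mathrm{d}x,
\]

where F is the (possibly nonlinear) loss functional, δF/δμ its first variation, λ > 0 the regularization (temperature) parameter, and W_t a standard Brownian motion.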

Cited by 5 publications (14 citation statements)
References 28 publications

“…• Under these assumptions, the convergence of μ_s to μ^* in relative entropy and in Wasserstein distance also holds, with the same rate [27, 9].…”
Section: Exponential Convergence
confidence: 85%
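
To make the quoted rate statement concrete, here is a hedged sketch of the typical form such results take (assuming a log-Sobolev inequality with constant α for the relevant Gibbs-type measure μ^*; the constants and normalizations are illustrative, not quoted from [27] or [9]):

\[
\mathrm{KL}(\mu_s \,\|\, \mu^*) \le e^{-2\lambda\alpha s}\, \mathrm{KL}(\mu_0 \,\|\, \mu^*),
\qquad
W_2^2(\mu_s, \mu^*) \le \frac{2}{\alpha}\, \mathrm{KL}(\mu_s \,\|\, \mu^*),
\]

where the second inequality (Talagrand's transport inequality, implied by the log-Sobolev inequality) is what transfers the same exponential rate to the Wasserstein distance.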
“…Let us recall the statement of Theorem 3.3 and prove it by an application of a convergence result proved independently in [27] and [9].…”
Section: Proof of Theorem 3.3
confidence: 94%
“…Upon completion of this work, we became aware of the preprint Nitanda et al. [2022] which also proves the exponential convergence of the Mean-Field Langevin dynamics with the same proof technique. Our work was conducted independently and simultaneously, and their contribution is not reflected in the present version of our paper (beyond this paragraph).…”
Section: Contributions and Related Work
confidence: 95%
“…We shall note that two recent independent works (Chizat (2022); Nitanda et al. (2022)) also tried to prove the linear convergence result for neural networks trained by noisy SGD in the mean-field regime, but as pointed out by the authors (Chizat (2022)), their assumptions of boundedness and smoothness in both works cannot be applied to the vanilla two-layer neural networks (we will discuss details in later sections). Our work differs from them in both assumptions and proof techniques.…”
Section: Related Work
confidence: 99%
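
To illustrate the kind of training procedure these citing works analyze, the following is a hypothetical, self-contained sketch of noisy gradient descent on a width-m two-layer network, i.e. a particle discretization of the mean field Langevin dynamics. The data, architecture, step size eta, and noise level lam are illustrative choices, not taken from any of the cited papers.

# Hypothetical sketch: noisy gradient descent on a width-m two-layer network,
# viewed as a particle discretization of the mean field Langevin dynamics.
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data (illustrative only): scalar inputs, targets y = tanh(2x).
n, m, d = 256, 512, 1
X = rng.uniform(-1.0, 1.0, size=(n, d))
y = np.tanh(2.0 * X[:, 0])

# Each "particle" is one hidden unit (a_j, w_j); the network output averages
# over particles, matching the mean field (1/m) scaling.
a = rng.normal(size=m)
w = rng.normal(size=(m, d))

eta, lam = 0.1, 1e-3  # step size and entropic-regularization strength (assumed)

def predict(a, w, X):
    return np.tanh(X @ w.T) @ a / m

for step in range(2000):
    h = np.tanh(X @ w.T)            # (n, m) hidden activations
    resid = h @ a / m - y           # residuals of the squared loss (1/2)*mean(resid**2)
    # Per-particle gradients of the loss functional's first variation
    # (equivalently, m times the usual parameter gradients).
    grad_a = h.T @ resid / n                                   # shape (m,)
    grad_w = (resid[:, None] * a * (1.0 - h**2)).T @ X / n     # shape (m, d)
    # One Euler-Maruyama step: a gradient descent update plus Gaussian noise of
    # variance 2*eta*lam, the discrete-time analogue of the continuous dynamics.
    a = a - eta * grad_a + np.sqrt(2.0 * eta * lam) * rng.normal(size=a.shape)
    w = w - eta * grad_w + np.sqrt(2.0 * eta * lam) * rng.normal(size=w.shape)

print("final mse:", float(np.mean((predict(a, w, X) - y) ** 2)))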