2020
DOI: 10.48550/arxiv.2001.01700
Preprint

Gradient descent algorithms for Bures-Wasserstein barycenters

Sinho Chewi, Tyler Maunu, Philippe Rigollet, et al.

Abstract: We study first-order methods to compute the barycenter of a probability distribution over the Bures-Wasserstein manifold. We derive global rates of convergence for both gradient descent and stochastic gradient descent despite the fact that the barycenter functional is not geodesically convex. Our analysis overcomes this technical hurdle by developing a Polyak-Lojasiewicz (PL) inequality, which is built using tools from optimal transport and metric geometry.
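A minimal sketch of a gradient-descent iteration for the barycenter of centered Gaussians, written under stated assumptions: covariances are 2x2 (so matrix square roots have a closed form and no third-party library is needed), the distribution is a finite family Σ_1, …, Σ_n with weights λ_i, and the iteration starts at the identity. The step Σ ← G Σ G with G = (1 - η)I + η Σ_i λ_i T_i, where T_i is the optimal transport map from N(0, Σ) to N(0, Σ_i), is a standard form of the gradient step on the Bures-Wasserstein manifold; with η = 1 it coincides with the classical fixed-point iteration for Gaussian barycenters. All function names (`bw_gradient_descent`, `transport_map`, `averaged_map`, ...) are hypothetical and this is not the authors' code.

```python
import math

# 2x2 matrices are nested lists [[a, b], [c, d]] (assumption: 2x2 only).
I2 = [[1.0, 0.0], [0.0, 1.0]]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def add(A, B, a=1.0, b=1.0):
    # Linear combination a*A + b*B.
    return [[a * A[i][j] + b * B[i][j] for j in range(2)] for i in range(2)]


def sym(A):
    # Remove floating-point asymmetry.
    return [[(A[i][j] + A[j][i]) / 2 for j in range(2)] for i in range(2)]


def inv(M):
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det], [-M[1][0] / det, M[0][0] / det]]


def sqrtm_spd(M):
    # Closed form for a 2x2 symmetric positive-definite matrix:
    # sqrt(M) = (M + s*I) / t with s = sqrt(det M), t = sqrt(tr M + 2*s).
    s = math.sqrt(M[0][0] * M[1][1] - M[0][1] * M[1][0])
    t = math.sqrt(M[0][0] + M[1][1] + 2 * s)
    return [[(M[0][0] + s) / t, M[0][1] / t], [M[1][0] / t, (M[1][1] + s) / t]]


def transport_map(S, Si):
    # Optimal map from N(0, S) to N(0, Si):
    # T = S^{-1/2} (S^{1/2} Si S^{1/2})^{1/2} S^{-1/2}
    r = sqrtm_spd(S)
    r_inv = inv(r)
    middle = sqrtm_spd(sym(matmul(matmul(r, Si), r)))
    return sym(matmul(matmul(r_inv, middle), r_inv))


def averaged_map(S, covs, weights):
    # Weighted average of the transport maps: sum_i w_i T_i.
    T = [[0.0, 0.0], [0.0, 0.0]]
    for w, Si in zip(weights, covs):
        T = add(T, transport_map(S, Si), 1.0, w)
    return T


def bw_gradient_descent(covs, weights, eta=1.0, iters=100):
    S = I2
    for _ in range(iters):
        # G = (1 - eta) I + eta * sum_i w_i T_i, then S <- G S G.
        G = add(I2, averaged_map(S, covs, weights), 1.0 - eta, eta)
        S = sym(matmul(matmul(G, S), G))
    return S


if __name__ == "__main__":
    covs = [[[1.0, 0.0], [0.0, 4.0]], [[9.0, 0.0], [0.0, 16.0]]]
    print(bw_gradient_descent(covs, [0.5, 0.5]))  # approximately [[4, 0], [0, 9]]
```

A convenient convergence check: at the barycenter the averaged map equals the identity, so `averaged_map(S, covs, weights)` should be close to `I2`. In the commuting (diagonal) example the barycenter is the square of the averaged square roots, diag((1+3)/2, (2+4)/2)² = diag(4, 9).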

Cited by 5 publications (6 citation statements)
References 22 publications (36 reference statements)
“…However, it is possible to obtain guarantees in a restricted setting by establishing a Polyak-Lojasiewicz-type inequality. In particular, assuming all µ_i are Gaussian distributions with positive-definite covariance matrices, it is shown that the gradient-descent algorithm admits a linear convergence rate [Chewi et al., 2020].…”
Section: Methods and Algorithms · mentioning · confidence: 99%
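As a hedged illustration of why a PL-type inequality yields a linear rate, here is the generic Euclidean version of the argument. It assumes an objective F that is L-smooth (in the sense of the descent lemma), satisfies a PL inequality with constant μ > 0, and is minimized by gradient descent x_{t+1} = x_t - η∇F(x_t) with η = 1/L. The geometry, step size, and constants in Chewi et al. (Bures-Wasserstein manifold, condition-number dependence) differ and are not reproduced here.

```latex
\begin{aligned}
F(x_{t+1}) &\le F(x_t) - \eta\Bigl(1 - \tfrac{L\eta}{2}\Bigr)\,\|\nabla F(x_t)\|^2
  && \text{(descent lemma)} \\
\|\nabla F(x_t)\|^2 &\ge 2\mu\,\bigl(F(x_t) - F^\star\bigr)
  && \text{(PL inequality)} \\
F(x_{t+1}) - F^\star &\le \Bigl(1 - \tfrac{\mu}{L}\Bigr)\bigl(F(x_t) - F^\star\bigr)
  && \text{(combine, with } \eta = 1/L\text{)} \\
F(x_T) - F^\star &\le \Bigl(1 - \tfrac{\mu}{L}\Bigr)^{T}\bigl(F(x_0) - F^\star\bigr)
  && \text{(iterate } T \text{ times)}
\end{aligned}
```

The point of the PL route is that it needs no convexity: only the gradient-norm lower bound in the second line, which is exactly what a non-geodesically-convex functional can still satisfy.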
“…Second, while this paper and much of the literature focus on computing Wasserstein barycenters of discrete distributions, there is also an interesting line of work on computing barycenters of continuous distributions. In order to ensure that µ_i and ν have concise representations for computational purposes, the distributions are typically restricted to Gaussians, in which case specialized algorithms can be designed; see, e.g., [5,12].…”
Section: Previous Work · mentioning · confidence: 99%
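One reason the Gaussian restriction gives such concise representations is that the squared 2-Wasserstein distance between N(m_1, Σ_1) and N(m_2, Σ_2) has the closed form |m_1 - m_2|² + tr Σ_1 + tr Σ_2 - 2 tr((Σ_1^{1/2} Σ_2 Σ_1^{1/2})^{1/2}). The sketch below evaluates it, assuming 2x2 covariances so that the trace of the matrix square root reduces to scalar arithmetic; the function names are hypothetical.

```python
import math


def bures_trace_term(S1, S2):
    # tr((S1^{1/2} S2 S1^{1/2})^{1/2}) for 2x2 SPD matrices (assumption: 2x2).
    # For a 2x2 SPD matrix M, tr(M^{1/2}) = sqrt(tr M + 2 sqrt(det M)), and
    # M = S1^{1/2} S2 S1^{1/2} has tr M = tr(S1 S2), det M = det S1 * det S2.
    tr_prod = sum(S1[i][k] * S2[k][i] for i in range(2) for k in range(2))
    det1 = S1[0][0] * S1[1][1] - S1[0][1] * S1[1][0]
    det2 = S2[0][0] * S2[1][1] - S2[0][1] * S2[1][0]
    return math.sqrt(tr_prod + 2 * math.sqrt(det1 * det2))


def w2_squared_gaussian(m1, S1, m2, S2):
    # Squared W2 distance = squared mean gap + squared Bures distance.
    mean_term = sum((a - b) ** 2 for a, b in zip(m1, m2))
    bures_sq = (S1[0][0] + S1[1][1] + S2[0][0] + S2[1][1]
                - 2 * bures_trace_term(S1, S2))
    return mean_term + bures_sq


if __name__ == "__main__":
    A = [[1.0, 0.0], [0.0, 4.0]]
    B = [[9.0, 0.0], [0.0, 16.0]]
    print(w2_squared_gaussian([0.0, 0.0], A, [0.0, 0.0], B))  # 8.0
```

For commuting covariances the Bures term reduces to the squared Frobenius distance between square roots, here (1 - 3)² + (2 - 4)² = 8.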
“…This implies that the barycenter coincides with such data distribution, leading to a significantly simpler problem. In the case where all measures are Gaussians, Chewi et al. [12] explicitly derive the gradients of the Wasserstein barycenter functional with respect to the mean and variance of the barycenter and use SGD to learn it.…”
Section: Inductive Biases · mentioning · confidence: 99%
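A minimal SGD sketch in the spirit of this statement, under explicit assumptions: the distribution over Gaussians is uniform over a finite list, covariances are 2x2, the iterate starts at N(0, I), and the step size is η_t = 1/(t+1). That schedule is an illustrative choice, not necessarily the one analyzed by Chewi et al. Each step samples one Gaussian, moves the covariance a fraction η_t along the Wasserstein geodesic toward it (Σ ← G Σ G with G = (1 - η_t)I + η_t T_i), and takes a plain SGD step on the mean. `bw_sgd` and its helpers are hypothetical names.

```python
import math
import random


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def sym(A):
    # Remove floating-point asymmetry.
    return [[(A[i][j] + A[j][i]) / 2 for j in range(2)] for i in range(2)]


def inv(M):
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det], [-M[1][0] / det, M[0][0] / det]]


def sqrtm_spd(M):
    # Closed-form square root of a 2x2 SPD matrix (assumption: 2x2 only).
    s = math.sqrt(M[0][0] * M[1][1] - M[0][1] * M[1][0])
    t = math.sqrt(M[0][0] + M[1][1] + 2 * s)
    return [[(M[0][0] + s) / t, M[0][1] / t], [M[1][0] / t, (M[1][1] + s) / t]]


def transport_map(S, Si):
    # Optimal map from N(0, S) to N(0, Si).
    r = sqrtm_spd(S)
    r_inv = inv(r)
    middle = sqrtm_spd(sym(matmul(matmul(r, Si), r)))
    return sym(matmul(matmul(r_inv, middle), r_inv))


def bw_sgd(means, covs, steps=4000, seed=0):
    rng = random.Random(seed)
    m = [0.0, 0.0]
    S = [[1.0, 0.0], [0.0, 1.0]]
    for t in range(steps):
        eta = 1.0 / (t + 1)              # assumed step-size schedule
        i = rng.randrange(len(covs))     # uniform sampling (assumption)
        # Mean: SGD step on (1/2)|m - m_i|^2.
        m = [(1 - eta) * a + eta * b for a, b in zip(m, means[i])]
        # Covariance: fraction eta along the geodesic toward Sigma_i.
        T = transport_map(S, covs[i])
        G = [[(1 - eta) * float(j == k) + eta * T[j][k] for k in range(2)]
             for j in range(2)]
        S = sym(matmul(matmul(G, S), G))
    return m, S


if __name__ == "__main__":
    means = [[0.0, 0.0], [2.0, 2.0]]
    covs = [[[1.0, 0.0], [0.0, 4.0]], [[9.0, 0.0], [0.0, 16.0]]]
    print(bw_sgd(means, covs))  # close to ([1, 1], [[4, 0], [0, 9]])
```

With η_t = 1/(t+1) and diagonal covariances, the square root of the covariance iterate is exactly the running average of the sampled square roots, and the mean iterate is the running average of the sampled means, which makes this schedule easy to sanity-check.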