2018
DOI: 10.48550/arxiv.1812.05189
Preprint

Massively scalable Sinkhorn distances via the Nyström method

Abstract: The Sinkhorn "distance," a variant of the Wasserstein distance with entropic regularization, is an increasingly popular tool in machine learning and statistical inference. However, the time and memory requirements of standard algorithms for computing this distance grow quadratically with the size of the data, making them prohibitively expensive on massive data sets. In this work, we show that this challenge is surprisingly easy to circumvent: combining two simple techniques, the Nyström method and Sinkhorn scaling, …
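
To make the abstract's idea concrete, here is a minimal NumPy sketch of Sinkhorn scaling run against a low-rank approximation of the Gibbs kernel. It is not the authors' implementation: the function name sinkhorn_lowrank and the factor matrices U and V are placeholders, and the factors are assumed to come from a Nyström (or any other low-rank) approximation of the kernel K = exp(-C/eps).

import numpy as np

def sinkhorn_lowrank(a, b, U, V, n_iters=100):
    # Sinkhorn scaling with a low-rank kernel approximation K ~= U @ V.T.
    # a, b: source and target marginals (nonnegative vectors summing to one).
    # U: (n, r) factor; V: (m, r) factor, so products with the approximate
    # kernel cost O((n + m) r) instead of O(n m).
    u = np.ones(len(a))
    v = np.ones(len(b))
    for _ in range(n_iters):
        # Apply K and K.T through the factors, never forming the n x m kernel.
        Kv = U @ (V.T @ v)
        u = a / Kv
        Ktu = V @ (U.T @ u)
        v = b / Ktu
    # The scalings (u, v) determine the approximate optimal plan
    # diag(u) @ (U @ V.T) @ diag(v), which is never materialized here.
    return u, v

Because each iteration only touches the factors, time and memory per iteration are linear rather than quadratic in the number of points, which is the scalability the abstract refers to.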

Cited by 11 publications (16 citation statements)
References 34 publications (53 reference statements)

“…Its increasing popularity in machine learning follows from the landmark paper [Cut13], who showed that it defines a differentiable loss function for supervised learning, and takes advantage of GPU architectures. We also refer to [BCC+15, ABRW18] for some illustrative recent works presenting theoretical and numerical advances on Sinkhorn's algorithm.…”
Section: Entropic Regularization (mentioning)
confidence: 99%
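
For context, the entropic regularization referred to in this excerpt is, in the standard formulation (notation mine, not taken from the cited works), the following modification of the optimal transport objective:

\[
W_\varepsilon(a, b) \;=\; \min_{P \in \Pi(a, b)} \;\langle P, C \rangle \;+\; \varepsilon \sum_{i,j} P_{ij} \left( \log P_{ij} - 1 \right),
\qquad
\Pi(a, b) = \{ P \ge 0 : P \mathbf{1} = a,\ P^\top \mathbf{1} = b \}.
\]

The minimizer has the form diag(u) K diag(v) with K = exp(-C/\varepsilon), and Sinkhorn's algorithm alternately rescales u and v to match the marginals a and b, which is exactly the loop in the sketch after the abstract above.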
“…Conventionally, to compute with such data one might begin by extracting a low-dimensional representation using nonlinear dimensionality reduction ("manifold learning") algorithms [3-5, 7, 15, 54, 55]. For supervised tasks, there is also theoretical work on kernel regression over manifolds [12,14,22,51]. These results rely on very general Sobolev embedding theorems, which are not precise enough to specify the interplay between regularity of the kernel and properties of the data needed to obtain concrete resource tradeoffs in the two-curve problem.…”
Section: Related Work (mentioning)
confidence: 99%
“…This regularization allows the computation of OT for large problems, such as those arising in machine learning [14,22,23] or computer graphics [7,42,44]. Recently, Altschuler et al. [2] introduced a method to accelerate the Sinkhorn algorithm via low-rank (Nyström) approximations of the kernel [2]. Simultaneously, there have been considerable efforts to study the convergence and approximation properties of the Sinkhorn algorithm [3] and its variants [21].…”
Section: Computational Optimal Transport (mentioning)
confidence: 99%
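
As a companion to the Sinkhorn sketch after the abstract, here is one way the low-rank (Nyström) kernel approximation mentioned in this excerpt can be formed. It assumes a Gaussian Gibbs kernel and picks landmarks by uniform sampling; the function name nystrom_gibbs_factors and the landmark rule are illustrative choices, not the scheme used in the cited works.

import numpy as np

def nystrom_gibbs_factors(X, Y, eps, r, seed=0):
    # Nyström factors for the Gibbs kernel K[i, j] = exp(-||X[i] - Y[j]||^2 / eps),
    # built from r landmark points sampled uniformly from the pooled data.
    rng = np.random.default_rng(seed)
    pool = np.concatenate([X, Y])
    Z = pool[rng.choice(len(pool), size=r, replace=False)]

    def gibbs(A, B):
        # Pairwise Gaussian kernel between the rows of A and B.
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-sq / eps)

    K_xz = gibbs(X, Z)                       # (n, r)
    K_yz = gibbs(Y, Z)                       # (m, r)
    K_zz_pinv = np.linalg.pinv(gibbs(Z, Z))  # (r, r), pseudoinverse for stability
    # K ~= K_xz @ K_zz_pinv @ K_yz.T, i.e. factors U = K_xz @ K_zz_pinv and
    # V = K_yz, which can be fed straight into a low-rank Sinkhorn loop.
    return K_xz @ K_zz_pinv, K_yz

For a fixed rank r, this keeps both the factorization and each Sinkhorn iteration linear in the number of points, which is the acceleration the excerpt attributes to Altschuler et al.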