2018
DOI: 10.48550/arxiv.1805.11897
Preprint

Differential Properties of Sinkhorn Approximation for Learning with Wasserstein Distance

Abstract: Applications of optimal transport have recently gained remarkable attention thanks to the computational advantages of entropic regularization. However, in most situations the Sinkhorn approximation of the Wasserstein distance is replaced by a regularized version that is less accurate but easy to differentiate. In this work we characterize the differential properties of the original Sinkhorn distance, proving that it enjoys the same smoothness as its regularized version and we explicitly provide an efficient algorithm…

Cited by 9 publications (13 citation statements)
References 16 publications
“…The loss function is a custom implementation in TensorFlow of a sharp Sinkhorn [28] using ε-scaling [19,29,30]. In principle symbolic differentiation should be effective for this problem, however I encountered debilitating numerical instabilities that I was unable to diagnose.…”
Section: A Appendix (mentioning, confidence: 99%)
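For context, the "sharp" Sinkhorn referred to above is the transport cost ⟨T, C⟩ evaluated at the entropic plan, as opposed to the regularized objective that also includes the entropy penalty. A minimal NumPy sketch contrasting the two values (the function name and the entropy convention are our own choices, not the cited implementation):

```python
import numpy as np

def sharp_and_regularized_values(T, C, eps):
    """Given a strictly positive entropic plan T for cost matrix C, return the
    'sharp' value <T, C> (transport cost of the plan, no entropy term) and a
    regularized value <T, C> + eps * <T, log T - 1>. The exact entropy
    convention differs between papers; this one is shown only for contrast."""
    sharp = float(np.sum(T * C))
    regularized = sharp + eps * float(np.sum(T * (np.log(T) - 1.0)))
    return sharp, regularized
```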
“…In principle symbolic differentiation should be effective for this problem, however I encountered debilitating numerical instabilities that I was unable to diagnose. I therefore implemented the explicit gradient introduced in [28]. The Sinkhorn distance is calculated with regulator scaling from 1 to 0.01 in ten log-uniform steps, with ten iterations per step.…”
Section: A Appendix (mentioning, confidence: 99%)
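One common way to realize the recipe quoted above is ε-scaling: run Sinkhorn on a decreasing ladder of regularizers, warm-starting the dual potentials at each step. A log-domain NumPy sketch under those assumptions (from 1 to 0.01 in ten log-uniform steps, ten iterations per step); this is an illustration only, not the cited TensorFlow code:

```python
import numpy as np
from scipy.special import logsumexp

def sinkhorn_eps_scaling(C, a, b, eps_start=1.0, eps_end=0.01,
                         n_steps=10, iters_per_step=10):
    """Log-domain Sinkhorn on a log-uniform ladder of regularizers.
    a, b are strictly positive marginal weights; C is the cost matrix.
    The dual potentials f, g are warm-started across steps."""
    f = np.zeros_like(a)
    g = np.zeros_like(b)
    log_a, log_b = np.log(a), np.log(b)
    for eps in np.geomspace(eps_start, eps_end, n_steps):
        for _ in range(iters_per_step):
            # alternate dual updates enforcing the row/column marginals
            f = eps * (log_a - logsumexp((g[None, :] - C) / eps, axis=1))
            g = eps * (log_b - logsumexp((f[:, None] - C) / eps, axis=0))
    T = np.exp((f[:, None] + g[None, :] - C) / eps)   # transport plan
    return np.sum(T * C), T                           # sharp value, plan
```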
“…The Jacobian of an optimization problem solution can also be computed using the implicit function theorem (Griewank and Walther, 2008; Krantz and Parks, 2012; Blondel et al., 2021) instead of backpropagation if the number of iterations becomes a memory bottleneck. Together with Sinkhorn, implicit differentiation has been used by Luise et al. (2018) and Cuturi et al. (2020).…”
Section: Sinkformers (mentioning, confidence: 99%)
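The implicit-function-theorem route mentioned here differentiates the converged fixed point directly, so no intermediate iterates need to be stored. A toy scalar example (a stand-in fixed-point map, not the Sinkhorn case itself) comparing the implicit gradient with finite differences:

```python
import numpy as np

def fixed_point(theta, n_iter=200):
    """Solve x = F(x, theta) with F(x, theta) = 0.5*cos(x) + theta
    by plain fixed-point iteration (toy stand-in for a Sinkhorn loop)."""
    x = 0.0
    for _ in range(n_iter):
        x = 0.5 * np.cos(x) + theta
    return x

theta = 0.3
x_star = fixed_point(theta)

# Implicit function theorem on x* = F(x*, theta):
#   dx*/dtheta = (dF/dtheta) / (1 - dF/dx), evaluated at the solution.
dF_dx = -0.5 * np.sin(x_star)
dF_dtheta = 1.0
grad_ift = dF_dtheta / (1.0 - dF_dx)

# Finite-difference check: no backprop through the iterations is needed.
h = 1e-6
grad_fd = (fixed_point(theta + h) - fixed_point(theta - h)) / (2 * h)
print(grad_ift, grad_fd)   # the two should closely agree
```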
“…And c(x_i, y_j) is the cost function evaluating the distance between x_i and y_j (samples of the two distributions). Computing the optimal distance (1st line) is equivalent to solving the network-flow problem (2nd line) [17]. The calculated matrix T denotes the "transport plan", where each element T_ij represents the amount of mass shifted from u_i to v_j.…”
Section: Preliminaries (mentioning, confidence: 99%)
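To make the notation concrete, here is a small NumPy sketch that builds a cost matrix C_ij = c(x_i, y_j) from two point clouds, runs a few Sinkhorn iterations, and checks that the resulting plan T approximately has marginals u and v. The sample sizes, the squared-Euclidean cost, and ε are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))       # samples x_i of the first distribution
Y = rng.normal(size=(4, 2))       # samples y_j of the second distribution

# Cost matrix C_ij = c(x_i, y_j); squared Euclidean distance is a common choice.
C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)

u = np.full(5, 1 / 5)             # marginal weights u_i
v = np.full(4, 1 / 4)             # marginal weights v_j

# Entropic transport plan via Sinkhorn iterations (sketch).
eps = 0.5
K = np.exp(-C / eps)
s = np.ones(5)
for _ in range(1000):
    t = v / (K.T @ s)
    s = u / (K @ t)
T = s[:, None] * K * t[None, :]

# T_ij is the mass moved from u_i to v_j; its marginals approximately recover u and v.
print("row-marginal error:", np.abs(T.sum(axis=1) - u).max())
print("col-marginal error:", np.abs(T.sum(axis=0) - v).max())
print("transport cost <T, C>:", np.sum(T * C))
```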