Entropic regularization provides a generalization of the original optimal transport problem. It introduces a penalty term defined by the Kullback-Leibler divergence, making the problem more tractable via the celebrated Sinkhorn algorithm. Replacing the Kullback-Leibler divergence with a general f-divergence leads to a natural generalization. Using convex analysis, we extend the theory developed so far to include f-divergences defined by functions of Legendre type, and prove that, under mild conditions, strong duality holds, optima in both the primal and dual problems are attained, and the generalization of the c-transform is well-defined; we also give sufficient conditions for the generalized Sinkhorn algorithm to converge to an optimal solution. We propose a practical algorithm for computing the regularized optimal transport cost and its gradient via the generalized Sinkhorn algorithm. Finally, we present experimental results on synthetic 2-dimensional data, demonstrating how the choice of f-divergence used for regularization influences the convergence speed, numerical stability, and sparsity of the optimal coupling.
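As a concrete point of reference (an illustrative sketch, not the paper's generalized method): in the classical case f(t) = t log t, the regularized problem is min_P ⟨P, C⟩ + ε KL(P ‖ a ⊗ b), and the Sinkhorn iterations reduce to alternating diagonal scalings. All names below are illustrative.

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.1, n_iters=500, tol=1e-9):
    """Classical (KL-regularized) Sinkhorn iterations.

    a, b : marginal histograms (nonnegative, each summing to 1)
    C    : cost matrix of shape (len(a), len(b))
    eps  : strength of the entropic (KL) penalty
    Returns the optimal coupling P and the transport cost <P, C>.
    """
    K = np.exp(-C / eps)               # Gibbs kernel
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iters):
        u_prev = u
        u = a / (K @ v)                # match the row marginal a
        v = b / (K.T @ u)              # match the column marginal b
        if np.max(np.abs(u - u_prev)) < tol:
            break
    P = u[:, None] * K * v[None, :]    # coupling diag(u) K diag(v)
    return P, float(np.sum(P * C))

# Toy 2-dimensional example with squared Euclidean cost.
rng = np.random.default_rng(0)
x, y = rng.normal(size=(50, 2)), rng.normal(size=(60, 2)) + 1.0
C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
a, b = np.full(50, 1 / 50), np.full(60, 1 / 60)
P, cost = sinkhorn(a, b, C, eps=0.05)
```

Replacing KL with another f-divergence changes these closed-form scaling steps, which is the setting the abstract's convergence conditions address.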
An explanation for the success of deep neural networks is a central question in theoretical machine learning. According to classical statistical learning theory, the overparameterized nature of such models should imply a failure to generalize. Many argue that good empirical performance is due to the implicit regularization of first-order optimization methods. In particular, the Polyak-Łojasiewicz condition leads to gradient descent finding a global optimum that is close to initialization. In this work, we propose a framework consisting of a prototype learning problem, which is general enough to cover many popular problems and even the cases of infinitely wide neural networks and infinite data. We then perform an analysis from the perspective of the Polyak-Łojasiewicz condition. We obtain theoretical results of independent interest concerning gradient descent on a composition of functions. Building on these results, we determine the properties that have to be satisfied by the components of the prototype problem for gradient descent to find a global optimum that is close to initialization. We then demonstrate that supervised learning, variational autoencoders and training with gradient penalty can be translated to the prototype problem. Finally, we lay out a number of directions for future research.
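For reference, the Polyak-Łojasiewicz condition invoked above has a standard form (stated here in common notation, not quoted from the paper): a differentiable objective f with infimum f* satisfies the PL inequality with constant μ > 0 if

```latex
% Polyak-Lojasiewicz (PL) condition with constant \mu > 0:
\frac{1}{2}\,\bigl\|\nabla f(x)\bigr\|^{2} \;\ge\; \mu\,\bigl(f(x) - f^{*}\bigr)
\quad \text{for all } x, \qquad f^{*} = \inf_{x} f(x).
% If f is additionally L-smooth, gradient descent with step size 1/L
% satisfies
f(x_{k}) - f^{*} \;\le\; \Bigl(1 - \tfrac{\mu}{L}\Bigr)^{k}\,\bigl(f(x_{0}) - f^{*}\bigr),
% i.e. linear convergence to a global optimum without convexity.
```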
We propose a family of extensions of the Kantorovich-Rubinstein norm from the space of zero-charge countably additive measures on a compact metric space to the space of all countably additive measures, and a family of extensions of the Lipschitz norm from the quotient space of Lipschitz functions on a compact metric space to the space of all Lipschitz functions. These families are parameterized by p, q ∈ [1, ∞], and if p, q are Hölder conjugates, then the dual of the resulting p-Kantorovich space is isometrically isomorphic to the resulting q-Lipschitz space.
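For context, the two classical objects being extended here have standard definitions (stated below in common notation; the paper's p- and q-parameterized norms are not reproduced). On a compact metric space (X, d), for a zero-charge measure μ, i.e. μ(X) = 0:

```latex
% Kantorovich-Rubinstein norm of a zero-charge measure \mu:
\|\mu\|_{\mathrm{KR}} \;=\; \sup\Bigl\{\, \textstyle\int_X f \, d\mu
\;:\; f \colon X \to \mathbb{R},\ \operatorname{Lip}(f) \le 1 \,\Bigr\},
% well-defined on zero-charge measures because adding a constant to f
% leaves the integral unchanged. The exponents p, q \in [1, \infty]
% are Holder conjugates when
\tfrac{1}{p} + \tfrac{1}{q} \;=\; 1 .
```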
In this note, following [Chitescu et al., 2014], we show that the Monge-Kantorovich norm on the vector space of countably additive measures on a compact metric space admits a primal representation analogous to that of the Hanin norm. In particular, like the Hanin norm, the Monge-Kantorovich norm can be seen as an extension of the Kantorovich-Rubinstein norm from the vector subspace of zero-charge measures. This implies a number of novel results, such as the equivalence of the Monge-Kantorovich and Hanin norms.
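One common formulation of the Hanin norm (an assumption about conventions; normalizations may differ from the note) extends the Kantorovich-Rubinstein norm above to all measures by an infimal convolution with the total variation norm:

```latex
% Hanin norm of an arbitrary countably additive measure \mu, where
% \mathcal{M}_0(X) denotes the zero-charge measures on X:
\|\mu\|_{\mathrm{H}} \;=\; \inf\Bigl\{\, \|\nu\|_{\mathrm{KR}}
+ \|\mu - \nu\|_{\mathrm{TV}} \;:\; \nu \in \mathcal{M}_0(X) \,\Bigr\}.
```

The note's claim is that the Monge-Kantorovich norm admits a primal representation of this same extension-from-zero-charge-measures type.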