This paper initiates a systematic development of a theory of non-commutative optimization, a setting which greatly extends ordinary (Euclidean) convex optimization. It aims to unify and generalize a growing body of work from the past few years which developed and analyzed algorithms for natural geodesically convex optimization problems on Riemannian manifolds that arise from the symmetries of non-commutative groups. More specifically, these are algorithms to minimize the moment map (a non-commutative notion of the usual gradient) and to test membership in moment polytopes (a vast class of polytopes, typically of exponential vertex and facet complexity, which quite magically arise from this a priori non-convex, non-linear setting).

The importance of understanding this very general setting of geodesic optimization, as these works unveiled and powerfully demonstrate, is that it captures a diverse set of problems, many non-convex, in different areas of CS, math, and physics. Several of them were solved efficiently for the first time using non-commutative methods; the corresponding algorithms also lead to solutions of purely structural problems and to many new connections between disparate fields.

In the spirit of standard convex optimization, we develop two general methods in the geodesic setting, a first-order and a second-order method, which respectively receive first- and second-order information on the "derivatives" of the function to be optimized. These in particular subsume all past results. The main technical work, again unifying and extending much of the previous work, goes into identifying the key parameters of the underlying group actions which control convergence to the optimum in each of these methods. These non-commutative analogues of "smoothness" in the commutative case are far more complex, and require significant algebraic and analytic machinery (much existing and some newly developed here).
Despite this complexity, the way in which these parameters control convergence in both methods is quite simple and elegant. We also bound these parameters in several general cases.

Our work points to intriguing open problems and suggests further research directions. We believe that extending this theory, namely understanding geodesic optimization better, is both mathematically and computationally fascinating; it provides a great meeting place for ideas and techniques from several very different research areas, and promises better algorithms for existing and yet unforeseen applications.
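To make the geodesic setting concrete, here is a minimal illustrative sketch (not the paper's algorithm) of a first-order method on the manifold of positive-definite matrices: minimizing the geodesically convex function f(X) = tr(AX) + tr(BX⁻¹) by stepping along geodesics X ↦ X^{1/2} exp(−η X^{1/2} ∇f(X) X^{1/2}) X^{1/2} rather than along straight lines. The function f, step size, and iteration count are all illustrative choices, not from the paper.

```python
import numpy as np
from scipy.linalg import expm, sqrtm

def geodesic_gradient_descent(A, B, X0, steps=200, eta=0.1):
    """Minimize f(X) = tr(AX) + tr(B X^{-1}) over positive-definite X
    by moving along geodesics of the PD cone (illustrative toy example)."""
    X = X0.copy()
    for _ in range(steps):
        Xinv = np.linalg.inv(X)
        G = A - Xinv @ B @ Xinv             # Euclidean gradient of f at X
        R = sqrtm(X).real                   # X^{1/2}
        X = R @ expm(-eta * R @ G @ R) @ R  # geodesic step on the PD manifold
    return X

# With A = B = I the unique minimizer of f is the identity matrix.
rng = np.random.default_rng(0)
M = rng.normal(size=(3, 3))
X0 = M @ M.T + 3 * np.eye(3)               # random positive-definite start
X = geodesic_gradient_descent(np.eye(3), np.eye(3), X0)
print(np.round(X, 4))
```

The point of the geodesic update is that f is convex along these curves even though it is non-convex along Euclidean line segments, which is the commutative shadow of the phenomenon the paper studies.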
Completely positive maps, a generalization of nonnegative matrices, are a well-studied class of maps from n × n matrices to m × m matrices. The existence of operator analogues of doubly stochastic scalings of matrices, the study of which is known as operator scaling, is equivalent to a multitude of problems in computer science and mathematics, such as rational identity testing in non-commuting variables, the non-commutative rank of symbolic matrices, and a basic problem in invariant theory.

We study operator scaling with specified marginals, which is the operator analogue of scaling matrices to specified row and column sums (or marginals). We characterize the operators which can be scaled to given marginals, much in the spirit of Gurvits' algorithmic characterization of the operators that can be scaled to doubly stochastic (Gurvits, 2004). Our algorithm, which is a modified version of Gurvits' algorithm, produces approximate scalings in time poly(n, m) whenever scalings exist. A central ingredient in our analysis is a reduction from operator scaling with specified marginals to operator scaling in the doubly stochastic setting.

Instances of operator scaling with specified marginals arise in diverse areas of study such as the Brascamp-Lieb inequalities, communication complexity, eigenvalues of sums of Hermitian matrices, and quantum information theory. Some of the known theorems in these areas, several of which had no algorithmic proof, are straightforward consequences of our characterization theorem. For instance, we obtain a simple algorithm to find, when it exists, a tuple of Hermitian matrices with given spectra whose sum has a given spectrum. We also prove new theorems, such as a generalization of Forster's theorem (Forster, 2002) concerning radial isotropic position.

* Supported in part by Simons Foundation award 332622.
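The commutative special case mentioned in the abstract, scaling a positive matrix to specified row and column sums, can be sketched with the classical Sinkhorn-style alternation (a toy illustration of the marginal-scaling idea, not the operator-scaling algorithm of the paper):

```python
import numpy as np

def scale_to_marginals(A, r, c, iters=500):
    """Alternately rescale rows and columns of a strictly positive matrix A
    so that its row sums approach r and column sums approach c.
    Requires sum(r) == sum(c); converges for strictly positive A."""
    B = A.astype(float).copy()
    for _ in range(iters):
        B *= (r / B.sum(axis=1))[:, None]   # fix row sums to r
        B *= (c / B.sum(axis=0))[None, :]   # fix column sums to c
    return B

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
r = np.array([1.0, 2.0])    # target row sums
c = np.array([1.5, 1.5])    # target column sums
B = scale_to_marginals(A, r, c)
print(B.sum(axis=1), B.sum(axis=0))
```

In the operator setting the row/column normalizations are replaced by conjugations of the Kraus operators by positive-definite matrices, which is what makes the analysis substantially harder.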
We present a polynomial time algorithm to approximately scale tensors of any format to arbitrary prescribed marginals (whenever possible). This unifies and generalizes a sequence of past works on matrix, operator and tensor scaling. Our algorithm provides an efficient weak membership oracle for the associated moment polytopes, an important family of implicitly-defined convex polytopes with exponentially many facets and a wide range of applications. These include the entanglement polytopes from quantum information theory (in particular, we obtain an efficient solution to the notorious one-body quantum marginal problem) and the Kronecker polytopes from representation theory (which capture the asymptotic support of Kronecker coefficients). Our algorithm can be applied to succinct descriptions of the input tensor whenever the marginals can be efficiently computed, as in the important case of matrix product states or tensor-train decompositions, widely used in computational physics and numerical mathematics.

Beyond these applications, the algorithm enriches the arsenal of "numerical" methods for classical problems in invariant theory that are significantly faster than "symbolic" methods which explicitly compute invariants or covariants of the relevant action. We stress that (like almost all past algorithms) our convergence rate is polynomial in the approximation parameter; it is an intriguing question to achieve exponential convergence rate, beating symbolic algorithms exponentially, and providing strong membership and separation oracles for the problems above.

We strengthen and generalize the alternating minimization approach of previous papers by introducing the theory of highest weight vectors from representation theory into the numerical optimization framework. We show that highest weight vectors are natural potential functions for scaling algorithms and prove new bounds on their evaluations to obtain polynomial-time convergence.
Our techniques are general, and we believe that they will be instrumental in obtaining efficient algorithms for moment polytopes beyond the ones considered here, and more broadly, for other optimization problems possessing natural symmetries.
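The alternating-scaling idea for tensors can be sketched in the simplest case of uniform target marginals (the "locally maximally mixed" case): repeatedly compute the one-body marginal ρₖ of each tensor leg and apply ρₖ^{−1/2} to that leg. This is an illustrative sketch of the basic alternation underlying this line of work, without the highest-weight-vector analysis that drives the paper's convergence bounds; the iteration count and tensor format below are arbitrary choices.

```python
import numpy as np

def marginal(psi, k):
    """One-body density matrix of leg k of a unit-norm tensor psi."""
    M = np.moveaxis(psi, k, 0).reshape(psi.shape[k], -1)
    return M @ M.conj().T

def scale_to_uniform(psi, iters=300):
    """Alternating scaling toward uniform marginals rho_k = I/d_k."""
    psi = psi / np.linalg.norm(psi)
    for _ in range(iters):
        for k in range(psi.ndim):
            rho = marginal(psi, k)
            w, V = np.linalg.eigh(rho)
            w = np.clip(w, 1e-12, None)
            g = V @ np.diag(w ** -0.5) @ V.conj().T   # rho^{-1/2}
            psi = np.moveaxis(np.tensordot(g, np.moveaxis(psi, k, 0), axes=1), 0, k)
            psi /= np.linalg.norm(psi)
    return psi

rng = np.random.default_rng(1)
psi = rng.normal(size=(2, 2, 2))        # a generic 2x2x2 tensor
out = scale_to_uniform(psi)
print(np.round(marginal(out, 0), 3))    # should be close to I/2
```

For a generic random tensor the iteration drives every marginal toward I/dₖ; deciding for which tensors (and which non-uniform targets) this is possible is exactly the moment-polytope membership problem the abstract describes.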
Motivated by the Komlós conjecture in combinatorial discrepancy, we study the discrepancy of random matrices with m rows and n independent columns drawn from a bounded lattice random variable. It is known that for n tending to infinity and m fixed, with high probability the ℓ∞-discrepancy is at most twice the ℓ∞-covering radius of the integer span of the support of the random variable. However, the easy argument for this fact gives no concrete bounds on the failure probability in terms of n. We prove that the failure probability is inverse polynomial in m, n and some well-motivated parameters of the random variable. We also obtain analogous bounds for the discrepancy in arbitrary norms.

We apply these results to two random models of interest. For random t-sparse matrices, i.e. uniformly random matrices with t ones and m − t zeroes in each column, we show that the ℓ∞-discrepancy is at most 2 with probability 1 − O(√(log n / n)) for n = Ω(m^3 log^2 m). This improves on a bound proved by Ezra and Lovett (Ezra and Lovett, Approx+Random, 2015) showing that the same is true for n at least m^t. For matrices with random unit vector columns, we show that the ℓ∞-discrepancy is O(exp(−n/m^3)) with probability 1 − O(√(log n / n)) for n = Ω(m^3 log^2 m). Our approach, in the spirit of Kuperberg, Lovett and Peled (G. Kuperberg, S. Lovett and R. Peled, STOC 2012), uses Fourier analysis to prove that for m × n matrices M with i.i.d. columns and n sufficiently large, the distribution of My for random y ∈ {−1, 1}^n obeys a local limit theorem.
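The objects in this abstract are easy to experiment with directly: the ℓ∞-discrepancy of an m × n matrix M is the minimum of ‖My‖∞ over sign vectors y ∈ {−1, 1}ⁿ. The brute-force sketch below (illustrative only; exponential in n, usable just for small instances) samples a random t-sparse matrix and computes its discrepancy exactly:

```python
import itertools
import numpy as np

def linf_discrepancy(M):
    """Exact l_inf-discrepancy: min over y in {-1,1}^n of ||My||_inf.
    Brute force over all 2^n sign vectors; only for small n."""
    best = np.inf
    for signs in itertools.product([-1, 1], repeat=M.shape[1]):
        best = min(best, np.abs(M @ np.array(signs)).max())
    return best

def random_t_sparse(m, n, t, rng):
    """Each column is a uniformly random 0/1 vector with exactly t ones."""
    M = np.zeros((m, n))
    for j in range(n):
        M[rng.choice(m, size=t, replace=False), j] = 1
    return M

rng = np.random.default_rng(0)
M = random_t_sparse(m=4, n=12, t=2, rng=rng)
disc = linf_discrepancy(M)
print(disc)
```

Note that the Beck-Fiala theorem already guarantees disc ≤ 2t − 1 for any t-sparse matrix; the abstract's result says that for random t-sparse matrices with enough columns the discrepancy is in fact at most 2 with high probability.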