We formulate the metric learning problem as that of minimizing the differential relative entropy between two multivariate Gaussians under constraints on the Mahalanobis distance function. Via a surprising equivalence, we show that this problem can be solved as a low-rank kernel learning problem. Specifically, we minimize the Burg divergence of a low-rank kernel to an input kernel, subject to pairwise distance constraints. Our approach has several advantages over existing methods. First, we present a natural information-theoretic formulation of the problem. Second, the algorithm utilizes the methods developed by Kulis et al. [6], which involve no eigenvector computation; in particular, our method runs faster than most existing techniques. Third, the formulation offers insights into the connections between metric learning and kernel learning.
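To make the flavor of such LogDet/Burg-divergence updates concrete, here is a minimal sketch of a Bregman-projection step for a Mahalanobis matrix under a single equality constraint. The function names and the equality-constraint form are illustrative assumptions, not the paper's exact algorithm (which cycles over inequality constraints with slack):

```python
import numpy as np

def bregman_project(A, x_i, x_j, target):
    """One Bregman (LogDet-divergence) projection of the Mahalanobis matrix A
    onto the set {A : (x_i - x_j)^T A (x_i - x_j) = target}.
    The projection stays in closed form as a rank-one update."""
    z = x_i - x_j
    p = z @ A @ z                      # current squared distance under A
    beta = (target - p) / (p * p)      # step size that hits the target exactly
    return A + beta * np.outer(A @ z, z @ A)

def metric_learn(X, constraints, n_sweeps=50):
    """Cycle Bregman projections over (i, j, target) constraints.
    Illustrative only: no slack variables or correction terms here."""
    A = np.eye(X.shape[1])
    for _ in range(n_sweeps):
        for i, j, target in constraints:
            A = bregman_project(A, X[i], X[j], target)
    return A
```

Note that the update never requires an eigendecomposition, which is the source of the speed advantage mentioned above.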
We study Frank-Wolfe methods for nonconvex stochastic and finite-sum optimization problems. Frank-Wolfe methods (in the convex case) have recently gained tremendous interest in the machine learning and optimization communities due to their projection-free property and their ability to exploit structured constraints. However, our understanding of these algorithms in the nonconvex setting is fairly limited. In this paper, we propose nonconvex stochastic Frank-Wolfe methods and analyze their convergence properties. For objective functions that decompose into a finite sum, we leverage ideas from variance reduction techniques for convex optimization to obtain new variance-reduced nonconvex Frank-Wolfe methods that have provably faster convergence than the classical Frank-Wolfe method. Finally, we show that the faster convergence rates of our variance-reduced methods also translate into improved convergence rates for the stochastic setting.
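As a reference point for the projection-free property, here is a minimal sketch of a plain stochastic Frank-Wolfe loop over an ℓ1-ball constraint. The oracle choice, batch size, and fixed step size are illustrative assumptions; the variance-reduced variants analyzed in the paper change the gradient estimator, not this overall structure:

```python
import numpy as np

def lmo_l1(g, radius):
    # Linear minimization oracle for the l1 ball:
    # argmin_{||s||_1 <= radius} <g, s> is a signed, scaled coordinate vertex.
    i = np.argmax(np.abs(g))
    s = np.zeros_like(g)
    s[i] = -radius * np.sign(g[i])
    return s

def stochastic_fw(grad_sample, x0, radius, n_iters=500, batch=64, gamma=0.05):
    # grad_sample(x) returns an unbiased stochastic gradient at x.
    x = x0.copy()
    for _ in range(n_iters):
        g = np.mean([grad_sample(x) for _ in range(batch)], axis=0)
        s = lmo_l1(g, radius)
        x = (1 - gamma) * x + gamma * s   # convex combination keeps x feasible
    return x
```

The feasible set is touched only through the linear oracle, which is why such methods avoid projections entirely.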
Microarray experiments have been extensively used for simultaneously measuring DNA expression levels of thousands of genes in genome research. A key step in the analysis of gene expression data is the clustering of genes into groups that show similar expression values over a range of conditions. Since only a small subset of the genes participate in any cellular process of interest, by focusing on subsets of genes and conditions we can lower the noise induced by other genes and conditions; a co-cluster characterizes such a subset of interest. Cheng and Church [3] introduced an effective measure of co-cluster quality based on mean squared residue. In this paper, we use two similar squared residue measures and propose two fast k-means-like co-clustering algorithms corresponding to the two residue measures. Our algorithms discover k row clusters and l column clusters simultaneously while monotonically decreasing the respective squared residues. Our co-clustering algorithms inherit the simplicity, efficiency, and wide applicability of the k-means algorithm. Minimizing the residues may also be formulated as trace optimization problems that allow us to obtain a spectral relaxation, which we use as a principled initialization for our iterative algorithms. We further enhance our algorithms with an incremental local search strategy that helps avoid empty clusters and escape poor local minima. We illustrate co-clustering results on a yeast cell cycle dataset and a human B-cell lymphoma dataset. Our experiments show that our co-clustering algorithms are efficient and are able to discover coherent co-clusters.
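A sketch of the k-means-like alternation, using the simpler of the two residues (squared deviation from the co-cluster mean). The function name, random initialization, and batch update order are our own assumptions; the paper's algorithms additionally use the spectral initialization and incremental local search described above:

```python
import numpy as np

def cocluster(A, k, l, n_iters=50, seed=0):
    """Alternately reassign row and column clusters of data matrix A
    to monotonically decrease the squared residue w.r.t. co-cluster means."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    r = rng.integers(k, size=m)          # row cluster labels
    c = rng.integers(l, size=n)          # column cluster labels
    for _ in range(n_iters):
        # Co-cluster means (the "prototypes" of this k-means analogue).
        M = np.zeros((k, l))
        for i in range(k):
            for j in range(l):
                block = A[np.ix_(r == i, c == j)]
                M[i, j] = block.mean() if block.size else 0.0
        # Reassign each row to the row cluster with smallest residue.
        for p in range(m):
            r[p] = int(np.argmin([np.sum((A[p] - M[i, c])**2) for i in range(k)]))
        # Reassign each column symmetrically.
        for q in range(n):
            c[q] = int(np.argmin([np.sum((A[:, q] - M[r, j])**2) for j in range(l)]))
    return r, c
```

Each reassignment step can only lower the residue for this measure, which gives the monotone decrease claimed above.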
Motivated by the goal of facilitating space-variant blind deconvolution, we present a class of linear transformations that are expressive enough to represent space-variant filters, yet specifically designed for efficient matrix-vector multiplication. Successful results on astronomical imaging through atmospheric turbulence and on noisy magnetic resonance images of constantly moving objects demonstrate the practical significance of our approach.
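For intuition, one common way to build such transformations is patch-wise convolution with overlap-add: window half-overlapping patches so the windows sum to one, filter each patch with its own local kernel, and add the results back together. The 1-D sketch below is our illustrative construction of this idea, not necessarily the paper's exact operator; the resulting map is linear in the input and can be applied via FFT-based convolution per patch:

```python
import numpy as np

def space_variant_filter(x, kernels, patch_len):
    """Apply a different kernel to each half-overlapping patch of x and
    recombine by overlap-add. Triangular windows at 50% overlap sum to one
    in the interior (exactly so for odd patch lengths), so using the same
    kernel everywhere reproduces ordinary convolution."""
    hop = patch_len // 2
    w = np.bartlett(patch_len)
    klen = len(kernels[0])
    y = np.zeros(len(x) + klen - 1)
    starts = range(0, len(x) - patch_len + 1, hop)
    for kern, s in zip(kernels, starts):
        seg = w * x[s:s + patch_len]                  # windowed local patch
        y[s:s + patch_len + klen - 1] += np.convolve(seg, kern)
    return y
```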
Many shape and image processing tools rely on computation of correspondences between geometric domains. Efficient methods that stably extract "soft" matches in the presence of diverse geometric structures have proven to be valuable for shape retrieval and transfer of labels or semantic information. With these applications in mind, we present an algorithm for probabilistic correspondence that optimizes an entropy-regularized Gromov-Wasserstein (GW) objective. Built upon recent developments in numerical optimal transportation, our algorithm is compact, provably convergent, and applicable to any geometric domain expressible as a metric measure matrix. We provide comprehensive experiments illustrating the convergence and applicability of our algorithm to a variety of graphics tasks. Furthermore, we expand entropic GW correspondence to a framework for other matching problems, incorporating partial distance matrices, user guidance, shape exploration, symmetry detection, and joint analysis of more than two domains. These applications expand the scope of entropic GW correspondence to major shape analysis problems and are stable to distortion and noise.
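A compact way to see the algorithmic core: alternate between linearizing the quadratic GW objective at the current coupling and solving the resulting entropy-regularized transport problem with Sinkhorn iterations. The sketch below follows this standard scheme under simplifying assumptions (square loss, fixed regularization, fixed iteration counts); it is not a verbatim transcription of the paper's method:

```python
import numpy as np

def sinkhorn(C, mu, nu, eps, n_iters=200):
    # Entropy-regularized optimal transport between histograms mu, nu for cost C.
    K = np.exp(-C / eps)
    v = np.ones_like(nu)
    for _ in range(n_iters):
        u = mu / (K @ v)
        v = nu / (K.T @ u)
    return u[:, None] * K * v[None, :]

def entropic_gw(D1, D2, mu, nu, eps=1e-2, outer=20):
    # D1, D2: pairwise (metric) matrices of the two domains;
    # mu, nu: probability weights over their points.
    G = np.outer(mu, nu)                     # start from the product coupling
    for _ in range(outer):
        # Cost of the square-loss GW objective linearized at the current G.
        C = (D1**2 @ mu)[:, None] + (D2**2 @ nu)[None, :] - 2 * D1 @ G @ D2
        G = sinkhorn(C, mu, nu, eps)
    return G
```

The soft matches mentioned above are read off directly from the entries of the returned coupling G.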
Abstract. Positive definite matrices abound in a dazzling variety of applications. This ubiquity can in part be attributed to their rich geometric structure: positive definite matrices form a self-dual convex cone whose strict interior is a Riemannian manifold. The manifold view comes with a "natural" distance function while the conic view does not. Nevertheless, drawing motivation from the conic view, we introduce the S-Divergence as a "natural" distance-like function on the open cone of positive definite matrices. We motivate the S-Divergence via a sequence of results that connect it to the Riemannian distance. In particular, we show that (a) this divergence is the square of a distance; and (b) it has several geometric properties similar to those of the Riemannian distance, though without being computationally as demanding. The S-Divergence is even more intriguing: although nonconvex, we can still compute matrix means and medians using it to global optimality. We complement our results with some numerical experiments illustrating our theorems and our optimization algorithm for computing matrix medians.

Key words. Bregman matrix divergence; log determinant; Stein divergence; Jensen-Bregman divergence; matrix geometric mean; matrix median; nonpositive curvature

1. Introduction. Hermitian positive definite (HPD) matrices are a noncommutative generalization of positive reals. They abound in a multitude of applications and exhibit attractive geometric properties; e.g., they form a differentiable Riemannian (also Finslerian) manifold [10, 33] that is a well-studied example of a manifold of nonpositive curvature [17, Ch. 10]. HPD matrices possess even more structure: (i) they embody a canonical higher-rank symmetric space [51]; and (ii) their closure forms a closed, self-dual convex cone. The convex conic view enjoys great importance in convex optimization [6, 43, 44] and in nonlinear Perron-Frobenius theory [40]; symmetric spaces are important in algebra, analysis [32, 39, 51], and optimization [43, 52]; while the manifold view (Riemannian or Finslerian) plays diverse roles; see [10, Ch. 6] and [46]. The manifold view is equipped with a "natural" distance function while the conic view is not. Nevertheless, drawing motivation from the convex conic view, we introduce the S-Divergence as a "natural" distance-like function on the open cone of positive definite matrices. Indeed, we prove a sequence of results connecting the S-Divergence to the Riemannian distance. Most importantly, we show that (a) this divergence is the square of a distance; and (b) it has several geometric properties in common with the Riemannian distance, without being numerically as demanding. This builds an informal link between the manifold and conic views of HPD matrices.

1.1. Background and notation. We begin by fixing notation. The letter H denotes some Hilbert space, usually just C^n. The inner product between two vectors x and y in H is ⟨x, y⟩ := x*y (x* denotes conjugate transpose). The set of n × n Hermitian matrices is denoted as H...
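For reference, the S-Divergence studied here (also known as the Stein or Jensen-Bregman LogDet divergence) has the closed form S(X, Y) = log det((X+Y)/2) - (1/2) log det(XY), which can be evaluated stably via Cholesky factorizations. A minimal sketch; the helper names are our own:

```python
import numpy as np

def logdet(A):
    # log det of an HPD matrix via its Cholesky factor (avoids det overflow).
    return 2.0 * np.sum(np.log(np.diag(np.linalg.cholesky(A))))

def s_divergence(X, Y):
    # S(X, Y) = log det((X + Y) / 2) - 0.5 * (log det X + log det Y)
    return logdet((X + Y) / 2.0) - 0.5 * (logdet(X) + logdet(Y))

# Per result (a) above, the square root of the S-Divergence is a distance:
# d(X, Y) = np.sqrt(s_divergence(X, Y))
```

Unlike the Riemannian distance, no eigendecompositions or matrix logarithms are needed, which is the computational advantage claimed above.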
We develop geometric optimisation on the manifold of Hermitian positive definite (HPD) matrices. In particular, we consider optimising two types of cost functions: (i) geodesically convex (g-convex); and (ii) log-nonexpansive (LN). G-convex functions are nonconvex in the usual Euclidean sense, but convex along the manifold and thus allow global optimisation. LN functions may fail to be even g-convex, but still remain globally optimisable due to their special structure. We develop theoretical tools to recognise and generate g-convex functions as well as cone-theoretic fixed-point optimisation algorithms. We illustrate our techniques by applying them to maximum-likelihood parameter estimation for elliptically contoured distributions (a rich class that substantially generalises the multivariate normal distribution). We compare our fixed-point algorithms with sophisticated manifold optimisation methods and obtain notable speedups.

1 To our knowledge the name "geometric optimisation" has not been previously attached to g-convex and cone-theoretic HPD matrix optimisation, though several scattered examples do exist. Our theorems offer a formal starting point for recognising HPD geometric optimisation problems.

Geometric programming has enjoyed great success across a spectrum of applications; see e.g. the survey of Boyd et al. [11]. We hope this paper helps conic geometric optimisation gain wider exposure. Perhaps the best known conic geometric optimisation problem is computation of the Karcher (Fréchet) mean of a set of HPD matrices, a topic that has attracted great attention within matrix theory [7, 8, 25, 48], computer vision [16], radar imaging [41, Part II], and medical imaging [17, 52]; we refer the reader to the recent book [41] for additional applications and references. Another basic geometric optimisation problem arises as a subroutine in image search and matrix clustering [18]. Conic geometric optimisation problems also occur in several other areas: statistics (covariance shrinkage) [15], nonlinear matrix equations [31], Markov decision processes, and more broadly in the fascinating area of nonlinear Perron-Frobenius theory [32]. As a concrete illustration of our ideas, we discuss the task of maximum likelihood estimation (MLE) for elliptically contoured distributions (ECDs) [13, 21, 37]; see §5. We use ECDs to illustrate our theory, not only because of their instructive value but also because of their importance in a variety of applications [42].

Outline. The main focus of this paper is on recognising and constructing certain structured nonconvex functions of HPD matrices. In particular, Section 2 studies the class of geodesically convex functions, while Section 4 introduces "log-nonexpansive" functions. We present a limited-memory BFGS algorithm in Section 3, where we also present a derivation for the parallel transport, which we could not find elsewhere in the literature. Even though manifold optimisation algorithms apply to both classes of functions, for log-nonexpansive functions we advance...
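To illustrate the kind of fixed-point iteration the ECD application involves, here is a sketch for the best-known ECD instance beyond the Gaussian: the scatter matrix of a zero-mean multivariate Student-t distribution. The update shown is the classical weighted-covariance fixed point, given as an illustration of the cone-theoretic fixed-point approach rather than as the paper's exact algorithm:

```python
import numpy as np

def t_scatter_mle(X, nu, n_iters=100):
    """Fixed-point iteration for the scatter matrix of a zero-mean
    multivariate Student-t with nu degrees of freedom:
        Sigma <- (1/N) sum_i w_i x_i x_i^T,
        w_i = (nu + d) / (nu + x_i^T Sigma^{-1} x_i).
    Downweights outlying samples relative to the Gaussian MLE."""
    N, d = X.shape
    S = np.cov(X, rowvar=False)                        # Gaussian MLE as a start
    for _ in range(n_iters):
        Sinv = np.linalg.inv(S)
        q = np.einsum('ni,ij,nj->n', X, Sinv, X)       # Mahalanobis terms
        w = (nu + d) / (nu + q)
        S = (X * w[:, None]).T @ X / N
    return S
```

Iterations of this type converge globally for the structured (LN-style) objectives studied here, without any manifold machinery such as retractions or parallel transport.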