We introduce the method of Geodesic Principal Component Analysis (GPCA) on the space of probability measures on the line with finite second moment, endowed with the Wasserstein metric. We discuss the advantages of this approach over a standard functional PCA of probability densities in the Hilbert space of square-integrable functions. We establish the consistency of the method by showing that the empirical GPCA converges to its population counterpart as the sample size tends to infinity. A key property in the study of GPCA is the isometry between the Wasserstein space and a closed convex subset of the space of square-integrable functions with respect to an appropriate measure. We therefore consider the general problem of PCA in a closed convex subset of a separable Hilbert space, which serves as a basis for the analysis of GPCA and is also of interest in its own right. We provide illustrative examples on simple statistical models to show the benefits of this approach for data analysis. The method is also applied to a real dataset of population pyramids.
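The isometry mentioned above can be illustrated numerically in a simple special case. On the real line, the 2-Wasserstein distance between two measures equals the L2 distance between their quantile functions on (0, 1). The sketch below is not taken from the paper: the function name `wasserstein2_1d`, the midpoint grid, and the use of empirical quantiles from samples are all assumptions made for illustration.

```python
import numpy as np


def wasserstein2_1d(x, y, grid_size=1000):
    """Approximate the 2-Wasserstein distance between the empirical
    measures of two 1D samples via their quantile functions.

    Hypothetical helper for illustration; the grid resolution is an
    arbitrary choice, not something prescribed by the paper.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Midpoint grid on (0, 1): avoids evaluating quantiles at 0 or 1.
    t = (np.arange(grid_size) + 0.5) / grid_size
    qx = np.quantile(x, t)
    qy = np.quantile(y, t)
    # L2 distance between quantile functions, approximated on the grid.
    return float(np.sqrt(np.mean((qx - qy) ** 2)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample = rng.normal(size=500)
    # A pure translation by 3 moves every quantile by 3.
    print(wasserstein2_1d(sample, sample + 3.0))
```

Because each measure is represented by a quantile function, which is a nondecreasing element of L2(0, 1), standard Hilbert-space tools can be applied as long as the results stay inside that convex set of nondecreasing functions.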
We prove strong convergence of the proportions $U_n/T_n$ of balls in a multitype generalized Pólya urn model, using martingale arguments. The limit is characterized as a convex combination of left dominant eigenvectors of the replacement matrix $R$, with random Dirichlet coefficients.
This paper is focused on the statistical analysis of probability measures $\nu_1, \ldots, \nu_n$ on $\mathbb{R}$ that can be viewed as independent realizations of an underlying stochastic process. We consider the situation of practical importance where the random measures $\nu_i$ are absolutely continuous with densities $f_i$ that are not directly observable. In this case, instead of the densities, we have access to datasets of real random variables $(X_{i,j})_{1 \le i \le n;\, 1 \le j \le p_i}$ organized in the form of $n$ experimental units, such that $X_{i,1}, \ldots, X_{i,p_i}$ are iid observations sampled from a random measure $\nu_i$ for each $1 \le i \le n$. In this setting, we focus on first-order statistics methods for estimating, from such data, a meaningful structural mean measure. For the purpose of taking into account phase and amplitude variations in the observations, we argue that the notion of Wasserstein barycenter is a relevant tool. The main contribution of this paper is to characterize the rate of convergence of a (possibly smoothed) empirical Wasserstein barycenter towards its population counterpart in the asymptotic setting where both $n$ and $\min_{1 \le i \le n} p_i$ may go to infinity. The optimality of this procedure is discussed from the minimax point of view with respect to the Wasserstein metric. We also highlight the connection between our approach and the curve registration problem in statistics. Some numerical experiments are used to illustrate the results of the paper on the convergence rate of empirical Wasserstein barycenters.
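As a rough illustration of the setting described above, the 2-Wasserstein barycenter of measures on the real line has a quantile function equal to the average of the individual quantile functions. The sketch below applies this to raw samples of possibly different sizes. It is an unsmoothed, assumption-laden sketch and not the estimator studied in the paper: the function name `barycenter_quantiles`, the grid, and the use of interpolated empirical quantiles are hypothetical choices.

```python
import numpy as np


def barycenter_quantiles(samples, grid_size=200):
    """Quantile function of an (unsmoothed) empirical Wasserstein
    barycenter of 1D samples, evaluated on a grid in (0, 1).

    `samples` is a list of 1D arrays, one per experimental unit; the
    arrays may have different lengths. Hypothetical helper for
    illustration only.
    """
    # Midpoint grid on (0, 1) shared by all experimental units.
    t = (np.arange(grid_size) + 0.5) / grid_size
    # One row of empirical quantiles per experimental unit.
    q = np.stack([np.quantile(np.asarray(s, dtype=float), t) for s in samples])
    # In 1D, averaging quantile functions gives the barycenter's quantiles.
    return t, q.mean(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Three units with different sample sizes and different locations.
    units = [rng.normal(loc=m, size=p) for m, p in [(0.0, 80), (2.0, 120), (4.0, 100)]]
    t, q_bar = barycenter_quantiles(units)
    print(q_bar[len(q_bar) // 2])  # a value near the middle of the barycenter
```

Averaging quantile functions rather than densities is what lets the barycenter capture location shifts (phase variation) without blurring the shape of the distributions.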