Estimating mutual information from i.i.d. samples drawn from an unknown joint density function is a basic statistical problem of broad interest with multitudinous applications. The most popular estimator is one proposed by Kraskov, Stögbauer, and Grassberger (KSG) in 2004; it is nonparametric and based on the distances of each sample to its k-th nearest neighboring sample, where k is a fixed small integer. Despite its widespread use (it is part of scientific software packages), the theoretical properties of this estimator have been largely unexplored. In this paper we demonstrate that the estimator is consistent and also identify an upper bound on the rate of convergence of the ℓ2 error as a function of the number of samples. We argue that the performance benefits of the KSG estimator stem from a curious "correlation boosting" effect, and we build on this intuition to modify the KSG estimator in novel ways to construct a superior estimator. As a byproduct of our investigations, we obtain nearly tight rates of convergence of the ℓ2 error of the well-known fixed k-nearest neighbor estimator of differential entropy. The resulting improved estimator performs significantly better than alternative approaches, discussed in detail in Section 6, both in simulations and when tested in the wild; this is especially true when the random variables are in high dimensions.

The exemplar fixed k-NN estimator is the estimator of differential entropy from i.i.d. samples proposed in 1987 by Kozachenko and Leonenko [27], which involved a novel bias correction term and which we refer to as the KL estimator (of differential entropy). Since the mutual information between two random variables is the sum and difference of three differential entropy terms, any estimator of differential entropy naturally lends itself to an estimator of mutual information, which we christen the 3KL estimator (of mutual information). In an inspired work in 2004, Kraskov, Stögbauer, and Grassberger [29] proposed a different fixed k-NN estimator of mutual information, which we name the KSG estimator, that involved subtle (sample-dependent) alterations to the 3KL estimator. The authors of [29, 25] empirically demonstrated that the KSG estimator consistently improves over the 3KL estimator in a variety of settings. Indeed, the simplicity of the KSG estimator, combined with its superior performance, has made it a very popular estimator of mutual information in practice.

Despite its widespread use, even basic theoretical properties of the KSG estimator are unknown: it is not even clear whether the estimator has vanishing bias (i.e., is consistent) as the number of samples grows, much less how the bias behaves asymptotically as a function of the number of samples. As observed elsewhere [17], characterizing the theoretical properties of the KSG estimator is of first-order importance: such a study could shed light on why the sample-dependent modifications lead to improved performance, and perhaps this understanding could lead to the design of even better mutual information estimators. Such are the goals of this paper.
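For reference, the following is a minimal Python sketch of the two fixed-k estimators discussed above. It follows standard presentations rather than any particular reference implementation; the function names, the choice of the max-norm, and the default k are ours. The 3KL estimate of I(X;Y) is then simply kl_entropy(X) + kl_entropy(Y) - kl_entropy(np.hstack([X, Y])).

```python
# Minimal sketch (assumptions: max-norm distances, continuous data with no
# duplicate samples; names are illustrative, not from the paper).
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma

def kl_entropy(X, k=3):
    """Kozachenko-Leonenko (KL) fixed-k NN estimate of differential entropy (nats)."""
    X = np.asarray(X, dtype=float).reshape(len(X), -1)
    N, d = X.shape
    # Distance from each sample to its k-th nearest neighbor (k+1 skips the sample itself).
    rho = cKDTree(X).query(X, k=k + 1, p=np.inf)[0][:, -1]
    # d*log(2) is the log-volume of the unit max-norm ball in d dimensions.
    return digamma(N) - digamma(k) + d * np.log(2) + d * np.mean(np.log(rho))

def ksg_mi(X, Y, k=3):
    """KSG estimate of I(X;Y) in nats: the k-NN distance in the joint space sets a
    sample-dependent radius for neighbor counts in each marginal space."""
    X = np.asarray(X, dtype=float).reshape(len(X), -1)
    Y = np.asarray(Y, dtype=float).reshape(len(Y), -1)
    N = len(X)
    Z = np.hstack([X, Y])
    rho = cKDTree(Z).query(Z, k=k + 1, p=np.inf)[0][:, -1]
    # Count strictly closer neighbors of each point in the X and Y spaces
    # (subtracting 1 removes the point itself).
    nx = np.array([(np.max(np.abs(X - X[i]), axis=1) < rho[i]).sum() - 1 for i in range(N)])
    ny = np.array([(np.max(np.abs(Y - Y[i]), axis=1) < rho[i]).sum() - 1 for i in range(N)])
    return digamma(k) + digamma(N) - np.mean(digamma(nx + 1) + digamma(ny + 1))
```

The sample-dependent alteration in ksg_mi is visible in the counts nx and ny: rather than fixing k separately in each marginal space, the KSG estimator reuses the joint-space k-NN radius rho[i] at every sample.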
Estimators of information-theoretic measures such as entropy and mutual information are a basic workhorse for many downstream applications in modern data science. State-of-the-art approaches have been either geometric (nearest-neighbor (NN) based) or kernel based (with a globally chosen bandwidth). In this paper, we combine both of these approaches to design new estimators of entropy and mutual information that outperform state-of-the-art methods. Our estimator uses local bandwidth choices of k-NN distances with a finite k, independent of the sample size. Such a local and data-dependent choice improves performance in practice, but the bandwidth vanishes at a fast rate, leading to a non-vanishing bias. We show that the asymptotic bias of the proposed estimator is universal; it is independent of the underlying distribution. Hence, it can be precomputed and subtracted from the estimate. As a byproduct, we obtain a unified way of obtaining both kernel and NN estimators. The corresponding theoretical contribution relating the asymptotic geometry of nearest neighbors to order statistics is of independent mathematical interest.
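To make the idea concrete, here is a minimal Python sketch of the simplest, uniform-kernel instance of this construction: a resubstitution entropy estimate whose bandwidth at each sample is its k-NN distance, followed by subtraction of a precomputed, distribution-free bias term. This particular correction recovers the classical Kozachenko-Leonenko estimator; the paper's more general local kernels and corrections are not reproduced here, and the function name is ours.

```python
# Minimal sketch (assumptions: Euclidean balls as the uniform kernel; the
# correction shown is the classical one that recovers the Kozachenko-Leonenko
# estimator, not the paper's full construction).
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gammaln

def local_bandwidth_entropy(X, k=5):
    """Resubstitution entropy estimate (nats) with local k-NN bandwidths."""
    X = np.asarray(X, dtype=float).reshape(len(X), -1)
    N, d = X.shape
    # Local, data-dependent bandwidth: distance to the k-th nearest neighbor
    # (k+1 because each sample's nearest neighbor is itself).
    h = cKDTree(X).query(X, k=k + 1)[0][:, -1]
    # Log-volume of the d-dimensional Euclidean unit ball.
    log_cd = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1)
    # Uniform-kernel density estimate at each sample, then plug-in entropy.
    log_fhat = np.log(k) - np.log(N) - log_cd - d * np.log(h)
    plugin = -np.mean(log_fhat)
    # The asymptotic bias of the plug-in step is universal: it depends only on
    # N and k, not on the distribution, so it can be precomputed and subtracted.
    bias = (np.log(N) - digamma(N)) - (np.log(k) - digamma(k))
    return plugin - bias
```

Swapping the uniform kernel for a smoother local kernel changes only the density estimate and the precomputed constant, which is the sense in which this recipe unifies kernel and NN estimators.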
Discovering a correlation from one variable to another variable is of fundamental scientific and practical interest. While existing correlation measures are suitable for discovering average correlation, they fail to discover hidden or potential correlations. To bridge this gap, (i) we postulate a set of natural axioms that we expect a measure of potential correlation to satisfy; (ii) we show that the rate of information bottleneck, i.e., the hypercontractivity coefficient, satisfies all the proposed axioms; (iii) we provide a novel estimator to estimate the hypercontractivity coefficient from samples; and (iv) we provide numerical experiments demonstrating that this proposed estimator discovers potential correlations among various indicators of WHO datasets, is robust in discovering gene interactions from gene expression time series data, and is statistically more powerful than the estimators for other correlation measures in binary hypothesis testing of canonical examples of potential correlations.
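For concreteness, the hypercontractivity coefficient referenced above (the "rate of information bottleneck") admits the following standard information-theoretic characterization; the notation is ours and is included only as background.

```latex
% Hypercontractivity coefficient s(X;Y) as an information-bottleneck rate:
% the largest amount of information a summary U of X can reveal about Y
% per bit of information it retains about X, over Markov chains U - X - Y.
\[
  s(X;Y) \;=\; \sup_{\substack{U \,:\, U - X - Y \\ I(U;X) > 0}} \frac{I(U;Y)}{I(U;X)}.
\]
```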