Given two distributions over an n-element set, we wish to check whether these distributions are statistically close using only samples. We give a sublinear algorithm which uses $O(n^{2/3}\epsilon^{-4}\log n)$ independent samples from each distribution, runs in time linear in the sample size, makes no assumptions about the structure of the distributions, and distinguishes the cases when the distance between the distributions is small (less than max( We also give an $\Omega(n^{2/3}\epsilon^{-2/3})$ lower bound. Our algorithm has applications to the problem of checking whether a given Markov process is rapidly mixing. We develop sublinear algorithms for this problem as well.
Given samples from two distributions over an n-element set, we wish to test whether these distributions are statistically close. We present an algorithm which uses a sublinear (in n) number of independent samples from each distribution, specifically $O(n^{2/3}\epsilon^{-8/3}\log n)$, runs in time linear in the sample size, makes no assumptions about the structure of the distributions, and distinguishes the cases when the distance between the distributions is small (less than $\max\{\epsilon^{4/3} n^{-1/3}/32,\ \epsilon n^{-1/2}/4\}$) or large (more than $\epsilon$) in $\ell_1$ distance. This result can be compared to the lower bound of $\Omega(n^{2/3}\epsilon^{-2/3})$ for this problem given by Valiant [54]. Our algorithm has applications to the problem of testing whether a given Markov process is rapidly mixing. We present sublinear algorithms for several variants of this problem as well.
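The abstract states the guarantees but not the method. A common building block for sample-based closeness testing, assumed here for illustration rather than taken from the text above, is collision counting: the fraction of colliding sample pairs is an unbiased estimate of $\|p\|_2^2$, the fraction of coinciding cross pairs estimates $\langle p, q\rangle$, and combining them estimates $\|p-q\|_2^2$. The Python sketch below uses hypothetical helper names (`self_collision_rate`, `cross_collision_rate`, `l2_distance_sq_estimate`, `closeness_threshold`); the last one only evaluates the "small distance" bound quoted in the abstract. It is not a complete tester, and it does not reproduce the algorithm's actual sample sizes or decision constants.

```python
from collections import Counter
from math import comb


def self_collision_rate(samples):
    """Fraction of unordered sample pairs (i < j) with equal values.

    For i.i.d. samples from p this is an unbiased estimate of sum_i p_i^2.
    """
    m = len(samples)
    if m < 2:
        raise ValueError("need at least two samples")
    counts = Counter(samples)
    colliding_pairs = sum(comb(c, 2) for c in counts.values())
    return colliding_pairs / comb(m, 2)


def cross_collision_rate(xs, ys):
    """Fraction of pairs (x, y), x from xs and y from ys, with x == y.

    For independent samples from p and q this is unbiased for sum_i p_i q_i.
    """
    if not xs or not ys:
        raise ValueError("need samples from both distributions")
    cx, cy = Counter(xs), Counter(ys)
    # Counter returns 0 for values absent from ys.
    matches = sum(c * cy[v] for v, c in cx.items())
    return matches / (len(xs) * len(ys))


def l2_distance_sq_estimate(xs, ys):
    """Unbiased estimate of ||p - q||_2^2 = ||p||^2 + ||q||^2 - 2<p, q>."""
    return (self_collision_rate(xs) + self_collision_rate(ys)
            - 2 * cross_collision_rate(xs, ys))


def closeness_threshold(n, eps):
    """The bound max{eps^(4/3) n^(-1/3) / 32, eps n^(-1/2) / 4} from the abstract."""
    return max(eps ** (4 / 3) * n ** (-1 / 3) / 32, eps * n ** (-1 / 2) / 4)


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    n, m = 1000, 5000
    uniform_a = [rng.randrange(n) for _ in range(m)]
    uniform_b = [rng.randrange(n) for _ in range(m)]
    first_half = [rng.randrange(n // 2) for _ in range(m)]  # uniform on half the domain
    print(l2_distance_sq_estimate(uniform_a, uniform_b))   # expectation 0
    print(l2_distance_sq_estimate(uniform_a, first_half))  # expectation 1/n = 0.001
```

Because the estimate is unbiased rather than clipped, it can come out slightly negative when p and q are close, so a tester built on it would compare it against a positive threshold rather than against zero.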
The complexity of testing properties of monotone and unimodal distributions, when given access only to samples of the distribution, is investigated. Two kinds of sublinear-time algorithms are provided: those for testing monotonicity and those that take advantage of monotonicity. The first algorithm tests whether a given distribution on $[n]$ is monotone or far from any monotone distribution in $L_1$-norm; this algorithm uses $\tilde{O}(\sqrt{n})$ samples and is shown to be nearly optimal. The next algorithm, given a joint distribution on $[n] \times [n]$, tests whether it is monotone or far from any monotone distribution in $L_1$-norm; this algorithm uses $\tilde{O}(n^{3/2})$ samples. The problems of testing whether two monotone distributions are close in $L_1$-norm and whether two random variables with a monotone joint distribution are close to being independent in $L_1$-norm are also considered. Algorithms for these problems that use only $\mathrm{poly}(\log n)$ samples are presented. The closeness and independence testing algorithms for monotone distributions are significantly more efficient than the corresponding algorithms, and even the lower bounds, for arbitrary distributions. Some of the above results are also extended to unimodal distributions.
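The abstract does not say how the $\mathrm{poly}(\log n)$ bounds are obtained. One standard way to see why monotonicity helps, swapped in here for illustration and not necessarily the technique used in the paper, is Birgé's oblivious decomposition: split $[n]$ into consecutive intervals whose lengths grow geometrically by a factor of about $(1+\epsilon)$, so there are only $O(\log(n)/\epsilon)$ of them, and compare two distributions through their masses on these intervals. For a non-increasing distribution, flattening it on such a partition changes it by only $O(\epsilon)$ in $L_1$-norm, so closeness testing reduces to a problem over a logarithmic-size domain. The helper names (`birge_intervals`, `bucket_histogram`, `bucketed_l1_distance`) are hypothetical, and no sample-size or threshold constants are implied.

```python
import bisect
import math


def birge_intervals(n, eps):
    """Split 1..n into consecutive intervals of length floor((1 + eps)^k), k = 0, 1, ...

    The last interval is truncated at n. Lengths grow geometrically, so the
    number of intervals is O(log(n) / eps) rather than n.
    """
    if n < 1 or eps <= 0:
        raise ValueError("need n >= 1 and eps > 0")
    intervals, start, k = [], 1, 0
    while start <= n:
        length = max(1, math.floor((1 + eps) ** k))
        end = min(n, start + length - 1)
        intervals.append((start, end))
        start, k = end + 1, k + 1
    return intervals


def bucket_histogram(samples, intervals):
    """Empirical probability mass of each interval; samples must lie in the covered range."""
    if not samples:
        raise ValueError("need at least one sample")
    starts = [a for a, _ in intervals]
    lo, hi = intervals[0][0], intervals[-1][1]
    counts = [0] * len(intervals)
    for x in samples:
        if not lo <= x <= hi:
            raise ValueError(f"sample {x} outside {lo}..{hi}")
        # Index of the last interval whose start is <= x.
        counts[bisect.bisect_right(starts, x) - 1] += 1
    return [c / len(samples) for c in counts]


def bucketed_l1_distance(xs, ys, n, eps):
    """L1 distance between the empirical interval masses of two sample sets."""
    intervals = birge_intervals(n, eps)
    hx = bucket_histogram(xs, intervals)
    hy = bucket_histogram(ys, intervals)
    return sum(abs(a - b) for a, b in zip(hx, hy))
```

Here `birge_intervals(10, 1.0)` yields the intervals (1, 1), (2, 3), (4, 7), (8, 10). The partition assumes a non-increasing distribution; for a non-decreasing one, the same construction can be applied to the reversed domain.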