We give algorithms for geometric graph problems in modern parallel models such as MapReduce. For example, for the Minimum Spanning Tree (MST) problem over a set of points in two-dimensional space, our algorithm computes a (1 + ε)-approximate MST. Our algorithms work in a constant number of rounds of communication, while using total space and communication proportional to the size of the data (linear-space and near-linear-time algorithms). In contrast, for general graphs, achieving the same result for MST (or even connectivity) remains a challenging open problem [9], despite drawing significant attention in recent years. We develop a general algorithmic framework that, besides MST, also applies to Earth-Mover Distance (EMD) and the transportation cost problem. Our algorithmic framework has implications beyond the MapReduce model. For example, it yields a new algorithm for computing EMD cost in the plane in near-linear time, n^{1+o(1)}. We note that while recently [33] have developed a near-linear time algorithm for (1 + ε)-approximating EMD, our algorithm is fundamentally different and, for example, also solves the transportation (cost) problem, raised as an open question in [33]. Furthermore, our algorithm immediately gives a (1 + ε)-approximation algorithm with n^δ space in the streaming-with-sorting model with (1/δ)^{O(1)} passes. As such, it is tempting to conjecture that the parallel models may also constitute a concrete playground in the quest for efficient algorithms for EMD (and other similar problems) in the vanilla streaming model, a well-known open problem.
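When checking a (1 + ε)-approximation like the one above on tiny inputs, an exact ground truth is handy. The sketch below is only such a baseline, not the paper's algorithm: it computes EMD between two equal-size point sets in the plane by trying every perfect matching, so it runs in factorial time and is viable only for a handful of points.

```python
import math
from itertools import permutations

def emd_exact(A, B):
    """Exact Earth-Mover Distance between two equal-size point sets in the
    plane: the minimum total Euclidean distance over all perfect matchings.
    Factorial-time brute force -- a ground-truth baseline for tiny inputs
    only, not the near-linear-time algorithm discussed above."""
    assert len(A) == len(B)
    n = len(A)
    return min(
        sum(math.dist(A[i], B[perm[i]]) for i in range(n))
        for perm in permutations(range(n))
    )

# Two tiny point sets one unit apart vertically: the optimal matching
# pairs each point with the one directly above it, for a total cost of 2.
A = [(0.0, 0.0), (1.0, 0.0)]
B = [(0.0, 1.0), (1.0, 1.0)]
```

Against such a baseline one can measure the empirical ratio achieved by an approximate solver on small random instances.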
In this work, we study trade-offs between accuracy and privacy in the context of linear queries over histograms. This is a rich class of queries that includes contingency tables and range queries, and has been a focus of a long line of work [BLR08, RR10, DRV10, HT10, HR10, LHR+10, BDKT12]. For a given set of d linear queries over a database x ∈ ℝ^N, we seek to find the differentially private mechanism that has the minimum mean squared error. For pure differential privacy, [HT10, BDKT12] give an O(log² d) approximation to the optimal mechanism. Our first contribution is to give an O(log² d) approximation guarantee for the case of (ε, δ)-differential privacy. Our mechanism is simple, efficient and adds carefully chosen correlated Gaussian noise to the answers. We prove its approximation guarantee relative to the hereditary discrepancy lower bound of [MN12], using tools from convex geometry. We next consider this question in the case when the number of queries exceeds the number of individuals in the database, i.e. when d > n = ‖x‖₁. The lower bounds used in the previous approximation algorithm no longer apply, and in fact better mechanisms are known in this setting [BLR08, RR10, HR10, GHRU11, GRU12]. Our second main contribution is to give an (ε, δ)-differentially private mechanism that, for a given query set A and an upper bound n on ‖x‖₁, has mean squared error within polylog(d, N) of the optimal for A and n. This approximation is achieved by coupling the Gaussian noise addition approach with linear regression over the ℓ₁ ball. Additionally, we show a similar polylogarithmic approximation guarantee for the best ε-differentially private mechanism in this sparse setting. Our work also shows that for arbitrary counting queries, i.e. A with entries in {0, 1}, there is an ε-differentially private mechanism with expected error Õ(√n) per query, improving on the Õ(n^{2/3}) bound of [BLR08], and matching the lower bound implied by [DN03] up to logarithmic factors.
The connection between hereditary discrepancy and the privacy mechanism enables us to derive the first polylogarithmic approximation to the hereditary discrepancy of a matrix. We can represent the database as its histogram x ∈ ℝ^N, with x_i denoting the number of occurrences of the i-th element of the universe. Thus x would in fact be a vector of non-negative integers with ‖x‖₁ = n. We will be concerned with reporting reasonably accurate answers to a given set of d linear queries over this histogram x. This set of queries can naturally be represented by a matrix A ∈ ℝ^{d×N}, with the vector Ax ∈ ℝ^d giving the correct answers to the queries. When A ∈ {0,1}^{d×N}, we call such queries counting queries. We are interested in the (practical) regime where N ≥ d ≥ n, although our results hold for all settings of the parameters. A differentially private mechanism will return a noisy answer to the query A and, in this work, we measure the performance of a mechanism in terms of its worst-case total expected squared error. Suppose that X ⊆ ℝ^N is the set of all possible databases...
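The starting point that the correlated-noise mechanism above improves on can be sketched as the standard i.i.d. Gaussian mechanism: answer Ax plus Gaussian noise calibrated to the ℓ₂ sensitivity of A, which here is the largest column norm, since neighboring histograms differ by one in a single coordinate. The function name and the particular σ calibration below are illustrative assumptions (the textbook (ε, δ) calibration), not the papers' construction.

```python
import math
import random

def gaussian_mechanism(A, x, eps, delta):
    """Baseline (eps, delta)-DP Gaussian mechanism for linear queries Ax.
    Adds i.i.d. Gaussian noise scaled to the l2 sensitivity of A (its
    largest column norm). The mechanisms discussed above improve on this
    by correlating the noise across queries; this is only a sketch of
    the standard starting point."""
    d, N = len(A), len(A[0])
    sensitivity = max(
        math.sqrt(sum(A[i][j] ** 2 for i in range(d))) for j in range(N)
    )
    sigma = sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / eps
    true_answers = [sum(A[i][j] * x[j] for j in range(N)) for i in range(d)]
    return [t + random.gauss(0.0, sigma) for t in true_answers]
```

With a very weak privacy requirement (huge ε) the noise scale collapses and the answers approach Ax, which makes the calibration easy to sanity-check.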
The maximum volume j-simplex problem asks to compute the j-dimensional simplex of maximum volume inside the convex hull of a given set of n points in ℚ^d. We give a deterministic approximation algorithm for this problem which achieves an approximation ratio of e^{j/2+o(j)}. The problem is known to be NP-hard to approximate within a factor of c^j for some constant c > 1. Our algorithm also gives a factor e^{j+o(j)} approximation for the problem of finding the principal j × j submatrix of a rank-d positive semidefinite matrix with the largest determinant. We achieve our approximation by rounding solutions to a generalization of the D-optimal design problem, or, equivalently, the dual of an appropriate smallest enclosing ellipsoid problem. Our arguments give a short and simple proof of a restricted invertibility principle for determinants.
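For intuition about the objective: the volume of a j-simplex is √det(V Vᵀ) / j!, where V stacks the edge vectors out of one vertex. The sketch below pairs that formula with an exhaustive search over all (j+1)-point subsets. This exponential-time baseline is useful only for checking an approximation ratio on tiny instances and is in no way the rounding algorithm of the paper.

```python
import math
from itertools import combinations

def det(M):
    """Determinant via Gaussian elimination with partial pivoting."""
    M = [row[:] for row in M]
    n, d = len(M), 1.0
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[piv][c]) < 1e-12:
            return 0.0
        if piv != c:
            M[c], M[piv] = M[piv], M[c]
            d = -d
        d *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n):
                M[r][k] -= f * M[c][k]
    return d

def simplex_volume(pts):
    """Volume of the j-simplex on j+1 points in R^d: sqrt(det(G)) / j!,
    where G is the Gram matrix of the edge vectors out of pts[0]."""
    p0, rest = pts[0], pts[1:]
    dim = len(p0)
    V = [[q[k] - p0[k] for k in range(dim)] for q in rest]
    j = len(V)
    G = [[sum(V[a][k] * V[b][k] for k in range(dim)) for b in range(j)]
         for a in range(j)]
    return math.sqrt(max(det(G), 0.0)) / math.factorial(j)

def max_volume_simplex(points, j):
    """Exhaustive search over all (j+1)-subsets -- exponential time,
    a ground truth for tiny inputs only."""
    return max(combinations(points, j + 1), key=simplex_volume)
```

For example, among the four corners of the unit square plus its center, the maximum-area triangle (j = 2) has area 1/2, attained by any three of the corners.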
The γ₂ norm of a real m × n matrix A is the minimum number t such that the column vectors of A are contained in a 0-centered ellipsoid E ⊆ ℝ^m which in turn is contained in the hypercube [−t, t]^m. We prove that this classical quantity approximates the hereditary discrepancy herdisc(A) as follows: γ₂(A) = O(log m) · herdisc(A) and herdisc(A) = O(√log m) · γ₂(A). Since γ₂ is polynomial-time computable, this gives a polynomial-time approximation algorithm for hereditary discrepancy. Both inequalities are shown to be asymptotically tight. We then demonstrate on several examples the power of the γ₂ norm as a tool for proving lower and upper bounds in discrepancy theory. Most notably, we prove a new lower bound of Ω(log^{d−1} n) for the d-dimensional Tusnády problem, asking for the combinatorial discrepancy of an n-point set in ℝ^d with respect to axis-parallel boxes. For d > 2, this improves the previous best lower bound, which was of order approximately log^{(d−1)/2} n, and it comes close to the best known upper bound of O(log^{d+1/2} n), for which we also obtain a new, very simple proof.
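To make the approximated quantities concrete: the discrepancy of A is the minimum over ±1 colorings x of ‖Ax‖∞, and the hereditary discrepancy is the maximum of that over all column submatrices of A. The doubly exponential brute force below, usable only on tiny matrices, is a ground truth against which a polynomial-time γ₂-based estimate could be compared; it is a sketch, not the approximation algorithm of the paper.

```python
from itertools import combinations, product

def disc(A):
    """Discrepancy of A: min over +-1 colorings x of ||Ax||_inf.
    Exponential in the number of columns."""
    n = len(A[0])
    return min(
        max(abs(sum(row[j] * x[j] for j in range(n))) for row in A)
        for x in product((-1, 1), repeat=n)
    )

def herdisc(A):
    """Hereditary discrepancy: max of disc over all nonempty column
    submatrices. Doubly exponential -- tiny matrices only."""
    n = len(A[0])
    return max(
        disc([[row[j] for j in cols] for row in A])
        for k in range(1, n + 1)
        for cols in combinations(range(n), k)
    )
```

For instance, for the incidence matrix of a triangle (three sets, each containing two of three elements), every ±1 coloring leaves some pair monochromatic, so both disc and herdisc equal 2.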