We adopt a utilitarian perspective on social choice, assuming that agents have (possibly latent) utility functions over some space of alternatives. For many reasons one might consider mechanisms, or social choice functions, that only have access to the ordinal rankings of alternatives by the individual agents rather than their utility functions. In this context, one possible objective for a social choice function is the maximization of (expected) social welfare relative to the information contained in these rankings. We study such optimal social choice functions under three different models, and underscore the important role played by scoring functions. In our worst-case model, no assumptions are made about the underlying distribution and we analyze the worst-case distortion-or degree to which the selected alternative does not maximize social welfare-of optimal social choice functions. In our average-case model, we derive optimal functions under neutral (or impartial culture) distributional models. Finally, a very general learning-theoretic model allows for the computation of optimal social choice functions (i.e., that maximize expected social welfare) under arbitrary, sampleable distributions. In the latter case, we provide both algorithms and sample complexity results for the class of scoring functions, and further validate the approach empirically.
This paper proves that an "old dog", namely -the classical Johnson-Lindenstrauss transform, "performs new tricks" -it gives a novel way of preserving differential privacy. We show that if we take two databases, D and with a vector of iid normal Gaussians yields two statistically close distributions in the sense of differential privacy. Furthermore, a small, deterministic and public alteration of the input is enough to assert that all singular values of D are large. We apply the Johnson-Lindenstrauss transform to the task of approximating cut-queries: the number of edges crossing a (S,S)-cut in a graph. We show that the JL transform allows us to publish a sanitized graph that preserves edge differential privacy (where two graphs are neighbors if they differ on a single edge) while adding only O(|S|/ǫ) random noise to any given query (w.h.p). Comparing the additive noise of our algorithm to existing algorithms for answering cut-queries in a differentially private manner, we outperform all others on small cuts (|S| = o(n)).We also apply our technique to the task of estimating the variance of a given matrix in any given direction. The JL transform allows us to publish a sanitized covariance matrix that preserves differential privacy w.r.t bounded changes (each row in the matrix can change by at most a norm-1 vector) while adding random noise of magnitude independent of the size of the matrix (w.h.p). In contrast, existing algorithms introduce an error which depends on the matrix dimensions.
We introduce the notion of restricted sensitivity as an alternative to global and smooth sensitivity to improve accuracy in differentially private data analysis. The definition of restricted sensitivity is similar to that of global sensitivity except that instead of quantifying over all possible datasets, we take advantage of any beliefs about the dataset that a querier may have, to quantify over a restricted class of datasets. Specifically, given a query f and a hypothesis H about the structure of a dataset D, we show generically how to transform f into a new query f H whose global sensitivity (over all datasets including those that do not satisfy H) matches the restricted sensitivity of the query f . Moreover, if the belief of the querier is correct (i.e., D ∈ H) then f H (D) = f (D). If the belief is incorrect, then f H (D) may be inaccurate.We demonstrate the usefulness of this notion by considering the task of answering queries regarding social-networks, which we model as a combination of a graph and a labeling of its vertices. In particular, while our generic procedure is computationally inefficient, for the specific definition of H as graphs of bounded degree, we exhibit efficient ways of constructing f H using different projection-based techniques. We then analyze two important query classes: subgraph counting queries (e.g., number of triangles) and local profile queries (e.g., number of people who know a spy and a computer-scientist who know each other). We demonstrate that the restricted sensitivity of such queries can be significantly lower than their smooth sensitivity. Thus, using restricted sensitivity we can maintain privacy whether or not D ∈ H, while providing more accurate results in the event that H holds true. *
Clustering under most popular objective functions is NP-hard, even to approximate well, and so unlikely to be efficiently solvable in the worst case. Recently, Bilu and Linial [11] suggested an approach aimed at bypassing this computational barrier by using properties of instances one might hope to hold in practice. In particular, they argue that instances in practice should be stable to small perturbations in the metric space and give an efficient algorithm for clustering instances of the Max-Cut problem that are stable to perturbations of size O(n 1/2 ). In addition, they conjecture that instances stable to as little as O(1) perturbations should be solvable in polynomial time. In this paper we prove that this conjecture is true for any center-based clustering objective (such as k-median, k-means, and k-center). Specifically, we show we can efficiently find the optimal clustering assuming only stability to factor-3 perturbations of the underlying metric in spaces without Steiner points, and stability to factor 2 + √ 3 perturbations for general metrics. In particular, we show for such instances that the popular Single-Linkage algorithm combined with dynamic programming will find the optimal clustering. We also present NP-hardness results under a weaker but related condition.
Aiming to unify known results about clustering mixtures of distributions under separation conditions, Kumar and Kannan [KK10] introduced a deterministic condition for clustering datasets. They showed that this single deterministic condition encompasses many previously studied clustering assumptions. More specifically, their proximity condition requires that in the target k-clustering, the projection of a point x onto the line joining its cluster center µ and some other center µ ′ , is a large additive factor closer to µ than to µ ′ . This additive factor can be roughly described as k times the spectral norm of the matrix representing the differences between the given (known) dataset and the means of the (unknown) target clustering. Clearly, the proximity condition implies center separation -the distance between any two centers must be as large as the above mentioned bound.In this paper we improve upon the work of Kumar and Kannan [KK10] along several axes. First, we weaken the center separation bound by a factor of √ k, and secondly we weaken the proximity condition by a factor of k (in other words, the revised separation condition is independent of k). Using these weaker bounds we still achieve the same guarantees when all points satisfy the proximity condition. Under the same weaker bounds, we achieve even better guarantees when only (1−ǫ)-fraction of the points satisfy the condition. Specifically, we correctly cluster all but a (ǫ + O(1/c 4 ))-fraction of the points, compared to O(k 2 ǫ)-fraction of [KK10], which is meaningful even in the particular setting when ǫ is a constant and k = ω(1). Most importantly, we greatly simplify the analysis of Kumar and Kannan. In fact, in the bulk of our analysis we ignore the proximity condition and use only center separation, along with the simple triangle and Markov inequalities. Yet these basic tools suffice to produce a clustering which (i) is correct on all but a constant fraction of the points, (ii) has k-means cost comparable to the k-means cost of the target clustering, and (iii) has centers very close to the target centers.Our improved separation condition allows us to match the results of the Planted Partition Model of McSherry [McS01], improve upon the results of Ostrovsky et al [ORSS06], and improve separation results for mixture of Gaussian models in a particular setting.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.