In this paper, we tackle the problem of constructing a differentially private synopsis for two-dimensional datasets, such as geospatial datasets. The current state-of-the-art methods work by performing recursive binary partitioning of the data domain and constructing a hierarchy of partitions. We show that the key challenge in partition-based synopsis methods lies in choosing the right partition granularity to balance the noise error and the non-uniformity error. We study the uniform-grid approach, which applies an equi-width grid of a certain size over the data domain and then issues independent count queries on the grid cells. This method has received no attention in the literature, probably because no good method for choosing the grid size was known. Based on an analysis of the two kinds of errors, we propose a method for choosing the grid size. Experimental results validate our method and show that this approach performs as well as, and often better than, the state-of-the-art methods. We further introduce a novel adaptive-grid method, which lays a coarse-grained grid over the dataset and then further partitions each cell according to its noisy count. Both levels of partitions are then used in answering queries over the dataset. This method exploits the fact that dense regions benefit from finer-grained partitioning while sparse regions are better served by coarse partitioning. Through extensive experiments on real-world datasets, we show that this approach consistently and significantly outperforms the uniform-grid method and other state-of-the-art methods.
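The uniform-grid idea can be sketched in a few lines: lay an m x m equi-width grid over the domain, count points per cell, and add independent Laplace noise. This is a minimal illustration, not the paper's implementation; the grid-size formula below (with an illustrative constant c = 10) is only one plausible way to balance noise error against non-uniformity error.

```python
import numpy as np

def uniform_grid_synopsis(points, domain, m, epsilon, rng=None):
    """Lay an m x m equi-width grid over `domain` and release
    Laplace-noised cell counts. Each point falls in exactly one cell,
    so the count vector has sensitivity 1 and Laplace(1/epsilon)
    noise per cell suffices for epsilon-differential privacy."""
    rng = rng or np.random.default_rng()
    (x0, x1), (y0, y1) = domain
    counts, _, _ = np.histogram2d(
        points[:, 0], points[:, 1],
        bins=m, range=[[x0, x1], [y0, y1]])
    return counts + rng.laplace(scale=1.0 / epsilon, size=counts.shape)

# Example: grid size grows with the dataset size N and the budget epsilon,
# shrinking with an error-balancing constant c (c = 10 is illustrative).
N, epsilon = 10000, 0.5
m = max(1, int(np.sqrt(N * epsilon / 10)))
pts = np.random.default_rng(0).uniform(0, 1, size=(N, 2))
synopsis = uniform_grid_synopsis(pts, ((0, 1), (0, 1)), m, epsilon)
```

A range query is then answered by summing the noisy cells it covers, interpolating partially covered border cells.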
In recent years, many approaches to publishing histograms in a differentially private manner have been proposed. Several approaches rely on constructing tree structures in order to decrease the error when answering large range queries. In this paper, we examine the factors affecting the accuracy of hierarchical approaches by studying the mean squared error (MSE) when answering range queries. We start with one-dimensional histograms, and analyze how the MSE changes with different branching factors, after employing constrained inference, and with different methods to allocate the privacy budget among hierarchy levels. Our analysis and experimental results show that combining the choice of a good branching factor with constrained inference outperforms the current state of the art. Finally, we extend our analysis to multidimensional histograms. We show that the benefits from employing hierarchical methods beyond a single dimension are significantly diminished, and when there are 3 or more dimensions, it is almost always better to use the Flat method instead of a hierarchy.
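A minimal sketch of the hierarchical approach (without constrained inference, and with the simple even budget split across levels): build a noisy b-ary tree over the histogram, then answer a range query from the fewest covering nodes. Function names are illustrative.

```python
import numpy as np

def build_hierarchy(counts, b, epsilon, rng=None):
    """Build a noisy b-ary hierarchy over the histogram `counts`.
    Each level partitions the domain, so the budget can be split
    evenly: epsilon / h per level, where h is the number of levels."""
    rng = rng or np.random.default_rng()
    levels = [np.asarray(counts, dtype=float)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        prev = np.concatenate([prev, np.zeros((-len(prev)) % b)])  # pad to a multiple of b
        levels.append(prev.reshape(-1, b).sum(axis=1))
    scale = len(levels) / epsilon  # Laplace scale per node
    return [lvl + rng.laplace(scale=scale, size=lvl.shape) for lvl in levels]

def range_query(tree, b, lo, hi):
    """Answer the sum over bins [lo, hi) using the fewest noisy nodes,
    segment-tree style: peel unaligned nodes, then move up one level."""
    total, level = 0.0, 0
    while lo < hi:
        while lo < hi and lo % b != 0:
            total += tree[level][lo]
            lo += 1
        while lo < hi and hi % b != 0:
            hi -= 1
            total += tree[level][hi]
        lo, hi, level = lo // b, hi // b, level + 1
    return total
```

A large range then touches O(b log_b n) noisy nodes instead of one noisy bin per position, which is the source of the accuracy gain the analysis quantifies.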
This paper aims at answering the following two questions in privacy-preserving data analysis and publishing: What formal privacy guarantee (if any) does k-anonymization provide? How can one benefit from the adversary's uncertainty about the data? We have found that random sampling provides a connection that helps answer these two questions, as sampling can create uncertainty. The main result of the paper is that k-anonymization, when done "safely", and when preceded with a random sampling step, satisfies (ε, δ)-differential privacy with reasonable parameters. This result illustrates that "hiding in a crowd of k" indeed offers some privacy guarantees. This result also suggests an alternative approach to output perturbation for satisfying differential privacy: namely, adding a random sampling step in the beginning and pruning results that are too sensitive to the change of a single tuple. Regarding the second question, we provide both positive and negative results. On the positive side, we show that adding a random-sampling pre-processing step to a differentially-private algorithm can greatly amplify the level of privacy protection. Hence, when given a dataset resulting from sampling, one can utilize a much larger privacy budget. On the negative side, any privacy notion that takes advantage of the adversary's uncertainty likely does not compose. We discuss what these results imply in practice.
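The amplification effect can be made concrete with the standard privacy-amplification-by-sampling bound: running an eps-DP algorithm on a dataset in which each tuple was independently included with probability beta yields roughly ln(1 + beta(e^eps - 1))-DP overall. A small sketch (function names are illustrative):

```python
import math

def amplified_epsilon(eps, beta):
    """Effective privacy parameter when each tuple is independently
    sampled with probability beta before running an eps-DP algorithm
    (the standard privacy-amplification-by-sampling bound)."""
    return math.log(1 + beta * (math.exp(eps) - 1))

def budget_after_sampling(eps_target, beta):
    """Inverse view: the (larger) budget one may spend on the sampled
    data while still providing eps_target protection overall."""
    return math.log(1 + (math.exp(eps_target) - 1) / beta)
```

For instance, with beta = 0.1, running with eps = 1 provides roughly 0.16-DP overall, which is why a dataset known to result from sampling supports a much larger working budget.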
Smart electric meters pose a substantial threat to the privacy of individuals in their own homes. Combined with non-intrusive load monitors, smart meter data can reveal precise home appliance usage information. An emerging solution to behavior leakage in smart meter measurement data is the use of battery-based load hiding. In this approach, a battery is used to store and supply power to home devices at strategic times to hide appliance loads from smart meters. A few such battery control algorithms have already been studied in the literature, but none have been evaluated from an adversarial point of view. In this paper, we first consider two well-known battery privacy algorithms, Best Effort (BE) and Non-Intrusive Load Leveling (NILL), and demonstrate attacks that recover precise load change information, which can be used to recover appliance behavior information, under both algorithms. We then introduce a stepping approach to battery privacy algorithms that fundamentally differs from previous approaches by maximizing the error between the load demanded by a home and the external load seen by a smart meter. By design, precise load change recovery attacks are impossible. We also propose mutual-information-based metrics to evaluate the privacy of different algorithms. We implement and evaluate four novel algorithms using the stepping approach, and show that under these metrics they outperform BE and NILL.
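The core stepping idea is that the meter only ever sees loads quantized to multiples of a fixed step size, so the exact magnitude of an appliance's load change is never exposed. A deliberately simplified step-up-only variant (real stepping algorithms also step down and manage battery state; names and the charging model here are illustrative):

```python
import math

def stepping_external_load(demands, beta):
    """Meter-visible load under a simplified stepping scheme: each
    interval's external draw is the demand rounded up to the next
    multiple of the step size beta, with the battery absorbing the
    surplus. Since the metered trace takes only values k * beta,
    precise load-change recovery from it is impossible by design."""
    return [math.ceil(d / beta) * beta for d in demands]
```

For example, demands of 1.2, 3.7, and 0.5 kW with beta = 1 kW meter as 2, 4, and 1 kW; the adversary learns at most which quantization bucket each interval fell into.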
The User Authorization Query (UAQ) Problem for RBAC, introduced by Zhang and Joshi [9], is to determine the set of roles to be activated in a single session for a particular set of permissions requested by the user. This set of roles must satisfy constraints that prevent certain combinations of roles from being activated in one session, and should follow the least-privilege principle. We show that the existing approach to the UAQ problem is inadequate, and propose two approaches for solving it. In the first approach, we develop algorithms that use the backtracking-based search techniques developed in the artificial intelligence community. In the second approach, we reduce the problem to the MAXSAT problem, which can be solved using available SAT solvers. We have implemented both approaches and experimentally evaluated them.
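A toy version of the search problem can be sketched as follows. This is not the paper's algorithm: it approximates least privilege by minimum role-set cardinality, uses naive exhaustive search by size where real backtracking solvers prune aggressively, and all instance names are hypothetical.

```python
from itertools import combinations

def uaq_search(roles, requested, exclusions):
    """Return a smallest role set whose permissions cover `requested`
    and that violates no pairwise mutual-exclusion constraint, or
    None if no such set exists. `roles` maps role -> permission set;
    `exclusions` is a list of mutually exclusive role pairs."""
    for k in range(1, len(roles) + 1):
        for combo in combinations(roles, k):
            chosen = set(combo)
            if any({a, b} <= chosen for a, b in exclusions):
                continue  # this combination violates a constraint
            granted = set().union(*(roles[r] for r in combo))
            if requested <= granted:
                return chosen
    return None
```

The MAXSAT reduction encodes the same choices as Boolean role-activation variables, with hard clauses for coverage and exclusion constraints and soft clauses penalizing extra permissions.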
We introduce a novel privacy framework that we call Membership Privacy. The framework includes positive membership privacy, which prevents the adversary from significantly increasing its ability to conclude that an entity is in the input dataset, and negative membership privacy, which prevents the leaking of non-membership. These notions are parameterized by a family of distributions that captures the adversary's prior knowledge. The power and flexibility of the proposed framework lie in the ability to choose different distribution families to instantiate membership privacy. Many privacy notions in the literature are equivalent to membership privacy with interesting distribution families, including differential privacy, differential identifiability, and differential privacy under sampling. Casting these notions into the framework leads to a deeper understanding of the strengths and weaknesses of these notions, as well as their relationships to each other. The framework also provides a principled approach to developing new privacy notions under which better utility can be achieved than what is possible under differential privacy.
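The flavor of positive membership privacy can be illustrated with a simple Bayes'-rule consequence: if no output can shift the member-vs-non-member likelihood ratio by more than e^eps, the adversary's posterior belief in membership is correspondingly bounded. This sketch illustrates the spirit only; the formal definitions quantify over a family of prior distributions and include a second, symmetric condition.

```python
import math

def posterior_upper_bound(prior, eps):
    """Upper bound on the adversary's posterior that an entity is a
    member, given a prior and a per-output likelihood ratio bounded
    by e^eps: posterior odds <= e^eps * prior odds."""
    odds = math.exp(eps) * prior / (1 - prior)
    return odds / (1 + odds)
```

For example, with a 50% prior and eps = ln 2, no single output can push the adversary's belief in membership above 2/3.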
We consider the problem of publishing a differentially private synopsis of a d-dimensional dataset so that one can reconstruct any k-way marginal contingency table from the synopsis. Marginal tables are the workhorses of categorical data analysis. Thus, the private release of such tables has attracted a lot of attention from the research community. However, for situations where d is moderate to large and k is beyond 3, no accurate and practical method exists. We introduce PriView, which computes marginal tables for a number of strategically chosen sets of attributes that we call views, and then uses these view marginal tables to reconstruct any desired k-way marginal. In PriView, we apply maximum entropy optimization to reconstruct k-way marginals from views. We also develop a novel method for efficiently making all view marginals consistent while correcting negative entries to improve accuracy. For view selection, we borrow the concept of covering designs from combinatorial design theory. We conduct extensive experiments on real and synthetic datasets, and show that PriView reduces the error over existing approaches by 2 to 3 orders of magnitude.
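The covering-design requirement on view selection is easy to state in code: every k-subset of attributes must lie entirely inside some view, so that every k-way marginal is directly recoverable from at least one view marginal. A minimal check (the maximum-entropy reconstruction and consistency steps are omitted; names are illustrative):

```python
from itertools import combinations

def covers_all_k_subsets(views, attributes, k):
    """True iff every k-subset of `attributes` is contained in some
    view, i.e. every k-way marginal can be obtained by summing out
    the extra attributes of at least one view marginal."""
    return all(any(set(subset) <= set(view) for view in views)
               for subset in combinations(attributes, k))
```

Good view sets achieve this coverage with few, small views, which keeps the per-view noise low under the privacy budget.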