We propose a general statistical inference framework to capture the privacy threat incurred by a user who releases data to a passive but curious adversary, given utility constraints. We show that applying this general framework to the setting where the adversary uses the self-information cost function naturally leads to a non-asymptotic information-theoretic approach for characterizing the best achievable privacy subject to utility constraints. Based on these results, we introduce two privacy metrics, namely average information leakage and maximum information leakage. We prove that under both metrics the resulting design problem of finding the optimal mapping from the user's data to a privacy-preserving output can be cast as a modified rate-distortion problem which, in turn, can be formulated as a convex program. Finally, we compare our framework with differential privacy.
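To illustrate the convex formulation, here is a minimal numerical sketch (not the paper's implementation): it minimizes the average information leakage I(S; Y) over privacy mappings p(y | x) subject to an average distortion budget, using the cvxpy modeling library. The joint distribution P_SX, the Hamming distortion, and the budget D are toy assumptions.

    import numpy as np
    import cvxpy as cp

    # Toy joint distribution p(s, x) over binary S (private) and X (useful).
    P_SX = np.array([[0.40, 0.10],
                     [0.10, 0.40]])
    p_s, p_x = P_SX.sum(axis=1), P_SX.sum(axis=0)
    d = 1.0 - np.eye(2)   # Hamming distortion d(x, y)
    D = 0.25              # utility constraint: E[d(X, Y)] <= D

    Q = cp.Variable((2, 2), nonneg=True)   # privacy mapping p(y | x)
    P_SY = P_SX @ Q                        # joint p(s, y): affine in Q
    p_y = p_x @ Q                          # output marginal: affine in Q
    # Average information leakage I(S; Y) = D(p_{S,Y} || p_S p_Y); rel_entr
    # is jointly convex and both of its arguments are affine in Q.
    leak = cp.sum(cp.rel_entr(P_SY, cp.vstack([p_s[0] * p_y, p_s[1] * p_y])))

    prob = cp.Problem(cp.Minimize(leak),
                      [cp.sum(Q, axis=1) == 1,
                       cp.sum(cp.multiply(np.diag(p_x) @ d, Q)) <= D])
    prob.solve()
    print(Q.value, leak.value / np.log(2))  # optimal mapping, leakage in bits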
We introduce a tunable measure for information leakage called maximal α-leakage. This measure quantifies the maximal gain of an adversary in inferring any (potentially random) function of a dataset from a release of the data. The inferential capability of the adversary is, in turn, quantified by a class of adversarial loss functions that we introduce as α-loss, α ∈ [1, ∞]. The choice of α determines the specific adversarial action and ranges from refining a belief (about any function of the data) for α = 1 to guessing the most likely value for α = ∞, while refining the αth moment of the belief for α in between. Maximal α-leakage then quantifies the adversarial gain under α-loss over all possible functions of the data. In particular, for the extremal values of α = 1 and α = ∞, maximal α-leakage simplifies to mutual information and maximal leakage, respectively. For α ∈ (1, ∞) this measure is shown to be the Arimoto channel capacity of order α. We show that maximal α-leakage satisfies data processing inequalities and a sub-additivity property, thereby allowing for a weak composition result. Building upon these properties, we use maximal α-leakage as the privacy measure and study the problem of data publishing with privacy guarantees, wherein the utility of the released data is ensured via a hard distortion constraint. Unlike average distortion, hard distortion provides a deterministic guarantee of fidelity. We show that under a hard distortion constraint, for α > 1 the optimal mechanism is independent of α, and therefore, the resulting optimal tradeoff is the same for all values of α > 1. Finally, the tunability of maximal α-leakage as a privacy measure is also illustrated for binary data with average Hamming distortion as the utility measure.
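For concreteness, the α-loss underlying this measure can be written explicitly; the following is a sketch of the definition used in this line of work, where \hat{P} denotes the adversary's posterior belief and x the realized value:

    \ell_\alpha(x, \hat{P}) = \frac{\alpha}{\alpha - 1}\left(1 - \hat{P}(x)^{\frac{\alpha - 1}{\alpha}}\right), \qquad \alpha \in (1, \infty),

with continuous extensions \ell_1(x, \hat{P}) = -\log \hat{P}(x) (log-loss, i.e., belief refinement) and \ell_\infty(x, \hat{P}) = 1 - \hat{P}(x) (probability of error, i.e., guessing the most likely value).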
Abstract-"To be considered for an 2015 IEEE Jack Keil Wolf ISIT Student Paper Award." We investigate the problem of intentionally disclosing information about a set of measurement points X (useful information), while guaranteeing that little or no information is revealed about a private variable S (private information). Given that S and X are drawn from a finite set with joint distribution pS,X , we prove that a non-trivial amount of useful information can be disclosed while not disclosing any private information if and only if the smallest principal inertia component of the joint distribution of S and X is 0. This fundamental result characterizes when useful information can be privately disclosed for any privacy metric based on statistical dependence. We derive sharp bounds for the tradeoff between disclosure of useful and private information, and provide explicit constructions of privacy-assuring mappings that achieve these bounds.
A tunable measure for information leakage called maximal α-leakage is introduced. This measure quantifies the maximal gain of an adversary in refining a tilted version of its prior belief about any (potentially random) function of a dataset, conditioned on a disclosed dataset. The choice of α determines the specific adversarial action, ranging from refining a belief for α = 1 to guessing the most likely value for α = ∞, and for these extremal values this measure simplifies to mutual information (MI) and maximal leakage (MaxL), respectively. For all other α, this measure is shown to be the Arimoto channel capacity of order α. Several properties of this measure are proven, including: (i) quasi-convexity in the mapping between the original and disclosed datasets; (ii) data processing inequalities; and (iii) a composition property.
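Since for 1 < α < ∞ the measure coincides with the Arimoto channel capacity of order α, it can be explored numerically. A minimal sketch follows, assuming finite alphabets; the helper arimoto_mi and the toy channel W are illustrative, not from the paper. Maximal α-leakage is the supremum of this quantity over input distributions.

    import numpy as np

    def arimoto_mi(p_x, W, alpha):
        """Arimoto mutual information I_alpha(X; Y) in nats, for input
        distribution p_x and channel W[x, y] = P(y | x); alpha > 0, != 1."""
        P_xy = p_x[:, None] * W                                  # joint p(x, y)
        H_renyi = np.log(np.sum(p_x ** alpha)) / (1 - alpha)     # Renyi entropy of X
        norms = np.sum(P_xy ** alpha, axis=0) ** (1 / alpha)     # per-y alpha-norm
        H_cond = (alpha / (1 - alpha)) * np.log(np.sum(norms))   # Arimoto H(X | Y)
        return H_renyi - H_cond

    # Toy binary symmetric channel with crossover 0.1 and uniform input;
    # sweeping alpha interpolates between (near) mutual information at
    # alpha close to 1 and maximal leakage log(0.9 + 0.9) at large alpha.
    W = np.array([[0.9, 0.1],
                  [0.1, 0.9]])
    for alpha in (1.01, 2.0, 100.0):
        print(alpha, arimoto_mi(np.array([0.5, 0.5]), W, alpha))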
We explore properties and applications of the Principal Inertia Components (PICs) between two discrete random variables X and Y. The PICs lie in the intersection of information and estimation theory, and provide a fine-grained decomposition of the dependence between X and Y. Moreover, the PICs describe which functions of X can or cannot be reliably inferred (in terms of MMSE) given an observation of Y. We demonstrate that the PICs play an important role in information theory, and they can be used to characterize information-theoretic limits of certain estimation problems. In privacy settings, we prove that the PICs are related to fundamental limits of perfect privacy.
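A small numerical illustration of the MMSE interpretation (the joint pmf P below is an assumed toy example): a zero-mean, unit-variance function of X aligned with the k-th principal direction has MMSE equal to 1 − λ_k when estimated from Y, where λ_k is the k-th PIC.

    import numpy as np

    # Toy joint pmf over X (rows) and Y (columns).
    P = np.array([[0.30, 0.10, 0.05],
                  [0.05, 0.25, 0.25]])
    p_x, p_y = P.sum(axis=1), P.sum(axis=0)
    U, s, Vt = np.linalg.svd(P / np.sqrt(np.outer(p_x, p_y)))

    k = 1                                   # first nontrivial component
    f = U[:, k] / np.sqrt(p_x)              # zero-mean, unit-variance function of X
    cond_mean = (P * f[:, None]).sum(axis=0) / p_y      # E[f(X) | Y = y]
    mmse = np.sum(P * (f[:, None] - cond_mean[None, :]) ** 2)
    print(mmse, 1 - s[k] ** 2)              # MMSE of f(X) given Y equals 1 - PIC_k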