We introduce a tunable measure for information leakage called maximal α-leakage. This measure quantifies the maximal gain of an adversary in inferring any (potentially random) function of a dataset from a release of the data. The inferential capability of the adversary is, in turn, quantified by a class of adversarial loss functions that we introduce as α-loss, α ∈ [1, ∞) ∪ {∞}. The choice of α determines the specific adversarial action and ranges from refining a belief (about any function of the data) for α = 1 to guessing the most likely value for α = ∞, while refining the α-th moment of the belief for α in between. Maximal α-leakage then quantifies the adversarial gain under α-loss over all possible functions of the data. In particular, for the extremal values α = 1 and α = ∞, maximal α-leakage simplifies to mutual information and maximal leakage, respectively. For α ∈ (1, ∞), this measure is shown to be the Arimoto channel capacity of order α. We show that maximal α-leakage satisfies data processing inequalities and a sub-additivity property, thereby allowing for a weak composition result. Building upon these properties, we use maximal α-leakage as the privacy measure and study the problem of data publishing with privacy guarantees, wherein the utility of the released data is ensured via a hard distortion constraint. Unlike average distortion, hard distortion provides a deterministic guarantee of fidelity. We show that under a hard distortion constraint, for α > 1 the optimal mechanism is independent of α, and therefore the resulting optimal tradeoff is the same for all values of α > 1. Finally, the tunability of maximal α-leakage as a privacy measure is also illustrated for binary data with average Hamming distortion as the utility measure.
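The limiting behavior described above can be checked numerically. The sketch below (an illustration, not from the paper) computes the Sibson mutual information of order α for a fixed input distribution over a discrete channel; for a fixed prior this is a lower bound on the channel capacity of order α, and it recovers Shannon mutual information as α → 1 and maximal leakage as α → ∞. The binary symmetric channel and uniform prior are assumed example values.

```python
import numpy as np

def sibson_mi(p_x, W, alpha):
    """Sibson mutual information of order alpha (in nats) for input
    prior p_x and channel matrix W (rows: inputs x, cols: outputs y)."""
    if np.isclose(alpha, 1.0):
        # the alpha -> 1 limit recovers Shannon mutual information
        p_y = p_x @ W
        joint = p_x[:, None] * W
        mask = joint > 0
        return float(np.sum(joint[mask] * np.log((W / p_y)[mask])))
    inner = np.sum(p_x[:, None] * W ** alpha, axis=0) ** (1.0 / alpha)
    return float(alpha / (alpha - 1.0) * np.log(inner.sum()))

def maximal_leakage(W):
    """Maximal leakage (in nats): log sum_y max_x W(y|x)."""
    return float(np.log(W.max(axis=0).sum()))

# example: binary symmetric channel with crossover 0.1, uniform prior
W = np.array([[0.9, 0.1], [0.1, 0.9]])
p = np.array([0.5, 0.5])

print(sibson_mi(p, W, 1.0))     # Shannon mutual information
print(sibson_mi(p, W, 1000.0))  # approaches maximal_leakage(W)
print(maximal_leakage(W))
```

As α grows, the order-α quantity interpolates monotonically between the two extremal leakage measures named in the abstract.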
A tunable measure for information leakage called maximal α-leakage is introduced. This measure quantifies the maximal gain of an adversary in refining a tilted version of its prior belief about any (potentially random) function of a dataset, conditioned on a disclosed dataset. The choice of α determines the specific adversarial action, ranging from refining a belief for α = 1 to guessing the most likely value for α = ∞; for these extremal values the measure simplifies to mutual information (MI) and maximal leakage (MaxL), respectively. For all other values of α, this measure is shown to be the Arimoto channel capacity of order α. Several properties of this measure are proven, including: (i) quasiconvexity in the mapping between the original and disclosed datasets; (ii) data processing inequalities; and (iii) a composition property.
Abstract-The problem of publishing privacy-guaranteed data for hypothesis testing is studied using maximal leakage (ML) as the privacy metric and the type-II error exponent as the utility metric. The optimal mechanism (random mapping) that maximizes utility for a bounded leakage guarantee is determined for the entire leakage range for binary datasets. For non-binary datasets, approximations in the high privacy and high utility regimes are developed. The results show that, for any desired leakage level, maximizing utility forces the ML privacy mechanism to reveal partial to complete knowledge about a subset of the source alphabet. The results developed on maximizing a convex function over a polytope may also be of independent interest.
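The two metrics in this abstract can be sketched for a concrete binary mechanism. The example below assumes a randomized-response mechanism with flip probability δ (an illustrative choice, not the paper's optimal mechanism) and computes its maximal leakage together with the Chernoff-Stein type-II error exponent, which equals the relative entropy between the two output distributions; the two source distributions are hypothetical values.

```python
import numpy as np

def max_leakage(W):
    """Maximal leakage (in nats): log sum_y max_x W(y|x)."""
    return float(np.log(W.max(axis=0).sum()))

def type2_exponent(p0, p1, W):
    """Chernoff-Stein type-II error exponent: D(p0 W || p1 W) in nats."""
    q0, q1 = p0 @ W, p1 @ W
    mask = q0 > 0
    return float(np.sum(q0[mask] * np.log(q0[mask] / q1[mask])))

# randomized response: flip each binary symbol with probability delta
delta = 0.2
W = np.array([[1 - delta, delta], [delta, 1 - delta]])

# two candidate source distributions the test must distinguish
p0, p1 = np.array([0.9, 0.1]), np.array([0.5, 0.5])

print(max_leakage(W))                      # leakage spent by the mechanism
print(type2_exponent(p0, p1, W))           # utility after randomization
print(type2_exponent(p0, p1, np.eye(2)))   # exponent with no privacy
```

Randomization strictly reduces the error exponent relative to releasing the data unperturbed, which is the trade-off the abstract's mechanism design optimizes.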
Abstract-Hypothesis testing is a statistical inference framework for determining the true distribution among a set of possible distributions for a given dataset. Privacy restrictions may require the curator of the data or the respondents themselves to share data with the test only after applying a randomizing privacy mechanism. This work considers mutual information (MI) as the privacy metric for measuring leakage. In addition, motivated by the Chernoff-Stein lemma, the relative entropy between pairs of distributions of the output (generated by the privacy mechanism) is chosen as the utility metric. For these metrics, the goal is to find the optimal privacy-utility trade-off (PUT) and the corresponding optimal privacy mechanism for both binary and m-ary hypothesis testing. Focusing on the high privacy regime, Euclidean information-theoretic approximations of the binary and m-ary PUT problems are developed. The solutions for the approximation problems clarify that an MI-based privacy metric preserves the privacy of the source symbols in inverse proportion to their likelihoods.
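The MI-based privacy-utility trade-off (PUT) described here can be illustrated with a small sweep. The sketch below (assumed example values, not the paper's optimal mechanism) measures leakage as the mutual information between the source and the mechanism output, and utility as the relative entropy between the two output distributions; increasing the randomization of a binary mechanism drives both toward zero.

```python
import numpy as np

def mutual_info(p_x, W):
    """I(X;Y) in nats: MI-based privacy leakage of mechanism W for prior p_x."""
    p_y = p_x @ W
    joint = p_x[:, None] * W
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log((joint / (p_x[:, None] * p_y))[mask])))

def kl(q0, q1):
    """Relative entropy D(q0 || q1) in nats (the utility metric)."""
    mask = q0 > 0
    return float(np.sum(q0[mask] * np.log(q0[mask] / q1[mask])))

# two hypothesized source distributions (illustrative values)
p0, p1 = np.array([0.9, 0.1]), np.array([0.5, 0.5])

# sweep the flip probability of a randomized-response mechanism:
# more randomization lowers the MI leakage but also the utility
for delta in (0.0, 0.1, 0.3, 0.5):
    W = np.array([[1 - delta, delta], [delta, 1 - delta]])
    print(f"delta={delta}: leakage={mutual_info(p0, W):.4f}, "
          f"utility={kl(p0 @ W, p1 @ W):.4f}")
```

At δ = 0.5 the output is independent of the source, so both leakage and utility vanish, the fully private end of the PUT curve.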