2021
DOI: 10.48550/arxiv.2109.11429
Preprint

Robin Hood and Matthew Effects -- Differential Privacy Has Disparate Impact on Synthetic Data

Abstract: Generative models trained using Differential Privacy (DP) are increasingly used to produce and share synthetic data in a privacy-friendly manner. In this paper, we set out to analyze the impact of DP on these models vis-à-vis underrepresented classes and subgroups of data. We do so from two angles: 1) the size of classes and subgroups in the synthetic data, and 2) classification accuracy on them. We also evaluate the effect of various levels of imbalance and privacy budgets. Our experiments, conducted using thr…

Cited by 4 publications (8 citation statements)
References 19 publications
“…First, we note that with increased privacy PATE-GAN, again, has much lower variability/spread in terms of size and a smaller drop in terms of recall. We also clearly observe the opposing size effects the two generative models exhibit, similarly to [14] - DP-WGAN makes the classes more uniform, i.e., large classes are reduced, and small classes are increased, while PATE-GAN further enforces the imbalance, large classes become even bigger.…”
Section: Mixed Class Results (mentioning)
confidence: 83%
“…We experiment with privacy budgets (ε) of 0.5, 5, 15, and infinity ("non-DP"). We measure the class distributions in the resulting synthetic datasets as well as class recall from classifiers (logistic regression similar to [14]) trained on the real/synthetic data and tested on put-aside test data. We also report RMSE for sizes and truncated 2 RMSE (TRMSE) for recall weighted by the real sizes in App.…”
Section: Evaluation Methodology (mentioning)
confidence: 99%
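
The size comparison described in the statement above could be sketched roughly as follows. This is an illustrative sketch only: the statement does not give the exact RMSE definition, so the unweighted RMSE over per-class proportions used here is an assumption, and the function names and toy labels are hypothetical.

```python
from collections import Counter
from math import sqrt


def class_proportions(labels, classes):
    """Return the share of each class in `labels`, in the order of `classes`."""
    counts = Counter(labels)
    total = len(labels)
    return [counts[c] / total for c in classes]


def size_rmse(real_labels, synthetic_labels):
    """RMSE between real and synthetic class proportions (assumed definition)."""
    # Use the union of classes so a class missing from one dataset counts as 0.
    classes = sorted(set(real_labels) | set(synthetic_labels))
    real = class_proportions(real_labels, classes)
    synth = class_proportions(synthetic_labels, classes)
    squared_errors = [(r - s) ** 2 for r, s in zip(real, synth)]
    return sqrt(sum(squared_errors) / len(classes))


# Toy labels invented for illustration; they are not data from either paper.
real = ["a"] * 90 + ["b"] * 10
synthetic = ["a"] * 70 + ["b"] * 30
print(size_rmse(real, synthetic))
```

In this toy case the synthetic data shrinks the majority class and enlarges the minority class, which is the direction of the "more uniform" effect the first statement attributes to DP-WGAN.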