Proceedings 2022 Network and Distributed System Security Symposium 2022
DOI: 10.14722/ndss.2022.24092

On Utility and Privacy in Synthetic Genomic Data

Abstract: The availability of genomic data is essential to progress in biomedical research, personalized medicine, etc. However, its extreme sensitivity makes it problematic, if not outright impossible, to publish or share it. As a result, several initiatives have been launched to experiment with synthetic genomic data, e.g., using generative models to learn the underlying distribution of the real data and generate artificial datasets that preserve its salient characteristics without exposing it. This paper provides the …

Cited by 6 publications (3 citation statements)
References 52 publications (75 reference statements)

“…A recent study by Oprisanu et al. [103] compared the above synthetic data methods for utility and privacy. They showed that recombination-based methods have high utility but low privacy, while RBMs offer a trade-off.…”
Section: Genomic Data Privacy and Security (mentioning)
confidence: 99%
“…Outliers are a particularly clear example of this trade-off because of their fundamental difficulty to be statistically captured based on their uniquely identifying features. If the utility is based on learning from outliers, then a useful and private SDG will be challenging, see (Oprisanu et al., 2022) for a demonstration in SDG of genomic data. Therefore, an ideal private synthetic dataset is created by solving the privacy-utility trade-off (see Figure 1) optimized to the needs of all the stakeholders.…”
Section: Challenges (mentioning)
confidence: 99%
“…However, it has become evident that even this limited amount of information can be exploited for privacy attacks, and few queries to genomic beacons can suffice to determine whether individuals (whose genome is known) are present in a study cohort [20-23]. Similarly, proposals for encryption and differential privacy approaches [24, 25] have often been countered by demonstrations of attacks [26-28], and even synthetic genetic data may not fully protect the study participants from privacy attacks [29] (refer to the study by Mittos et al [30] for a review of privacy-enhancing technologies). Thus, even a substantial reduction in information content can often not completely eliminate all privacy risks of genetic data [31].…”
Section: Introduction (mentioning)
confidence: 99%