Background
With the enormous need for a federated ecosystem for holding global genomic and clinical data, the Global Alliance for Genomics and Health (GA4GH) has created an international web service called the Beacon service, which allows a researcher to find out beforehand whether a specific dataset can be used for his or her research. This simple web service is quite useful, as it supports queries such as whether a certain position of a target chromosome has a specific nucleotide. However, the increasing integration of individuals' genomic data into clinical practice and research raises serious privacy concerns. Although the answers to such queries in the Beacon network are only yes or no, they have serious privacy implications, as demonstrated in a recent work by Shringarpure and Bustamante. In their attack model, the authors showed that with a limited number of queries, the presence of an individual in a dataset can be determined.

Methods
We propose two lightweight algorithms (based on randomized response) that retain the utility of a genomic beacon service while preserving the privacy of its participants. We also elaborate on the strengths and weaknesses of the attack by explaining some of its statistical and mathematical models using a real-world genomic database. We extend the original experimental simulations to different adversarial assumptions and parameters.

Results
We experimentally evaluated the solutions against the original attack model with different parameters to better understand the privacy and utility trade-offs provided by these two methods.
Also, the statistical analysis further elaborates on different aspects of the prior attack, which leads to better risk management for the participants in a beacon service.

Conclusions
The differentially private and lightweight solutions discussed here make the attack much harder to carry out successfully while preserving the fundamental purpose of the beacon network.

Electronic supplementary material
The online version of this article (doi:10.1186/s12920-017-0278-x) contains supplementary material, which is available to authorized users.
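To make the randomized-response idea concrete, here is a minimal sketch, assuming a toy beacon that stores (chromosome, position, allele) triples. It uses the textbook binary randomized response mechanism, which reports the true answer with probability e^ε / (1 + e^ε) and satisfies ε-local differential privacy; it is not a reproduction of the paper's two algorithms, and the names `true_beacon_answer`, `private_beacon_query`, and `debias_yes_rate` are hypothetical.

```python
import math
import random

# Hypothetical toy beacon key: (chromosome, position, allele). Not the GA4GH Beacon API.
BeaconKey = tuple[str, int, str]


def true_beacon_answer(dataset: set[BeaconKey], chrom: str, pos: int, allele: str) -> bool:
    """Exact beacon answer: does any individual in the dataset carry this allele here?"""
    return (chrom, pos, allele) in dataset


def truth_probability(epsilon: float) -> float:
    """Probability of reporting the true answer under binary randomized response."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def private_beacon_query(dataset: set[BeaconKey], chrom: str, pos: int, allele: str,
                         epsilon: float, rng: random.Random) -> bool:
    """Report the true yes/no answer with probability e^eps/(1+e^eps), else flip it."""
    answer = true_beacon_answer(dataset, chrom, pos, allele)
    return answer if rng.random() < truth_probability(epsilon) else not answer


def debias_yes_rate(observed_yes_rate: float, epsilon: float) -> float:
    """Unbiased estimate of the true 'yes' rate over many noisy answers (requires epsilon > 0).

    observed = (1 - p) + true_rate * (2p - 1), solved for true_rate.
    """
    p = truth_probability(epsilon)
    return (observed_yes_rate - (1.0 - p)) / (2.0 * p - 1.0)
```

Individual answers become deniable, yet a researcher aggregating many queries can still recover population-level rates with `debias_yes_rate`, which is the kind of privacy/utility trade-off the abstract describes.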
Summary
Every day, the amount of data generated from different sources increases significantly, posing serious new challenges for data storage and maintenance. As a solution, outsourcing data to a public cloud server with high storage and processing capacity is a reasonable alternative to storing it locally. Consequently, the database-as-a-service (DaaS) paradigm has gained much popularity over the last decade, since the introduction of cloud services. Storing confidential data with a third party, however, raises concerns about the security and privacy of that data. One popular approach to these problems in data outsourcing is to use secret sharing and distribute sensitive data among several databases. In this paper, we identify a flaw in current schemes. This flaw reveals an intrinsic vulnerability in such methods and necessitates a reconsideration of these schemes. Moreover, we develop a new approach that resolves this problem and can be employed in all current schemes to mitigate the concern. Finally, a comprehensive implementation demonstrates that the proposed method scales to meet increasing demands in large databases.
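As background for the secret-sharing approach mentioned above, here is a minimal sketch of Shamir's (k, n) threshold scheme, assuming each share would be stored on a different database server. It illustrates only the generic baseline technique; it does not reproduce the flaw or the fix proposed in the paper. The field prime and the function names `split_secret` and `reconstruct_secret` are illustrative choices.

```python
import random

# Mersenne prime 2^127 - 1; field size chosen for illustration only.
PRIME = 2**127 - 1


def split_secret(secret: int, n: int, k: int, rng: random.Random) -> list[tuple[int, int]]:
    """Split `secret` into n shares so that any k of them reconstruct it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must lie in [0, PRIME)")
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    # Random polynomial of degree k-1 whose constant term is the secret.
    coeffs = [secret] + [rng.randrange(PRIME) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule, evaluated mod PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))  # share (x, y) would go to server number x
    return shares


def reconstruct_secret(shares: list[tuple[int, int]]) -> int:
    """Recover the constant term by Lagrange interpolation at x = 0 over GF(PRIME)."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

With k = 3 and n = 5, any three of the five servers can jointly recover a value, while fewer than three shares reveal nothing about it in the information-theoretic sense.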
Applications of genomic studies are spreading rapidly in many domains of science and technology, such as healthcare, biomedical research, direct-to-consumer services, and legal and forensic applications. However, a number of obstacles make it hard to access and process a big genomic database for these applications. First, genome sequencing is a time-consuming and expensive process. Second, processing genomic sequences requires large-scale computation and storage systems. Third, genomic databases are often owned by different organizations and are thus not available for public use. The cloud computing paradigm can be leveraged to facilitate the creation and sharing of big genomic databases for these applications. Genomic data owners can outsource their databases to a centralized cloud server to ease access to them. However, data owners are reluctant to adopt this model, as it requires outsourcing the data to an untrusted cloud service provider, which may lead to data breaches. In this paper, we propose a privacy-preserving model for outsourcing genomic data to a cloud. The proposed model enables query processing while protecting the privacy of genomic databases. The privacy of individuals is guaranteed by permuting the records and adding fake genomic records to the database. These techniques allow the cloud to evaluate count and top-k queries securely and efficiently. Experimental results demonstrate that a count query and a top-k query over 40 SNPs in a database of 20,000 records take around 100 and 150 seconds, respectively.
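The permutation-plus-fake-records idea can be sketched as follows, assuming the data owner keeps its fake records privately and subtracts their contribution from the count the cloud returns. This correction step is an assumption made for illustration, not necessarily the paper's mechanism; records are also kept in plaintext here for clarity, whereas a real deployment would additionally need the stored records and queries protected (for example, encrypted). All names (`outsource`, `cloud_count`, `owner_corrected_count`) are hypothetical.

```python
import random
from dataclasses import dataclass

# A record is a tuple of SNP values, e.g. minor-allele counts 0/1/2.
Record = tuple[int, ...]
# A count query maps SNP index -> required value.
Query = dict[int, int]


@dataclass
class OutsourcedDB:
    """What the untrusted cloud stores: real and fake records, shuffled together."""
    records: list[Record]


def outsource(real: list[Record], fake: list[Record], rng: random.Random) -> OutsourcedDB:
    mixed = real + fake
    rng.shuffle(mixed)  # random permutation hides which rows are real
    return OutsourcedDB(mixed)


def matches(record: Record, query: Query) -> bool:
    return all(record[i] == v for i, v in query.items())


def cloud_count(db: OutsourcedDB, query: Query) -> int:
    """Cloud-side count over every stored row, real and fake alike."""
    return sum(1 for r in db.records if matches(r, query))


def owner_corrected_count(noisy_count: int, fake: list[Record], query: Query) -> int:
    """Owner removes the fake rows' contribution, which only it can compute."""
    return noisy_count - sum(1 for r in fake if matches(r, query))
```

Because the cloud cannot tell real rows from fake ones, its raw count is perturbed, yet the owner can still obtain the exact count for the real records.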
Federated Learning (FL) is a method for training machine learning algorithms on decentralized data when sharing the raw data is not feasible due to privacy regulations. Electronic Health Records (EHRs), which contain confidential patient information, are one example of such data. In FL, the sensitive data are not shared; instead, local models are trained and their parameters are then aggregated on a central server. However, this method still presents privacy challenges, necessitating privacy protection strategies, such as data anonymization, before the model parameters are shared. Balancing the trade-off between privacy and utility is a crucial aspect of FL research, as integrating privacy algorithms can reduce utility. The objective of this thesis is to improve the performance of FL while maintaining privacy, through techniques such as data generalization, feature selection for dimensionality reduction, and minimizing noise in the anonymization process. This research also investigates partitioning data by features instead of by records, and evaluates the performance of the proposed model on real healthcare data, with the aim of developing a predictive model for healthcare applications.
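The aggregation step described above can be illustrated with a minimal sketch of FedAvg-style weighted averaging (McMahan et al.), where each client's parameters are weighted by its number of local samples. The optional Gaussian perturbation before sharing is a generic stand-in for a privacy step, assumed here for illustration; it is not the anonymization method of the thesis, and the feature-partitioned setting is not covered. The names `add_gaussian_noise` and `federated_average` are hypothetical.

```python
import random


def add_gaussian_noise(params: list[float], sigma: float, rng: random.Random) -> list[float]:
    """Client-side perturbation before parameters leave the client (sigma = 0 disables it)."""
    return [p + rng.gauss(0.0, sigma) for p in params]


def federated_average(client_params: list[list[float]], client_sizes: list[int]) -> list[float]:
    """Server-side FedAvg: average client parameters weighted by local sample counts."""
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need at least one client and one sample count per client")
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(n * params[d] for params, n in zip(client_params, client_sizes)) / total
        for d in range(dim)
    ]
```

Raising `sigma` strengthens the perturbation but pulls the averaged model further from the noise-free one, a small-scale view of the privacy/utility trade-off the thesis targets.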