The sharing of genomic data holds great promise for advancing precision medicine, providing personalized treatments and other types of interventions. However, there are privacy concerns, as data misuse may lead to infringement of privacy for individuals and their blood relatives. As genomic data are rapidly growing and some of these data are being made available to researchers, it is imperative to understand the current genome privacy landscape and to identify the challenges in developing effective privacy-protecting solutions. In this work, we provide an overview of major privacy threats identified by the research community and examine the privacy challenges in the context of emerging direct-to-consumer applications. We present general privacy protection techniques for genomic data sharing and their potential applications in direct-to-consumer genomic testing and forensic analyses. We discuss limitations in current privacy protection methods, highlight possible mitigation strategies, and suggest future research opportunities for advancing genomic data sharing.
Frequent sequential pattern mining is a central task in many fields such as biology and finance. However, release of these patterns is raising increasing concerns on individual privacy. In this paper, we study the sequential pattern mining problem under the differential privacy framework which provides formal and provable guarantees of privacy. Due to the nature of the differential privacy mechanism which perturbs the frequency results with noise, and the high dimensionality of the pattern space, this mining problem is particularly challenging. In this work, we propose a novel two-phase algorithm for mining both prefixes and substring patterns. In the first phase, our approach takes advantage of the statistical properties of the data to construct a model-based prefix tree which is used to mine prefixes and a candidate set of substring patterns. The frequency of the substring patterns is further refined in the successive phase where we employ a novel transformation of the original data to reduce the perturbation noise. Extensive experiment results using real datasets showed that our approach is effective for mining both substring and prefix patterns in comparison to the state-of-theart solutions.
Accessing and integrating human genomic data with phenotypes is important for biomedical research. Making genomic data accessible for research purposes, however, must be handled carefully to avoid leakage of sensitive individual information to unauthorized parties and improper use of data. In this article, we focus on data sharing within the scope of data accessibility for research. Current common practices to gain biomedical data access are strictly rule based, without a clear and quantitative measurement of the risk of privacy breaches. In addition, several types of studies require privacy-preserving linkage of genotype and phenotype information across different locations (e.g., genotypes stored in a sequencing facility and phenotypes stored in an electronic health record) to accelerate discoveries. The computer science community has developed a spectrum of techniques for data privacy and confidentiality protection, many of which have yet to be tested on real-world problems. In this article, we discuss clinical, technical, and ethical aspects of genome data privacy and confidentiality in the United States, as well as potential solutions for privacy-preserving genotype–phenotype linkage in biomedical research.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.