Estimating the success of re-identifications in incomplete datasets using generative models

Rocher, Luc; Hendrickx, Julien M.; Montjoye, Yves-Alexandre de

doi:10.1038/s41467-019-10933-3

Cited by 544 publications

(407 citation statements)

References 40 publications

Supporting

Mentioning

337

Contrasting

Unclassified

“…Some data scientists have demonstrated that re-identification is highly probable in large datasets and suggest further technical solutions. 46 Rather than relying too heavily on de-identification, data protection must rely on a balance of information security and IG safeguards.…”

Section: Discussion (mentioning)

confidence: 99%

Five models for child and adolescent data linkage in the UK: a review of existing and proposed methods

Mansfield

Gallacher

Mourby

et al. 2020

Evid Based Mental Health

Over the last decade dramatic advances have been made in both the technology and data available to better understand the multifactorial influences on child and adolescent health and development. This paper seeks to clarify methods that can be used to link information from health, education, social care and research datasets. Linking these different types of data can facilitate epidemiological research that investigates mental health from the population to the patient; enabling advanced analytics to better identify, conceptualise and address child and adolescent needs. The majority of adolescent mental health research is not able to maximise the full potential of data linkage, primarily due to four key challenges: confidentiality, sampling, matching and scalability. By presenting five existing and proposed models for linking adolescent data in relation to these challenges, this paper aims to facilitate the clinical benefits that will be derived from effective integration of available data in understanding, preventing and treating mental disorders.

“…More recently, she demonstrated the ability to correctly identify 25% of research participants by name and 28% by address from data redacted beyond the HIPAA Safe Harbor standard [99]. Other authors have demonstrated the ability to re-identify at least 90% of Americans utilizing credit card metadata or via statistical models [96,100,101]. Given this emerging area of research, the need to systemically identify all stakeholders and potential data "owners" becomes increasingly essential in the identification of potential downstream security risks to users.…”

Section: Themes Data Transmission and Storage (mentioning)

confidence: 99%

Developments in Privacy and Data Ownership in Mobile Health Technologies, 2016-2019

Galvin

DeMuro

2020

Yearb Med Inform

Objectives: To survey international regulatory frameworks that serve to protect privacy of personal data as a human right as well as to review the literature regarding privacy protections and data ownership in mobile health (mHealth) technologies between January 1, 2016 and June 1, 2019 in order to identify common themes. Methods: We performed a review of relevant literature available in English published between January 1, 2016 and June 1, 2019 from databases including PubMed, Google Scholar, and Web of Science, as well as relevant legislative background material. Articles out of scope (as detailed below) were eliminated. We categorized the remaining pool of articles and discrete themes were identified, specifically: concerns around data transmission and storage, including data ownership and the ability to re-identify previously de-identified data; issues with user consent (including the availability of appropriate privacy policies) and access control; and the changing culture and variable global attitudes toward privacy of health data. Results: Recent literature demonstrates that the security of mHealth data storage and transmission remains of wide concern, and aggregated data that were previously considered “de-identified” have now been demonstrated to be re-identifiable. Consumer-informed consent may be lacking with regard to mHealth applications due to the absence of a privacy policy and/or to text that is too complex and lengthy for most users to comprehend. The literature surveyed emphasizes improved access control strategies. This survey also illustrates a wide variety of global user perceptions regarding health data privacy. Conclusion: The international regulatory framework that serves to protect privacy of personal data as a human right is diverse. Given the challenges legislators face to keep up with rapidly advancing technology, we introduce the concept of a “healthcare fiduciary” to serve the best interest of data subjects in the current environment.

“…Pseudonymization has its limitations (Gymrek et al, ; cf. Glossary), and developments in machine learning and artificial intelligence already allow re‐identification of even small samples from anonymized data sets (Rocher et al, ). The likelihood of individual re‐identification from genomic data, whether coded or anonymized, is higher when such data have been linked with familial, sociodemographic, or audio‐visual information, as is often the case in rare diseases research (Thu Nguyen et al, ).…”

Section: Uncertainty Around Data Transfers Within the EU (mentioning)

confidence: 99%

Genomic data sharing in Europe is stumbling—Could a code of conduct prevent its fall?

Molnár-Gábor

Korbel

2020

EMBO Mol Med

Genomic data sharing is becoming more important as scientists join forces across borders in biomedical research for the benefit of patients and society. The EU's General Data Protection Regulation (GDPR) helps simplify sharing of such data at the European and international level. However, initial optimism has dried up as EU member states go their own ways in implementing the GDPR into national laws, and as legal cases challenging data sharing reach courts. Codes of conduct could facilitate data sharing in Europe and better connect it to global health research. This commentary explains the potential of codes of conduct for addressees and drafters. Codes are no panacea though; other measures may be necessary to ensure that Europe remains collaborative and competitive in biomedical research. Nevertheless, codes of conduct would bring immediate benefits and, in the long term, could foster a true European ecosystem for joint biomedical research and easier international data sharing.
