BackgroundPrivacy legislation in most jurisdictions allows the disclosure of health data for secondary purposes without patient consent if it is de-identified. Some recent articles in the medical, legal, and computer science literature have argued that de-identification methods do not provide sufficient protection because they are easy to reverse. Should this be the case, it would have significant and important implications on how health information is disclosed, including: (a) potentially limiting its availability for secondary purposes such as research, and (b) resulting in more identifiable health information being disclosed. Our objectives in this systematic review were to: (a) characterize known re-identification attacks on health data and contrast that to re-identification attacks on other kinds of data, (b) compute the overall proportion of records that have been correctly re-identified in these attacks, and (c) assess whether these demonstrate weaknesses in current de-identification methods.Methods and FindingsSearches were conducted in IEEE Xplore, ACM Digital Library, and PubMed. After screening, fourteen eligible articles representing distinct attacks were identified. On average, approximately a quarter of the records were re-identified across all studies (0.26 with 95% CI 0.046–0.478) and 0.34 for attacks on health data (95% CI 0–0.744). There was considerable uncertainty around the proportions as evidenced by the wide confidence intervals, and the mean proportion of records re-identified was sensitive to unpublished studies. Two of fourteen attacks were performed with data that was de-identified using existing standards. Only one of these attacks was on health data, which resulted in a success rate of 0.00013.ConclusionsThe current evidence shows a high re-identification rate but is dominated by small-scale studies on data that was not de-identified according to existing standards. This evidence is insufficient to draw conclusions about the efficacy of de-identification methods.
For the de-identification of health datasets, OLA is an improvement on existing k-anonymity algorithms in terms of information loss and performance.
Background Electronic data capture (EDC) tools provide automated support for data collection, reporting, query resolution, randomization, and validation, among other features, for clinical trials. There is a trend toward greater adoption of EDC tools in clinical trials, but there is also uncertainty about how many trials are actually using this technology in practice. A systematic review of EDC adoption surveys conducted up to 2007 concluded that only 20% of trials are using EDC systems, but previous surveys had weaknesses.ObjectivesOur primary objective was to estimate the proportion of phase II/III/IV Canadian clinical trials that used an EDC system in 2006 and 2007. The secondary objectives were to investigate the factors that can have an impact on adoption and to develop a scale to assess the extent of sophistication of EDC systems.MethodsWe conducted a Web survey to estimate the proportion of trials that were using an EDC system. The survey was sent to the Canadian site coordinators for 331 trials. We also developed and validated a scale using Guttman scaling to assess the extent of sophistication of EDC systems. Trials using EDC were compared by the level of sophistication of their systems.ResultsWe had a 78.2% response rate (259/331) for the survey. It is estimated that 41% (95% CI 37.5%-44%) of clinical trials were using an EDC system. Trials funded by academic institutions, government, and foundations were less likely to use an EDC system compared to those sponsored by industry. Also, larger trials tended to be more likely to adopt EDC. The EDC sophistication scale had six levels and a coefficient of reproducibility of 0.901 (P< .001) and a coefficient of scalability of 0.79. There was no difference in sophistication based on the funding source, but pediatric trials were likely to use a more sophisticated EDC system.Conclusion The adoption of EDC systems in clinical trials in Canada is higher than the literature indicated: a large proportion of clinical trials in Canada use some form of automated data capture system. To inform future adoption, research should gather stronger evidence on the costs and benefits of using different EDC systems.
BackgroundPrivacy concerns by providers have been a barrier to disclosing patient information for public health purposes. This is the case even for mandated notifiable disease reporting. In the context of a pandemic it has been argued that the public good should supersede an individual's right to privacy. The precise nature of these provider privacy concerns, and whether they are diluted in the context of a pandemic are not known. Our objective was to understand the privacy barriers which could potentially influence family physicians' reporting of patient-level surveillance data to public health agencies during the Fall 2009 pandemic H1N1 influenza outbreak.MethodsThirty seven family doctors participated in a series of five focus groups between October 29-31 2009. They also completed a survey about the data they were willing to disclose to public health units. Descriptive statistics were used to summarize the amount of patient detail the participants were willing to disclose, factors that would facilitate data disclosure, and the consensus on those factors. The analysis of the qualitative data was based on grounded theory.ResultsThe family doctors were reluctant to disclose patient data to public health units. This was due to concerns about the extent to which public health agencies are dependable to protect health information (trusting beliefs), and the possibility of loss due to disclosing health information (risk beliefs). We identified six specific actions that public health units can take which would affect these beliefs, and potentially increase the willingness to disclose patient information for public health purposes.ConclusionsThe uncertainty surrounding a pandemic of a new strain of influenza has not changed the privacy concerns of physicians about disclosing patient data. It is important to address these concerns to ensure reliable reporting during future outbreaks.
BackgroundThe public is less willing to allow their personal health information to be disclosed for research purposes if they do not trust researchers and how researchers manage their data. However, the public is more comfortable with their data being used for research if the risk of re-identification is low. There are few studies on the risk of re-identification of Canadians from their basic demographics, and no studies on their risk from their longitudinal data. Our objective was to estimate the risk of re-identification from the basic cross-sectional and longitudinal demographics of Canadians.MethodsUniqueness is a common measure of re-identification risk. Demographic data on a 25% random sample of the population of Montreal were analyzed to estimate population uniqueness on postal code, date of birth, and gender as well as their generalizations, for periods ranging from 1 year to 11 years.ResultsAlmost 98% of the population was unique on full postal code, date of birth and gender: these three variables are effectively a unique identifier for Montrealers. Uniqueness increased for longitudinal data. Considerable generalization was required to reach acceptably low uniqueness levels, especially for longitudinal data. Detailed guidelines and disclosure policies on how to ensure that the re-identification risk is low are provided.ConclusionsA large percentage of Montreal residents are unique on basic demographics. For non-longitudinal data sets, the three character postal code, gender, and month/year of birth represent sufficiently low re-identification risk. Data custodians need to generalize their demographic information further for longitudinal data sets.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.