Anonymisation of personal data has a long history stemming from the expansion of the types of data products routinely provided by National Statistical Institutes. Variants on anonymisation have received serious criticism reinforced by much-publicised apparent failures. We argue that both the operators of such schemes and their critics have become confused by being overly focused on the properties of the data themselves. We claim that whether data are anonymous (and therefore non-personal) cannot be determined by looking at the data alone: any anonymisation technique worthy of the name must take account of not only the data but also their environment. This paper proposes an alternative formulation called functional anonymisation that focuses on the relationship between the data and the environment within which the data exist (their data environment). We provide a formulation for describing the relationship between the data and their environment that links the legal notion of personal data with the statistical notion of disclosure control. Anonymisation, properly conceived and effectively conducted, can be a critical part of the toolkit of the privacy-respecting data controller and of the wider remit of providing accurate and usable data.
Health and medical data are increasingly being generated, collected, and stored in electronic form in healthcare facilities and administrative agencies. Such data hold a wealth of information vital to effective health policy development and evaluation, as well as to enhanced clinical care through evidence-based practice and safety and quality monitoring. These initiatives are aimed at improving individuals' health and well-being. Nevertheless, analyses of health data archives must be conducted in such a way that individuals' privacy is not compromised. One important aspect of protecting individuals' privacy is protecting the confidentiality of their data. This paper reviews a number of approaches to reducing disclosure risk when making data available for research and presents a taxonomy of those approaches. Some of these methods are widely used, whereas others are still in development. A range of methods is needed because data-use scenarios also vary, and it is important to be able to choose a method suited to each scenario. In practice, it is necessary to find a balance between allowing the use of health and medical data for research and protecting confidentiality. This balance is often presented as a trade-off between disclosure risk and data utility, because methods that reduce disclosure risk, in general, also reduce data utility.