The exponential growth of collected, processed, and shared microdata has given rise to concerns about individuals' privacy. As a result, laws and regulations have emerged to control what organisations do with microdata and how they protect it. Statistical Disclosure Control seeks to reduce the risk of confidential information disclosure by de-identifying them. Such de-identification is guaranteed through privacy-preserving techniques. However, de-identified data usually results in loss of information, with a possible impact on data analysis precision and model predictive performance. The main goal is to protect the individuals' privacy while maintaining the interpretability of the data, i.e. its usefulness. Statistical Disclosure Control is an area that is expanding and needs to be explored since there is still no solution that guarantees optimal privacy and utility. This survey focuses on all steps of the de-identification process. We present existing privacy-preserving techniques used in microdata de-identification, privacy measures suitable for several disclosure types and, information loss and predictive performance measures. In this survey, we discuss the main challenges raised by privacy constraints, describe the main approaches to handle these obstacles, review taxonomies of privacy-preserving techniques, provide a theoretical analysis of existing comparative studies, and raise multiple open issues.
scite is a Brooklyn-based startup that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
334 Leonard St
Brooklyn, NY 11211
Copyright © 2023 scite Inc. All rights reserved.
Made with 💙 for researchers