When sensitive datasets are published, the privacy of the individuals they contain must be preserved. Anonymization algorithms such as k-anonymization have been proposed to reduce the risk that individuals in a dataset can be identified. k-anonymization, the most common such technique, modifies attribute values until every record is indistinguishable from at least k − 1 other records. Many algorithms can achieve k-anonymity; however, existing algorithms suffer from information loss because of a tradeoff between data quality and anonymity. In this paper, we propose a novel method of constructing a generalization hierarchy for k-anonymization algorithms. Our method analyses the correlation between attributes and generates an optimal hierarchy according to that correlation. We verified the effect of the proposed scheme on real data: the average k of the datasets is 83.14, around one third of the value obtained with conventional methods.
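To make the role of a generalization hierarchy concrete, here is a minimal sketch of full-domain generalization over two quasi-identifiers. It assumes hand-written, hypothetical hierarchies (`generalize_age`, `generalize_zip`) and simply picks the least total generalization that yields k-anonymity; it does not implement the correlation-based hierarchy construction proposed above.

```python
from collections import Counter

# Hypothetical hierarchy for age: level 0 = exact, 1 = decade range, 2 = suppressed.
def generalize_age(age: int, level: int) -> str:
    if level == 0:
        return str(age)
    if level == 1:
        lo = age // 10 * 10
        return f"{lo}-{lo + 9}"
    return "*"

# Hypothetical hierarchy for ZIP codes: level n masks the last n digits.
def generalize_zip(zipcode: str, level: int) -> str:
    keep = max(len(zipcode) - level, 0)
    return zipcode[:keep] + "*" * (len(zipcode) - keep)

def is_k_anonymous(rows, k):
    # Every combination of quasi-identifier values must occur at least k times.
    counts = Counter(rows)
    return all(c >= k for c in counts.values())

def anonymize(records, k, max_age_level=2, max_zip_level=5):
    # Enumerate generalization levels, least total generalization first.
    candidates = sorted(
        ((a, z) for a in range(max_age_level + 1) for z in range(max_zip_level + 1)),
        key=lambda lv: (sum(lv), lv),
    )
    for age_lv, zip_lv in candidates:
        rows = [(generalize_age(age, age_lv), generalize_zip(z, zip_lv))
                for age, z in records]
        if is_k_anonymous(rows, k):
            return (age_lv, zip_lv), rows
    return None  # no combination of levels reaches k-anonymity

records = [(23, "13053"), (27, "13068"), (25, "13055"),
           (34, "14853"), (36, "14850"), (38, "14852")]
levels, rows = anonymize(records, k=3)
print(levels)  # (1, 2): decade ranges and 3-digit ZIP prefixes
```

The choice of hierarchy determines which levels are available and therefore how much information is lost; a hierarchy better matched to the data (for example, one derived from attribute correlations, as the paper proposes) can reach the same k with less distortion.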
Large numbers of documents, such as news articles, public reports, and personal essays, are released on websites and social media. Once documents containing privacy-sensitive information are published, the risk of privacy breaches increases, so documents must be reviewed very carefully before publication. In many cases, human experts redact or sanitize documents before publishing them; however, this approach can be costly and error-prone. Furthermore, such measures do not guarantee that critical privacy risks are eliminated from the documents. In this paper, we present a generalized adversary model and apply it to document data. We devise an attack algorithm against documents that uses a web search engine, and then propose a privacy-preserving framework against such attacks. We evaluate the privacy risks of real school accident reports and court documents. In experiments on these documents, we show that human-sanitized documents still contain privacy risks and that our proposed approach can help reduce them.