In the era of data explosion, privacy preservation has become a necessary step in any data mining task. Data must therefore be transformed to ensure privacy. At the same time, the transformed data must retain enough quality to be used in the intended data mining task, i.e. the impact of the transformation on data quality with regard to that task must be minimized. However, the problem of transforming data to preserve privacy while minimizing this impact has been proven to be NP-hard. Moreover, in classification mining, each classification approach may use a different method to deliver knowledge, so a data quality metric for classification should be tailored to the specific type of classification. In this paper, we focus on maintaining data quality in scenarios in which the transformed data will be used to build associative classification models. We propose a data quality metric for such associative classification, together with a heuristic approach that preserves privacy while maintaining data quality. We then validate the proposed approaches experimentally.
Collaboration between business partners has become crucial. An important issue to be addressed is data privacy. In this paper, we address the problem of data privacy under a prominent privacy model, (k, e)-Anonymous, when a new dataset is to be released while other datasets may already have been released elsewhere. An attacker might obtain multiple versions of the datasets and compare them with the newly released dataset. Even though the privacy of each dataset has been well preserved individually, such a comparison can lead to a privacy breach. We study the effects of releasing multiple datasets theoretically. We find that a privacy breach caused by the incremental release occurs when any partition of the new dataset overlaps with any partition of any existing dataset. Based on this study, we propose a polynomial-time algorithm. It needs to consider only one previous version of the dataset, and it can also skip computing the overlapping partitions. Thus, the computational complexity of the proposed algorithm is only O(pn^3), where p is the number of partitions and n is the number of tuples, while the privacy of all released datasets and the optimality of the solution are always guaranteed. In addition, we present experimental results on a real-world dataset that illustrate the efficiency of our algorithm.
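The overlap condition described above can be illustrated with a small Python sketch. It is not the algorithm from the paper. It assumes the commonly cited reading of (k, e)-anonymity, in which each partition holds at least k distinct sensitive values whose range is at least e, and it takes the publisher's view, where each partition is a mapping from tuple id to a numeric sensitive value. The function names and the toy data are hypothetical.

```python
def satisfies_ke(values, k, e):
    """Assumed (k, e) test: at least k distinct values spanning a range of at least e."""
    distinct = set(values)
    if not distinct:
        return False
    return len(distinct) >= k and max(distinct) - min(distinct) >= e


def risky_overlaps(old_parts, new_parts, k, e):
    """List (old index, new index, shared ids) where the shared tuples alone fail (k, e)."""
    risky = []
    for i, old in enumerate(old_parts):
        for j, new in enumerate(new_parts):
            shared = old.keys() & new.keys()  # tuple ids present in both partitions
            if shared and not satisfies_ke([new[t] for t in shared], k, e):
                risky.append((i, j, sorted(shared)))
    return risky


# Hypothetical releases: tuple id -> sensitive value (e.g. salary in thousands)
old_release = [{1: 30, 2: 45, 3: 60}, {4: 35, 5: 50, 6: 70}]
new_release = [{1: 30, 2: 45, 7: 80}, {3: 60, 4: 35, 5: 50, 6: 70}]
k, e = 3, 20

# Every partition passes the test when each release is checked on its own
each_ok = all(satisfies_ke(p.values(), k, e) for p in old_release + new_release)

# Comparing the two releases exposes overlaps that are too small or too narrow
risky = risky_overlaps(old_release, new_release, k, e)
```

In this toy data every partition passes on its own, yet two overlaps between the releases fall below the thresholds, which is the kind of situation the abstract describes as a breach caused by the incremental release.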
Privacy preservation has become an essential process for any data mining task, so data must be transformed to ensure privacy. In this paper, we address the problem of privacy preservation in an incremental-data scenario, in which the data to be transformed are not static but are appended continually. Our work is based on a well-known data privacy model, k-Anonymity, and the data mining task to be applied to the given dataset is associative classification. As the problem of privacy preservation for data mining has been proven to be NP-hard, we study theoretically the characteristics of a proven heuristic algorithm in incremental scenarios. We then present several observations that lead to techniques for reducing the computational complexity in this problem setting while the outputs remain the same. In addition, we propose a simple algorithm for the problem which, in the worst case, is at most as efficient as the polynomial-time heuristic algorithm.
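As a minimal sketch of the incremental setting, the Python snippet below checks k-Anonymity in its standard form, where every combination of quasi-identifier values must occur in at least k records, and then rechecks after new records are appended. It only tests the property and is not the heuristic transformation algorithm discussed in the abstract. The attribute names, the helper functions, and the records are hypothetical.

```python
from collections import Counter


def group_sizes(rows, quasi_identifiers):
    """Count the records that share each combination of quasi-identifier values."""
    return Counter(tuple(row[a] for a in quasi_identifiers) for row in rows)


def is_k_anonymous(sizes, k):
    """True when every quasi-identifier group holds at least k records."""
    return all(n >= k for n in sizes.values())


# Hypothetical, already generalized records with a class label
released = [
    {"age": "20-29", "zip": "100**", "class": "yes"},
    {"age": "20-29", "zip": "100**", "class": "no"},
    {"age": "30-39", "zip": "101**", "class": "yes"},
    {"age": "30-39", "zip": "101**", "class": "yes"},
]
appended = [{"age": "40-49", "zip": "102**", "class": "no"}]

qi = ["age", "zip"]
sizes = group_sizes(released, qi)
before = is_k_anonymous(sizes, 2)

# Incremental step: add the counts of the appended records only,
# rather than recounting the whole dataset
sizes.update(group_sizes(appended, qi))
after = is_k_anonymous(sizes, 2)
```

Here the appended record forms a group of size one, so the dataset that satisfied 2-Anonymity before the append no longer does and would need further transformation.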