Data mining provides the opportunity to extract useful information from large databases. Various techniques have been proposed in this context to extract this information as efficiently as possible. However, efficiency is not our only concern in this study: the security and privacy issues concerning the extracted knowledge must be seriously considered as well. With this in mind, we study the discovery of association rules in binary data sets and propose algorithms for selectively hiding sensitive association rules. Association rule hiding is a well-researched area in privacy preserving data mining, and many algorithms have been proposed to address it. The algorithms we introduce use a distortion-based technique to hide the sensitive rules. The hiding process may introduce side effects, either by generating rules that did not previously exist (ghost rules) or by eliminating existing non-sensitive rules (lost rules). The proposed algorithms use effective data structures to represent the association rules, and they rely heavily on weights to prioritize the selection of the transactions to be falsified (victim transactions). In this paper we show that our algorithms perform better than other similar algorithms in this field at avoiding the elimination of non-sensitive rules, without significantly increasing the processing time.
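The weighted selection of victim transactions can be illustrated with a minimal Python sketch. The abstract does not specify the weighting scheme or the data structures, so the following are assumptions: transactions are Python sets of items; a rule X → Y is hidden by deleting one item of Y from supporting transactions until its confidence drops below the threshold; and a transaction's weight is the number of non-sensitive rules whose items it fully contains, so that transactions whose falsification risks fewer lost rules are modified first. The names `hide_rule` and `protected_rules` are hypothetical.

```python
def support(db, itemset):
    """Count the transactions (sets of items) that contain every item in itemset."""
    return sum(1 for t in db if itemset <= t)


def confidence(db, lhs, rhs):
    """Confidence of the rule lhs -> rhs: support(lhs | rhs) / support(lhs)."""
    s_lhs = support(db, lhs)
    return support(db, lhs | rhs) / s_lhs if s_lhs else 0.0


def hide_rule(db, lhs, rhs, min_conf, protected_rules):
    """Distort db in place until conf(lhs -> rhs) < min_conf.

    Hypothetical weighting: a supporting transaction's weight is the number of
    protected (non-sensitive) rules whose items it fully contains; transactions
    with lower weight are falsified first to limit lost rules.
    """
    item = min(rhs)  # assumption: always delete the same consequent item

    def weight(t):
        return sum(1 for x, y in protected_rules if (x | y) <= t)

    # Victim candidates: transactions that support the whole sensitive rule.
    victims = sorted((t for t in db if (lhs | rhs) <= t), key=weight)
    for t in victims:
        if confidence(db, lhs, rhs) < min_conf:
            break
        t.discard(item)  # falsify the victim: this item goes from 1 to 0
    return db
```

For example, with the transactions {a, b, c}, {a, b}, {a, b, d} and {a, c}, hiding a → b at a confidence threshold of 0.5 while protecting c → a removes b from {a, b} and {a, b, d} but leaves {a, b, c} intact, so the confidence of a → b falls to 0.25 and c → a keeps a confidence of 1.0. Recomputing confidence after every modification keeps the sketch short; a practical implementation would update the counts incrementally.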
Data mining provides the opportunity to extract useful information from large databases. Various techniques have been proposed in this context to extract this information as efficiently as possible. However, efficiency is not our only concern in this study: the security and privacy issues concerning the extracted knowledge must be seriously considered as well. With this in mind, we study the procedure of hiding sensitive association rules in binary data sets by blocking some data values, and we present an algorithm for solving this problem. We also provide a fuzzification of the support and the confidence of an association rule in order to accommodate the existence of blocked (unknown) values. In addition, we quantitatively compare the proposed algorithm with previously published algorithms by running experiments on binary data sets, and we also qualitatively compare the efficiency of the proposed algorithm in hiding association rules. We use the notion of border rules, assigning a weight to each rule, and we use effective data structures to represent the rules, so as (a) to minimize the side effects created by the hiding process and (b) to speed up the selection of the victim transactions. Finally, we study the overall security of the modified database using the C4.5 decision tree algorithm of the WEKA data mining tool, and we discuss the advantages and limitations of blocking.
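The fuzzification of support and confidence can be illustrated with a minimal Python sketch. The abstract does not give the exact definitions, so this assumes the interval formulation common in the blocking literature: a blocked value "?" may be either 1 or 0, so the support of an itemset becomes a range from the transactions that certainly contain it to those that possibly contain it, and the confidence of X → Y ranges from minsup(X ∪ Y) / maxsup(X) to maxsup(X ∪ Y) / minsup(X). Encoding a blocked value as `None`, capping the upper confidence bound at 1, and the function names `support_interval` and `confidence_interval` are all assumptions made for illustration.

```python
from typing import Dict, Optional

# One transaction: item -> 1 (present), 0 (absent) or None (blocked, "?").
# Items missing from the dict are treated as 0.
Transaction = Dict[str, Optional[int]]


def support_interval(db, itemset):
    """Return (min, max) support counts of itemset under blocked values."""
    lo = hi = 0
    for t in db:
        values = [t.get(i, 0) for i in itemset]
        if all(v == 1 for v in values):
            # Certainly supports the itemset.
            lo += 1
            hi += 1
        elif all(v in (1, None) for v in values):
            # Supports it only if every blocked value is actually 1.
            hi += 1
    return lo, hi


def confidence_interval(db, lhs, rhs):
    """Return (min, max) confidence of lhs -> rhs from the support intervals."""
    lo_xy, hi_xy = support_interval(db, lhs | rhs)
    lo_x, hi_x = support_interval(db, lhs)
    min_conf = lo_xy / hi_x if hi_x else 0.0
    max_conf = hi_xy / lo_x if lo_x else 1.0
    return min_conf, min(max_conf, 1.0)  # assumption: cap the upper bound at 1
```

With the transactions {a: 1, b: 1}, {a: 1, b: ?}, {a: ?, b: 1} and {a: 1, b: 0}, the support of {a} ranges over 3 to 4 transactions, the support of {a, b} over 1 to 3, and the confidence of a → b over 0.25 to 1.0. How a hiding algorithm then compares these intervals with the support and confidence thresholds is a separate design decision that the sketch leaves open.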