An Attack on the Privacy of Sanitized Data that Fuses the Outputs of Multiple Data Miners

Sramka, Michal; Denzinger, Jörg

doi:10.1109/icdmw.2009.28

Cited by 8 publications

(14 citation statements)

References 30 publications


“…We implemented all data reconstruction methods on the 4500 datasets mentioned above. In each case, we obtained an estimation of the original data set which was then compared with the original data set in terms of the success measures described in the papers [8], [9] and [10]. In each case, although we tested (the same) 4500 matrices in each attack, we present the details of only one of our matrices, one which has 6400 entries.…”

Section: Description and Results Of The Reconstruction Attacks (mentioning)

confidence: 99%

“…We use the same assumptions on data mentioned in [8] (SPF), [9] (BE-DR) and [10] (MDMF) as appropriate, and use the notation of Table 1. We generate 4500 matrices using the algorithm described in Section 2; 1500 of these were of size 400, 1500 of size 1600 and 1500 of size 6400.…”

Section: Description and Results Of The Reconstruction Attacks (mentioning)

confidence: 99%

“…We choose: Spectral Filtering (SPF) [8], Bayes-Estimated Data Reconstruction (BE-DR) [9] and Multiple Miner attack with Fusion (MDMF) [10]. We use the SPF method because it has a good track record in reconstructing original data based on additive perturbation; it is based on eigenvalues of a covariance matrix and the theory of random matrices [8].…”

Section: The Research Literature (mentioning)

confidence: 99%
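
The spectral filtering idea mentioned in the statement above (separating signal from additive noise using the eigenvalues of a covariance matrix) can be sketched roughly as follows. This is a minimal illustration, not the method of [8] as published: the function name `spectral_filter`, the use of the Marchenko-Pastur upper bound as the noise cutoff, and the assumption that the noise variance is known to the attacker are all assumptions made here.

```python
import numpy as np


def spectral_filter(perturbed, noise_var):
    """Estimate original data from additively perturbed data (hypothetical helper).

    Assumes perturbed = original + noise, where the noise is i.i.d. with
    known variance `noise_var` and rows are records, columns are attributes.
    """
    m, n = perturbed.shape
    mean = perturbed.mean(axis=0)
    centered = perturbed - mean

    # Sample covariance of the perturbed data and its eigen-decomposition.
    cov = centered.T @ centered / m
    eigvals, eigvecs = np.linalg.eigh(cov)

    # Assumed cutoff: the Marchenko-Pastur upper edge for pure noise.
    # Eigenvalues above it are treated as carrying signal.
    lam_max = noise_var * (1.0 + np.sqrt(n / m)) ** 2
    signal_vecs = eigvecs[:, eigvals > lam_max]

    # Project onto the signal subspace and restore the column means.
    return centered @ signal_vecs @ signal_vecs.T + mean


# Illustrative data only: low-rank "original" values plus unit-variance noise.
rng = np.random.default_rng(0)
original = rng.normal(size=(2000, 2)) @ (3.0 * rng.normal(size=(2, 10)))
perturbed = original + rng.normal(scale=1.0, size=original.shape)
estimate = spectral_filter(perturbed, noise_var=1.0)
```

The sketch works best when the original attributes are strongly correlated, so that most of the signal lives in a few eigen-directions while the noise is spread evenly across all of them.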

“…ii. We evaluate the success of attack by calculating , , for all 1 ≤ w, j ≤ √m ([10]) and find that 5337 elements out of 6400 elements of the matrices have satisfied the inequalities. In this case, the attacker has obtained 83.39% of the original data and failed to recover 16.61% of it.…”

Section: Test Example I (mentioning)

confidence: 99%
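
The percentages quoted in the statement above follow directly from the element counts. A minimal sketch of that arithmetic is given below; the function name `recovery_rates` is hypothetical, and the success inequalities themselves are not reproduced here because they did not survive in the quoted text.

```python
def recovery_rates(recovered, total):
    """Return (recovered %, unrecovered %) rounded to two decimals."""
    if total <= 0 or not 0 <= recovered <= total:
        raise ValueError("need 0 <= recovered <= total and total > 0")
    recovered_pct = 100.0 * recovered / total
    return round(recovered_pct, 2), round(100.0 - recovered_pct, 2)


# Counts taken from the quoted statement: 5337 of 6400 elements recovered.
print(recovery_rates(5337, 6400))  # (83.39, 16.61)
```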

“…We choose BE-DR for its ease of calculation and also because of its similarity to the calculations of SPF. The MDMF method is a combination of multiple data mining [10] and fusion techniques [10]; we use WEKA software [11] for data mining techniques in this method.…”

Section: The Research Literature (mentioning)

confidence: 99%


An Attack-Resistant Hybrid Data-Privatization Method with Low Information Loss

Singh; Batten

2013

Trust Management VII


Abstract. We examine a recent proposal for data-privatization by testing it against well-known attacks; we show that all of these attacks successfully retrieve a relatively large (and unacceptable) portion of the original data. We then indicate how the data-privatization method examined can be modified to assist it to withstand these attacks and compare the performance of the two approaches. We also show that the new method has better privacy and lower information loss than the former method.

Keywords: data-privatization, information loss, Chebyshev polynomial, Spectral Filtering, Bayes-Estimated Data Reconstruction, data mining.

1 Introduction and Background

Data-Privatization

Privacy preservation is an important issue in many data mining applications dealing with sensitive data such as health-care records. Privacy preserving data mining (PPDM) has become an important enabling technology for integrating data and determining interesting patterns from private collections of databases, thus improving productivity and competitiveness for many businesses. PPDM requires data modification which limits information loss (thus increasing utility) as it is intended that a legitimate receiver of the modified data be able to recover the original data needed for a response. Perturbation techniques have to manage the intrinsic trade-off between preserving data privacy and information loss, as each affects the other. Several perturbation techniques [1]-[5] have been proposed for mining purposes, but in all these papers, privacy and utility are not satisfactorily balanced. In the research literature, there are two general approaches to privacy preserving data mining: the randomization approach [1] and the secure multi-party computation approach [6]. We focus only on the former because it can distort data more efficiently than the latter. There are two major randomization methods: Random Perturbation [2] and Randomized Response [5]. The former is a technique which deals mostly with numerical data, perturbing attribute by attribute, and concentrating on a statistical analysis of the data; it is a well-studied sanitization method that simultaneously allows access to the data by publishing them and at the same time preserving the privacy of the data. Randomized Response perturbs multiple attributes rather than one at a time, and so we ignore this method.

