2006
DOI: 10.1287/isre.1060.0095

Privacy Protection in Data Mining: A Perturbation Approach for Categorical Data

Abstract: To respond to growing concerns about privacy of personal information, organizations that use their customers' records in data-mining activities are forced to take actions to protect the privacy of the individuals involved. A common practice for many organizations today is to remove identity-related attributes from the customer records before releasing them to data miners or analysts. We investigate the effect of this practice and demonstrate that many records in a data set could be uniquely identified even af…

Citations: cited by 48 publications (18 citation statements)
References: 30 publications
“…So here the data matrix is partitioned vertically and for each partitioned data sub-set matrix the random rotation matrix is used to perturb the original value of the data. Xiao-Bai Li [7] describes in his paper the perturbation approach for preserving the categorical data. The proposed technique limits the disclosure of confidential data and also attempts to preserve the statistical properties of data before releasing them for data mining analysis.…”
Section: B Perturbation (mentioning)
confidence: 99%
“…The handling of sensitive information is of great concern, such as in the case of medical data mining [10], [19]. Additionally, even if organizations strip the identifying information before the data is processed for mining, as demonstrated in [20], it is possible to re-assemble the records in a way that uniquely identifies the users [21].…”
Section: "Legitimate" Aggregation: Data Mining (mentioning)
confidence: 99%
“…These efforts attempt to use technology (typically via database algorithms and mathematical models) to protect users' identities, while still meeting the information needs of the organizations requesting the mined data. In terms of broad categories, the discipline of "inference control" seeks "to prevent published/exchanged data from being linked with the individual respondents they originated from" [21] (p. 452).…”
Section: "Legitimate" Aggregation: Data Mining (mentioning)
confidence: 99%
“…To control inference from aggregate statistics, many mechanisms have been proposed. They include auditing queries, e.g., Chin and Özsoyoglu (1982), Chowdhury et al (1999), query restrictions, e.g., Friedman and Hoffman (1980), Nunez et al (2007), Dobkin et al (1979), perturbation, e.g., Matloff (1986), Muralidhar et al (1999), Lee et al (2010), Muralidhar et al (1995), Li and Sarkar (2006), Sarathy et al (2002), Li and Sarkar (2013), cell suppression, e.g., Castro (2007), Fischetti and Salazar (2001), providing approximate answers, e.g., Kadane et al (2006), Garfinkel et al (2002), anonymous data collection, e.g., Kumar et al (2010), and data shuffling or swapping, e.g., Muralidhar and Sarathy (2006), Li and Sarkar (2011). A good survey of classic inference control techniques on SDBs can be found in Adam and Wortmann (1989).…”
Section: Introduction (mentioning)
confidence: 99%