Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2008
DOI: 10.1145/1401890.1401982

Anonymizing transaction databases for publication

Abstract: This paper considers the problem of publishing "transaction data" for research purposes. Each transaction is an arbitrary set of items chosen from a large universe. Detailed transaction data provide an electronic image of one's life. This has two implications. First, transaction data are excellent candidates for data mining research. Second, the use of transaction data raises serious concerns over individual privacy. Therefore, before transaction data are released for data mining, they must be made anonymous so that…

Cited by 136 publications (206 citation statements); references 14 publications.

“…Yabo Xu et al. [6] model the power of attackers by the maximum size of the public itemsets that may be acquired as prior knowledge, and propose a novel privacy notion called "coherence" suitable for transactional databases.…”
Section: Related Work (mentioning; confidence: 99%)
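The excerpt above only names coherence. The sketch below assumes the (h, k, p)-coherence definition as we read it from Xu et al.: items are split into public and private ones, an attacker knows at most p public items of a target, and every public itemset β with |β| ≤ p must either occur in no transaction, or occur in at least k transactions of which at most a fraction h contain any single private item. The function name `is_hkp_coherent` and the toy data are hypothetical; this is a brute-force checker for illustration, not the paper's anonymization algorithm.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, Set


def is_hkp_coherent(transactions: Iterable[Set[str]], public: Set[str],
                    h: float, k: int, p: int) -> bool:
    """Brute-force check of (h, k, p)-coherence (assumed definition, see lead-in)."""
    txs = [frozenset(t) for t in transactions]
    pub = frozenset(public)
    # Collect every public itemset of size 1..p that occurs in some transaction.
    # Itemsets with support 0 satisfy the condition trivially, so they are skipped.
    candidates = set()
    for t in txs:
        pub_items = sorted(t & pub)
        for size in range(1, p + 1):
            candidates.update(frozenset(c) for c in combinations(pub_items, size))
    for beta in candidates:
        supporting = [t for t in txs if beta <= t]
        sup = len(supporting)
        if sup < k:
            return False  # too few transactions share this background knowledge
        # For each private item, count the supporting transactions that contain it.
        private_counts = Counter(e for t in supporting for e in t - pub)
        if private_counts and max(private_counts.values()) / sup > h:
            return False  # some private item is inferred with probability > h
    return True


# Hypothetical toy data: a, b, c are public items, HIV is a private item.
toy = [{"a", "b", "HIV"}, {"a", "b"}, {"a", "c"}, {"b", "c", "HIV"}]
toy_public = {"a", "b", "c"}
```

On the toy data, h = 0.5 fails because 2 of the 3 transactions containing `b` also contain `HIV`; h = 0.7 with p = 1 passes; raising p to 2 fails again because `{a, c}` occurs in only one transaction. Enumerating the public subsets of every transaction grows exponentially with p, so this checker is only practical for small p.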
“…The structure of large survey rating data is different from relational data, since it does not have fixed personally identifiable attributes. The lack of a clear set of personally identifiable attributes makes anonymisation challenging [23], [7]. In addition, survey rating data contains many attributes, each of which corresponds to the response to a survey question, but not all participants need to rate all issues (or answer all questions), which means many cells in a data set are empty.…”
Section: A. Motivation (mentioning; confidence: 99%)
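To make the contrast with relational data concrete, here is a minimal sketch, assuming ratings are integers and `None` marks an unanswered question, that turns each respondent's row into a set of (question, rating) pairs. The helper names and the survey values are hypothetical. Once the empty cells are dropped, records have variable size and no fixed schema, which is why techniques for set-valued transaction data become relevant.

```python
from typing import List, Optional, Set, Tuple

# One respondent's answers; None marks an unanswered question (assumption).
Row = List[Optional[int]]


def to_set_valued(rows: List[Row]) -> List[Set[Tuple[int, int]]]:
    """Keep only answered questions; each record becomes a set of (question, rating) pairs."""
    return [{(q, r) for q, r in enumerate(row) if r is not None} for row in rows]


def empty_fraction(rows: List[Row]) -> float:
    """Fraction of cells in the rating matrix that are empty."""
    cells = sum(len(row) for row in rows)
    empty = sum(1 for row in rows for r in row if r is None)
    return empty / cells if cells else 0.0


# Hypothetical 3-respondent, 4-question survey.
survey = [
    [5, None, None, 2],
    [None, None, 3, None],
    [1, 4, None, None],
]
```

Here 7 of the 12 cells are empty, and the three records shrink to sets of sizes 2, 1 and 2.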
“…Privacy preservation of transactional data has been acknowledged as an important problem in the data mining literature [3], [4], [21], [7], [23]. The privacy threats caused by publishing data mining results such as frequent itemsets and association rules are addressed in [3], [4].…”
Section: Related Work (mentioning; confidence: 99%)
“…We choose to solve the set-valued data anonymization problem by partial suppression because global suppression tends to delete more items than necessary, and the removal of all occurrences of the same item not only changes the data distribution significantly but also makes mining association rules about the deleted items impossible. The problem of anonymization by suppression (global or partial) is very challenging [1,18] precisely because (i) the number of possible inferences from a given dataset is exponential, and (ii) the size of the search space, i.e. the number of ways to suppress the data, is also exponential in the number of data items.…”
Section: Introduction (mentioning; confidence: 99%)
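The two suppression modes and the two exponential quantities (i) and (ii) can be illustrated on a toy dataset. This is a hedged sketch with hypothetical function names, not the citing paper's algorithm; it also simplifies (i) by counting candidate itemsets (2^n for n distinct items) rather than actual inferences, and counts (ii) as the 2^N keep/suppress choices over N item occurrences.

```python
from typing import AbstractSet, List, Set


def global_suppress(transactions: List[Set[str]], item: str) -> List[Set[str]]:
    """Global suppression: delete every occurrence of `item`."""
    return [set(t) - {item} for t in transactions]


def partial_suppress(transactions: List[Set[str]], item: str,
                     rows: AbstractSet[int]) -> List[Set[str]]:
    """Partial suppression: delete `item` only from transactions whose index is in `rows`."""
    return [set(t) - {item} if i in rows else set(t)
            for i, t in enumerate(transactions)]


def support(transactions: List[Set[str]], itemset: Set[str]) -> int:
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def itemset_space_size(transactions: List[Set[str]]) -> int:
    """(i), simplified: number of itemsets over the n distinct items, 2**n."""
    return 2 ** len(set().union(*transactions))


def suppression_space_size(transactions: List[Set[str]]) -> int:
    """(ii): each of the N item occurrences is kept or suppressed, giving 2**N outcomes."""
    return 2 ** sum(len(t) for t in transactions)


# Hypothetical toy data.
baskets = [{"bread", "milk"}, {"bread", "beer"}, {"bread", "milk", "beer"}]
```

Globally suppressing `beer` drops the support of {bread, beer} from 2 to 0, so no rule about `beer` survives, whereas partially suppressing it in one transaction only lowers that support to 1. Even this three-transaction toy has 2**7 = 128 partial-suppression outcomes over just 8 candidate itemsets.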