Anonymizing transaction databases for publication

Xu, Yabo; Wang, Ke; Fu, Ada Wai-Chee; Yu, Philip S.

doi:10.1145/1401890.1401982

Cited by 136 publications

(206 citation statements)

References 14 publications

Mentioning: 202

“…Yabo Xu et al [6] model the power of attackers by the maximum size of public itemsets that may be acquired as prior knowledge, and proposed a novel privacy notion called "coherence" suitable for transactional databases.…”

Section: Related Work (mentioning)

confidence: 99%

Personalized Privacy Preserving Publication of Transactional Datasets Using Concept Learning

Reddy, Raju, Kumari

2012

IJMLC

Abstract: In this paper we study the problem of protecting privacy in the publication of transactional data. Consider a collection of transactional data that contains detailed information about items bought together by individuals. Even after removing all personal characteristics of the buyer, which can serve as links to his identity, the publication of such data is still subject to privacy attacks from adversaries who have partial knowledge about the set. Unlike previous works, we do not distinguish data as sensitive and non-sensitive, but we consider them both as potential quasi-identifiers and potential sensitive data, depending on the point of view of the adversary. We define a new version of the anonymity guarantee using concept learning. Our anonymization model relies on generalization using concept hierarchy and concept learning. The proposed algorithms are experimentally evaluated using real world datasets.

“…The structure of large survey rating data is different from relational data, since it does not have fixed personal identifiable attributes. The lack of a clear set of personal identifiable attributes makes the anonymisation challenging [23], [7]. In addition, survey rating data contains many attributes, each of which corresponds to the response to a survey question, but not all participants need to rate all issues (or answer all questions), which means a lot of cells in a data set are empty.…”

Section: A Motivation (mentioning)

confidence: 99%

“…Privacy-preservation of transactional data has been acknowledged as an important problem in the data mining literature [3], [4], [21], [7], [23]. The privacy threats caused by publishing data mining results such as frequent item sets and association rules is addressed in [3], [4].…”

Section: Related Work (mentioning)

confidence: 99%

Towards Identity Anonymization in Large Survey Rating Data

Sun, Wang

2010

2010 Fourth International Conference on Network and System Security

Abstract: We study the challenge of identity protection in large public survey rating data. Even though the survey participants do not reveal any of their ratings, their survey records are potentially identifiable by using information from other public sources. None of the existing anonymisation principles (e.g., k-anonymity, l-diversity, etc.) can effectively prevent such breaches in large survey rating data sets. In this paper, we tackle the problem by defining the (k, ε)-anonymity principle. The principle requires that, for each transaction t in the given survey rating data T, at least (k − 1) other transactions in T have ratings similar to t, where the similarity is controlled by ε. We propose a greedy approach to anonymize survey rating data and apply the method to two real-life data sets to demonstrate their efficiency and practical utility.
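The principle stated in this abstract can be illustrated with a small checker. The sketch below is not the authors' greedy algorithm; it only tests whether a data set already meets the stated condition. The abstract does not define the similarity measure, so the one used here is an assumption: two transactions count as similar when they rate exactly the same questions and each pair of ratings differs by at most the threshold (called eps here). The names Transaction, similar, and satisfies_k_eps are hypothetical.

```python
from typing import Dict, List

# Hypothetical representation: question id -> rating.
# Questions a participant did not answer are simply absent.
Transaction = Dict[str, int]


def similar(a: Transaction, b: Transaction, eps: int) -> bool:
    """Assumed similarity measure (not taken from the paper):
    same set of rated questions, and every rating differs by at most eps."""
    if a.keys() != b.keys():
        return False
    return all(abs(a[q] - b[q]) <= eps for q in a)


def satisfies_k_eps(data: List[Transaction], k: int, eps: int) -> bool:
    """Return True if every transaction has at least k - 1 other
    transactions in the data set that are similar to it."""
    for i, t in enumerate(data):
        neighbours = sum(
            1 for j, u in enumerate(data) if j != i and similar(t, u, eps)
        )
        if neighbours < k - 1:
            return False
    return True


ratings = [
    {"q1": 5, "q2": 3},
    {"q1": 4, "q2": 3},
    {"q1": 5, "q2": 2},
]
print(satisfies_k_eps(ratings, k=2, eps=1))
print(satisfies_k_eps(ratings, k=2, eps=0))
```

The check is quadratic in the number of transactions, which is adequate for illustrating the definition but not for large survey data.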

“…We choose to solve the set-valued data anonymization problem by partial suppression because global suppression tends to delete more items than necessary, and the removal of all occurrences of the same item not only changes the data distribution significantly but also makes mining association rules about the deleted items impossible. The problem of anonymization by suppression (global or partial) is very challenging [1,18], exactly because, (i) the number of possible inferences from a given dataset is exponential, and (ii) the size of the search space, i.e. the number of ways to suppress the data is also exponential to the number of data items.…”

Section: Introduction (mentioning)

confidence: 99%

ρ-uncertainty Anonymization by Partial Suppression

Jia, Pan, et al.

2014

Database Systems for Advanced Applications

Abstract. We present a novel framework for set-valued data anonymization by partial suppression regardless of the amount of background knowledge the attacker possesses, and can be adapted to both space-time and quality-time trade-offs in a "pay-as-you-go" approach. While minimizing the number of item deletions, the framework attempts to either preserve the original data distribution or retain mineable useful association rules, which targets statistical analysis and association mining, two major data mining applications on set-valued data.
