Noman Mohammed scite author profile

Privacy-preserving data publishing addresses the problem of disclosing sensitive data when mining for useful information. Among the existing privacy models, -differential privacy provides one of the strongest privacy guarantees and has no assumptions about an adversary's background knowledge. Most of the existing solutions that ensuredifferential privacy are based on an interactive model, where the data miner is only allowed to pose aggregate queries to the database. In this paper, we propose the first anonymization algorithm for the non-interactive setting based on the generalization technique. The proposed solution first probabilistically generalizes the raw data and then adds noise to guarantee -differential privacy. As a sample application, we show that the anonymized data can be used effectively to build a decision tree induction classifier. Experimental results demonstrate that the proposed non-interactive anonymization algorithm is scalable and performs better than the existing solutions for classification analysis.

show abstract

Malware Classification with Deep Convolutional Neural Networks

Kalash

et al. 2018

View full text Add to dashboard Cite

Privacy-preserving trajectory data publishing by local suppression

Chen

Fung

Mohammed

et al. 2013

Information Sciences

214

120

View full text Add to dashboard Cite

The pervasiveness of location-aware devices has spawned extensive research in trajectory data mining, resulting in many important real-life applications.Yet, the privacy issue in sharing trajectory data among different parties often creates an obstacle for effective data mining. In this paper, we study the challenges of anonymizing trajectory data: high dimensionality, sparseness, and sequentiality. Employing traditional privacy models and anonymization

show abstract

Publishing set-valued data via differential privacy

et al. 2011

View full text Add to dashboard Cite

Set-valued data provides enormous opportunities for various data mining tasks. In this paper, we study the problem of publishing set-valued data for data mining tasks under the rigorous differential privacy model. All existing data publishing methods for set-valued data are based on partition-based privacy models, for example k -anonymity, which are vulnerable to privacy attacks based on background knowledge. In contrast, differential privacy provides strong privacy guarantees independent of an adversary's background knowledge and computational power. Existing data publishing approaches for differential privacy, however, are not adequate in terms of both utility and scalability in the context of set-valued data due to its high dimensionality. We demonstrate that set-valued data could be efficiently released under differential privacy with guaranteed utility with the help of context-free taxonomy trees. We propose a probabilistic top-down partitioning algorithm to generate a differentially private release, which scales linearly with the input data size. We also discuss the applicability of our idea to the context of relational data. We prove that our result is (∈, δ)-useful for the class of counting queries, the foundation of many data mining tasks. We show that our approach maintains high utility for counting queries and frequent itemset mining and scales to large datasets through extensive experiments on real-life set-valued datasets.

show abstract

Centralized and Distributed Anonymization for High-Dimensional Healthcare Data

Mohammed¹,

Fung²,

Hung

et al. 2010

ACM Trans. Knowl. Discov. Data

101

View full text Add to dashboard Cite

Sharing healthcare data has become a vital requirement in healthcare system management; however, inappropriate sharing and usage of healthcare data could threaten patients’ privacy. In this article, we study the privacy concerns of sharing patient information between the Hong Kong Red Cross Blood Transfusion Service (BTS) and the public hospitals. We generalize their information and privacy requirements to the problems of centralized anonymization and distributed anonymization , and identify the major challenges that make traditional data anonymization methods not applicable. Furthermore, we propose a new privacy model called LKC-privacy to overcome the challenges and present two anonymization algorithms to achieve LKC-privacy in both the centralized and the distributed scenarios. Experiments on real-life data demonstrate that our anonymization algorithms can effectively retain the essential information in anonymous data for data analysis and is scalable for anonymizing large datasets.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Noman Mohammed

Differentially private data release for data mining

Malware Classification with Deep Convolutional Neural Networks

Privacy-preserving trajectory data publishing by local suppression

Publishing set-valued data via differential privacy

Centralized and Distributed Anonymization for High-Dimensional Healthcare Data

Contact Info

Product

Resources

About