Releasing raw data sets that contain sensitive personal information leaks privacy. Various differential privacy methods have therefore been proposed for efficient data sharing while preserving privacy. However, existing methods add noise to all quasi-identifier attributes, which results in high time-space complexity and low data utility. In this paper, we propose a Differential Privacy Protection model that considers the Correlations between Attributes, denoted DPPCA. DPPCA first computes the degree of correlation between the quasi-identifier attributes and the sensitive attributes, and determines the pair of attributes with the maximal degree of correlation. Then, based on that pair of attributes, it uses microaggregation to partition the data set into clusters of size k (k ≥ 2) according to three types of attributes, i.e., numerical, non-numerical, and hybrid attributes, such that each cluster contains l (l < k) distinct values of the sensitive attribute. Finally, noise is added to each cluster separately so that each cluster satisfies ε-differential privacy. While providing the same degree of privacy, our experimental results demonstrate that DPPCA substantially reduces the amount of added noise, to 11%, on the Census and Adult data sets. DPPCA therefore greatly improves data utility while achieving the same degree of differential privacy. INDEX TERMS ε-Differential privacy, mutual information, microaggregation, privacy protection data publishing, data utility.
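The core idea of adding noise per cluster rather than per record can be sketched as follows. This is an illustrative simplification, not the paper's exact DPPCA algorithm: it orders records along a single attribute to form clusters of size k, then perturbs each cluster centroid with Laplace noise calibrated to the mean's sensitivity. The function name and the one-dimensional ordering heuristic are assumptions for the sketch.

```python
import numpy as np

def laplace_noisy_centroids(data, k, epsilon, rng=None):
    """Microaggregation-style sketch (not the paper's exact DPPCA method):
    order records by one attribute, group them into clusters of size k,
    and add Laplace noise to each cluster centroid so each released
    centroid satisfies epsilon-differential privacy."""
    rng = np.random.default_rng(rng)
    order = np.argsort(data[:, 0])  # crude 1-D ordering used for clustering
    attr_range = data.max(axis=0) - data.min(axis=0)
    noisy = []
    for start in range(0, len(data), k):
        cluster = data[order[start:start + k]]
        centroid = cluster.mean(axis=0)
        # Replacing one record moves each coordinate of the mean by at
        # most range/|cluster|, so the Laplace scale is range/(|cluster|*eps).
        scale = np.maximum(attr_range / (len(cluster) * epsilon), 1e-12)
        noisy.append(centroid + rng.laplace(0.0, scale))
    return np.array(noisy)
```

Because noise is scaled to the mean of a cluster of size k instead of to each individual record, the per-value perturbation shrinks by roughly a factor of k, which is the intuition behind the reduced noise the abstract reports.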
In the field of data mining, protecting sensitive data from leakage is one of the central concerns of current research. As a strict and provable privacy model, differential privacy provides an excellent solution to the problem of privacy leakage. Numerous methods have been suggested to enforce differential privacy in various data mining tasks, such as regression analysis. However, existing solutions for regression analysis are less than satisfactory, since the amount of added noise is excessive. Worse still, an adversary can launch model inversion attacks against the published regression model to infer sensitive information. Motivated by this, we propose a differential privacy budget allocation model. We optimize the regression model by adjusting the privacy budget allocation within the objective function. Extensive evaluation results show the superiority of the proposed model in terms of noise reduction, resistance to model inversion attacks, and the trade-off between privacy protection and data utility. INDEX TERMS Machine learning, differential privacy, regression analysis, model inversion attack.
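One common way to realize budget allocation inside a regression objective is to split ε between the perturbed terms of the loss. The sketch below is illustrative, not the paper's model: it perturbs the sufficient statistics of least-squares regression, spending a `split` fraction of the budget on the quadratic term and the remainder on the linear term. The function name, the `split` parameter, and the assumption that features and labels lie in [-1, 1] are all ours.

```python
import numpy as np

def dp_linear_regression(X, y, epsilon, split=0.5, rng=None):
    """Budget-split perturbation sketch (illustrative, not the paper's
    exact model). Assumes every feature and label lies in [-1, 1], and
    add/remove-one-record neighboring data sets, so the L1 sensitivities
    below hold: at most d*d for X^T X and d for X^T y."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    eps1, eps2 = split * epsilon, (1.0 - split) * epsilon
    # Spend eps1 on the quadratic term and eps2 on the linear term.
    A = X.T @ X + rng.laplace(0.0, d * d / eps1, size=(d, d))
    b = X.T @ y + rng.laplace(0.0, d / eps2, size=d)
    # Light regularization keeps the perturbed system solvable.
    return np.linalg.solve(A + 1e-3 * np.eye(d), b)
```

Tuning `split` trades noise between the two terms: when d is large, the quadratic term dominates the noise budget, so allocating it a larger share of ε can reduce the overall distortion of the fitted coefficients.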
Sensor network intrusion detection has attracted extensive attention. However, previous intrusion detection methods face a highly imbalanced attack class distribution, and consequently may not achieve satisfactory performance. To solve this problem, we propose a new intrusion detection algorithm for sensor networks based on normalized cut spectral clustering. The main aim is to reduce the degree of imbalance among classes in an intrusion detection system. First, we design a normalized cut spectral clustering procedure to reduce the imbalance between every pair of classes in the intrusion detection data set. Second, we train a network intrusion detection classifier on the new data set. Finally, we conduct extensive experiments and analyze the results in detail. Simulation experiments show that our algorithm reduces the imbalance among classes while preserving the distribution of the original data, and effectively improves detection performance.
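The clustering step can be sketched with a minimal normalized-cut spectral clustering routine. This is a generic textbook construction, not necessarily the paper's exact procedure: a Gaussian similarity graph, the symmetric normalized Laplacian, and Lloyd's k-means on the leading eigenvectors; rebalancing would then, for example, keep cluster representatives of the majority class. The function name and `sigma` bandwidth are assumptions.

```python
import numpy as np

def normalized_cut_clusters(X, n_clusters, sigma=1.0):
    """Normalized-cut spectral clustering sketch (generic construction;
    the paper's exact design may differ): Gaussian similarity graph,
    symmetric normalized Laplacian, k-means on the spectral embedding."""
    # Pairwise Gaussian similarities.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    deg = W.sum(axis=1)
    L_sym = np.eye(len(X)) - W / np.sqrt(np.outer(deg, deg))
    # Embed each point with the eigenvectors of the smallest eigenvalues.
    _, vecs = np.linalg.eigh(L_sym)
    U = vecs[:, :n_clusters]
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    # Farthest-point initialization, then Lloyd's k-means iterations.
    centers = [U[0]]
    for _ in range(1, n_clusters):
        dists = np.min([((U - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(U[np.argmax(dists)])
    centers = np.array(centers)
    for _ in range(50):
        labels = np.argmin(((U[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for c in range(n_clusters):
            if (labels == c).any():
                centers[c] = U[labels == c].mean(axis=0)
    return labels
```

Clustering within the majority class and sampling per cluster, rather than sampling uniformly at random, is what lets a rebalancing step shrink the class while still covering the class's original distribution.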