Data used in data mining has become highly multi-dimensional in recent years, and its dimensionality continues to grow rapidly. To improve data mining performance under standard evaluation criteria, feature selection chooses a small subset of relevant features from the original dataset. Stability is a key criterion for feature selection algorithms: it describes how robust the selected subset is to perturbations in the training data or to the addition of new samples. Recent work has shown that feature selection stability depends largely on the data and is not entirely independent of the algorithm. Privacy-preserving data mining (PPDM) modifies some of the sensitive and quasi-identifying attributes to prevent an intrusive or malicious data miner from re-identifying an individual's tuple, and the result is a perturbed, privacy-preserved dataset. Because feature selection stability depends primarily on the data, it decreases on such perturbed datasets. Privacy-preserving perturbation also adversely affects data utility. Selecting an appropriate PPDM technique and level of perturbation that improve feature selection stability while maintaining strong privacy preservation and data utility is therefore a challenging research problem. This paper examines the issue through a comparative analysis of three privacy-preserving data mining algorithms.
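The abstract does not say which perturbation, feature selector, or stability measure the three PPDM algorithms use. The sketch below only illustrates the general effect it describes, under stated assumptions: additive Gaussian noise stands in for a privacy-preserving transformation, a simple correlation-ranking filter stands in for the feature selector, and Jaccard similarity between the subset chosen on clean data and subsets chosen on noisy copies serves as the stability score. All function names are hypothetical.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def select_top_k(rows, target, k):
    """Hypothetical filter selector: keep the k feature indices with the largest |correlation| to the target."""
    scores = []
    for j in range(len(rows[0])):
        column = [r[j] for r in rows]
        scores.append((abs(pearson(column, target)), j))
    scores.sort(reverse=True)
    return {j for _, j in scores[:k]}

def add_noise(rows, sigma, rng):
    """Additive Gaussian noise: an assumed stand-in for a privacy-preserving perturbation."""
    return [[v + rng.gauss(0.0, sigma) for v in r] for r in rows]

def jaccard(a, b):
    """Jaccard similarity of two feature-index sets (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def stability_under_noise(rows, target, k, sigma, trials=20, seed=0):
    """Mean Jaccard similarity between the clean-data subset and subsets selected on noisy copies."""
    rng = random.Random(seed)
    base = select_top_k(rows, target, k)
    sims = [jaccard(base, select_top_k(add_noise(rows, sigma, rng), target, k))
            for _ in range(trials)]
    return sum(sims) / trials

# Usage on synthetic data where features 0, 1 and 2 drive the target.
data_rng = random.Random(42)
rows = [[data_rng.gauss(0, 1) for _ in range(10)] for _ in range(200)]
target = [r[0] + 0.8 * r[1] + 0.6 * r[2] for r in rows]
print(stability_under_noise(rows, target, k=3, sigma=0.0))  # 1.0
print(stability_under_noise(rows, target, k=3, sigma=2.0))  # typically below 1.0
```

With no noise the selected subset never changes, so the score is exactly 1.0; raising `sigma` (more privacy) tends to lower the score, which is the stability/privacy trade-off the abstract describes.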
Data mining is indispensable for business organizations, which use it to extract useful information from huge volumes of stored data to support managerial decision making and remain competitive. Owing to continual advances in information and communication technology, the data collected from e-commerce and e-governance are mostly high dimensional. Data mining works better on small, low-dimensional datasets than on high-dimensional ones, and feature selection is an important dimensionality reduction technique. The subsets selected by a feature selection algorithm in successive iterations should be the same or similar even under small perturbations of the dataset; this property is called selection stability, and it has recently become an important research topic. Various measures have been proposed to quantify selection stability. This paper analyses how to choose a suitable search method and stability measure for feature selection algorithms, as well as the influence of dataset characteristics, since the choice of the best approach is highly problem dependent.
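The abstract does not name the stability measures it compares. As one illustration, assuming the subsets come from repeated runs of a selector on perturbed or resampled data, Kuncheva's consistency index scores a pair of equal-size subsets of size k drawn from n features by their overlap r, corrected for the overlap expected by chance: (r·n − k²) / (k·(n − k)). Averaging it over all pairs of runs gives a single stability value. The helper names below are hypothetical.

```python
from itertools import combinations

def kuncheva_index(a, b, n_features):
    """Kuncheva's consistency index for two feature subsets of equal size k out of n_features."""
    k = len(a)
    if len(b) != k:
        raise ValueError("subsets must have the same size")
    if k == 0 or k == n_features:
        raise ValueError("index is undefined for k == 0 or k == n_features")
    r = len(a & b)  # number of features the two subsets share
    return (r * n_features - k * k) / (k * (n_features - k))

def average_stability(subsets, n_features):
    """Mean pairwise Kuncheva index over subsets selected in repeated runs."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two subsets")
    return sum(kuncheva_index(a, b, n_features) for a, b in pairs) / len(pairs)

# Three runs of a selector on perturbed data, each keeping 3 of 10 features.
runs = [{0, 1, 2}, {0, 1, 3}, {0, 1, 2}]
print(round(average_stability(runs, 10), 4))  # 0.6825
```

Identical subsets score 1.0, and overlap at chance level scores near 0, which is why chance-corrected indices like this are often preferred over raw overlap when subset size is small relative to the number of features.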