Stealing attacks against controlled information, along with the growing number of information leakage incidents, have become an emerging cyber security threat in recent years. Driven by the rapid development and deployment of advanced analytics solutions, novel stealing attacks use machine learning (ML) algorithms to achieve high success rates and cause substantial damage. Detecting and defending against such attacks is challenging and urgent, so governments, organizations, and individuals should attach great importance to ML-based stealing attacks. This survey presents recent advances in this new type of attack and the corresponding countermeasures. ML-based stealing attacks are reviewed from the perspective of three categories of targeted controlled information: controlled user activities, controlled ML model-related information, and controlled authentication information. Recent publications are summarized to generalize an overarching attack methodology and to derive the limitations and future directions of ML-based stealing attacks. Furthermore, countermeasures are proposed towards developing effective protections from three aspects: detection, disruption, and isolation.
Given the large volume of network traffic flows, raw data must be preprocessed before classification to obtain accurate results quickly. Feature selection is an essential step in the preprocessing phase, and principal component analysis (PCA) is recognized as an effective and efficient method. In this paper, we classify network traffic flows by using the PCA technique together with six machine learning algorithms: Naive Bayes, decision tree, 1-nearest neighbor, random forest, support vector machine, and H2O. We analyzed the impact of PCA on the classification results by applying each algorithm to the data set with and without PCA. Experiments were conducted with input data sets of varying size, and performance was measured in two respects: average overall accuracy and F-measure. Computational time was also considered in analyzing the performance. Our results showed that random forest and 1-nearest neighbor were the top two of the six algorithms on both metrics. We then studied the impact of PCA at the per-class level, using these two algorithms as examples. This revealed a positive correlation between the overall impact and the number of classes with significant impact. Lastly, visualization was used to explore the reasons for the impacts caused by PCA. Two factors are considered in PCA's impact at the per-class level: the benefit for classes grouped by PCA and the mislabeling errors caused by interference from nearby groups.
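The with-and-without-PCA comparison described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the synthetic six-feature "flows", the class names (`web`, `video`, `p2p`), the choice of two retained components, and the macro-averaged F-measure are all assumptions made for illustration, and only the 1-nearest-neighbor classifier is shown. PCA is computed here by power iteration with deflation so the example needs only the standard library.

```python
import math
import random


def make_flows(n_per_class, rng):
    """Synthetic 6-feature 'flows' for three hypothetical traffic classes."""
    centers = {"web": [0, 0, 0, 0, 0, 0],
               "video": [8, 0, 0, 0, 0, 0],
               "p2p": [0, 8, 0, 0, 0, 0]}
    X, y = [], []
    for label, center in centers.items():
        for _ in range(n_per_class):
            X.append([c + rng.gauss(0, 1) for c in center])
            y.append(label)
    return X, y


def fit_pca(X, k, iters=200, seed=0):
    """Return (mean, k leading principal directions) of X.

    Uses power iteration on the sample covariance matrix, deflating the
    matrix after each direction is found.
    """
    n, d = len(X), len(X[0])
    mu = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - mu[j] for j in range(d)] for row in X]
    cov = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    comps = []
    for _ in range(k):
        v = [rng.gauss(0, 1) for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the unit vector v.
        lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                  for i in range(d))
        comps.append(v)
        # Deflation: remove the direction just found from the covariance.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return mu, comps


def transform(X, mu, comps):
    """Project mean-centered rows of X onto the principal directions."""
    return [[sum((row[j] - mu[j]) * c[j] for j in range(len(mu)))
             for c in comps] for row in X]


def predict_1nn(train_X, train_y, X):
    """Label each query with the class of its nearest training point."""
    preds = []
    for q in X:
        best = min(range(len(train_X)),
                   key=lambda i: sum((a - b) ** 2
                                     for a, b in zip(train_X[i], q)))
        preds.append(train_y[best])
    return preds


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f_measure(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (an assumed averaging choice)."""
    scores = []
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def run_experiment(seed=0, k=2):
    """Score 1-NN on raw features and on k PCA components."""
    rng = random.Random(seed)
    train_X, train_y = make_flows(60, rng)
    test_X, test_y = make_flows(20, rng)
    results = {}
    raw_pred = predict_1nn(train_X, train_y, test_X)
    results["raw"] = (accuracy(test_y, raw_pred),
                      macro_f_measure(test_y, raw_pred))
    # PCA is fitted on the training data only, then applied to both sets.
    mu, comps = fit_pca(train_X, k)
    pca_pred = predict_1nn(transform(train_X, mu, comps), train_y,
                           transform(test_X, mu, comps))
    results["pca"] = (accuracy(test_y, pca_pred),
                      macro_f_measure(test_y, pca_pred))
    return results


if __name__ == "__main__":
    for name, (acc, f) in run_experiment().items():
        print(f"{name}: accuracy={acc:.3f} macro-F={f:.3f}")
```

On real traffic data the two settings may differ considerably more than on this well-separated toy set; repeating the comparison per class and for varying input sizes is what the abstract reports.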