Redundant and irrelevant features in data have long been a problem in network traffic classification. These features not only slow down classification but also prevent a classifier from making accurate decisions, especially when coping with big data. In this paper, we propose a mutual information based algorithm that analytically selects the optimal features for classification. This mutual information based feature selection algorithm can handle both linearly and nonlinearly dependent data features. Its effectiveness is evaluated in the case of network intrusion detection. An Intrusion Detection System (IDS), named Least Square Support Vector Machine based IDS (LSSVM-IDS), is built using the features selected by our proposed feature selection algorithm. The performance of LSSVM-IDS is evaluated using three intrusion detection evaluation datasets, namely KDD Cup 99, NSL-KDD and Kyoto 2006+. The evaluation results show that our feature selection algorithm contributes more critical features for LSSVM-IDS to achieve better accuracy and lower computational cost compared with the state-of-the-art methods.
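The idea of scoring features by mutual information with the class label, while discounting features that duplicate ones already chosen, can be illustrated with a minimal sketch. This is a generic greedy relevance-minus-redundancy criterion for discrete features, not necessarily the exact criterion of the paper; the function names `mutual_information` and `select_features` and the averaging of redundancy are assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Estimate I(X;Y) in bits from two equal-length lists of discrete values."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) * log2(p(x,y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


def select_features(columns, labels, k):
    """Greedily pick k feature names from `columns` (dict: name -> list of values).

    Assumed criterion: relevance to the labels minus the mean mutual
    information with the features selected so far.
    """
    selected = []
    remaining = sorted(columns)
    while remaining and len(selected) < k:
        def score(name):
            relevance = mutual_information(columns[name], labels)
            if not selected:
                return relevance
            redundancy = sum(
                mutual_information(columns[name], columns[s]) for s in selected
            ) / len(selected)
            return relevance - redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

Under this criterion, a feature that is an exact copy of an already selected one is penalized by its full entropy, so a weaker but complementary feature can be preferred over it. Because mutual information is computed from the joint distribution rather than from a covariance, the score does not assume a linear relationship between feature and label.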
Intrusion Detection Systems (IDSs) play a significant role in monitoring and analyzing daily activities occurring in computer systems to detect occurrences of security threats. However, the analytical data routinely produced by computer networks are usually very large. This creates a major challenge to IDSs, which need to examine all features in the data to identify intrusive patterns. The objective of this study is to analyze and select the more discriminative input features for building computationally efficient and effective schemes for an IDS. To this end, a hybrid feature selection algorithm that combines wrapper and filter selection processes is designed in this paper. Two main phases are involved in this algorithm. The upper phase conducts a preliminary search for an optimal subset of features, in which the mutual information between the input features and the output class serves as a determinant criterion. The set of features selected in the upper phase is further refined in the lower phase in a wrapper manner, in which the Least Square Support Vector Machine (LSSVM) is used to guide the selection process and retain an optimized set of features. The efficiency and effectiveness of our approach are demonstrated through building an IDS and a fair comparison with other state-of-the-art detection approaches. The experimental results show that our hybrid model is promising in detection compared to the previously reported results.
This paper considers the feature selection problem for data classification in the absence of data labels. Due to the lack of categorized information in many practical applications, unsupervised feature selection has proven to be more practically important but at the same time more difficult. It is not an easy task to assess the relevance of a feature or a subset of features when there are no labels available with the data. In this paper, we first propose an unsupervised feature selection algorithm, which is an enhancement over the Laplacian score method. We name our algorithm Modified Laplacian score, ML for short. Specifically, two main phases are involved in ML to complete the selection procedure. In the first phase, the Laplacian score algorithm is applied to select the features that have the best locality preserving power. In the second phase, ML introduces a new Redundancy Penalization (RP) technique based on Mutual Information (MI) to eliminate the redundancy among the selected features. We evaluate our work by applying the proposed unsupervised feature selection algorithm to build an Intrusion Detection System. The effectiveness and the feasibility of the proposed detection system are evaluated using three well-known intrusion detection datasets: KDD Cup 99, NSL-KDD and Kyoto 2006+. The evaluation results confirm that our feature selection approach performs better than the Laplacian score method in terms of classification accuracy.
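The first phase described above relies on the standard Laplacian score, which rates a feature by how well it preserves local neighbourhood structure (lower is better). The sketch below is a simplified illustration only: it assumes a fully connected graph with heat-kernel weights instead of a k-nearest-neighbour graph, the name `laplacian_scores` and the bandwidth parameter `t` are illustrative, and the Redundancy Penalization phase of the paper is not reproduced.

```python
import math


def laplacian_scores(rows, t=1.0):
    """Return one Laplacian score per feature column; lower means the
    feature varies less between similar (nearby) samples.

    rows: list of samples, each a list of numeric feature values.
    t:    heat-kernel bandwidth (assumed parameter).
    """
    n = len(rows)
    m = len(rows[0])
    # Similarity matrix S with heat-kernel weights on a fully connected graph.
    S = [
        [
            0.0 if i == j else math.exp(
                -sum((a - b) ** 2 for a, b in zip(rows[i], rows[j])) / t
            )
            for j in range(n)
        ]
        for i in range(n)
    ]
    degree = [sum(S[i]) for i in range(n)]  # diagonal of D
    total_degree = sum(degree)
    scores = []
    for r in range(m):
        f = [row[r] for row in rows]
        # Remove the degree-weighted mean so constant features are not rewarded.
        mean = sum(fi * di for fi, di in zip(f, degree)) / total_degree
        g = [fi - mean for fi in f]
        # g^T L g with L = D - S equals 0.5 * sum_ij S_ij * (g_i - g_j)^2.
        numerator = 0.5 * sum(
            S[i][j] * (g[i] - g[j]) ** 2 for i in range(n) for j in range(n)
        )
        denominator = sum(di * gi * gi for di, gi in zip(degree, g))
        scores.append(numerator / denominator if denominator > 0 else float("inf"))
    return scores
```

A feature that separates two well-formed clusters receives a much lower score than a feature that fluctuates inside each cluster, which is the sense in which the score measures locality preserving power.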
Cyber crimes and malicious network activities have posed serious threats to the entire internet and its users. This issue is becoming more critical as network-based services become more widespread and more closely tied to our daily life. Thus, it has raised serious concern among individual internet users, industry and the research community. A significant amount of work has been conducted to develop intelligent anomaly-based Intrusion Detection Systems (IDSs) to address this issue. However, one technical challenge, namely reducing false alarms, has accompanied the development of anomaly-based IDSs since the 1990s. In this paper, we provide a solution to this challenge. A Nonlinear Correlation Coefficient (NCC) based similarity measure is proposed to help extract both linear and nonlinear correlations between network traffic records. This extracted correlative information is used in our proposed IDS to detect malicious network behaviours. The effectiveness of the proposed NCC-based measure and the proposed IDS is evaluated using the NSL-KDD data set. The evaluation results demonstrate that the proposed NCC-based measure not only helps reduce the false alarm rate, but also helps discriminate normal and abnormal behaviours efficiently.
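To illustrate how a rank-based nonlinear correlation measure can capture dependence that a linear coefficient misses, the sketch below follows one common formulation of the NCC: both variables are ranked, the ranks are split into `b` equal-size bins, and the result is the mutual information between the binned variables with logarithms taken to base `b`. Whether the paper uses exactly this formulation is an assumption; the helper names `rank_bins`, `entropy` and `ncc` are illustrative.

```python
import math
from collections import Counter


def rank_bins(values, b):
    """Replace each value by the index (0..b-1) of the rank bin it falls in."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = rank * b // n
    return bins


def entropy(counts, n, b):
    """Entropy of a binned variable, with logarithms to base b."""
    return -sum((c / n) * math.log(c / n, b) for c in counts.values())


def ncc(xs, ys, b):
    """Nonlinear correlation coefficient: H(X) + H(Y) - H(X, Y) on rank bins."""
    n = len(xs)
    bx = rank_bins(xs, b)
    by = rank_bins(ys, b)
    return (
        entropy(Counter(bx), n, b)
        + entropy(Counter(by), n, b)
        - entropy(Counter(zip(bx, by)), n, b)
    )
```

With 16 evenly spaced samples and `b = 4`, any strictly monotone relationship gives a value of 1, while the symmetric parabola `(x - 7.5) ** 2`, whose Pearson correlation with `x` is zero, still gives 0.5. This sketch assumes the number of samples is a multiple of `b`; ties are broken by sample order.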
With the rapid growth of data communications in size and complexity, the threat of malicious activities and computer crimes has increased accordingly. Thus, efficient data processing techniques for network operation and management over large-scale network traffic are much needed. Some mathematical approaches to flow-level traffic data have been proposed because of the importance of analyzing the structure and state of the network. Different from the state-of-the-art studies, we first propose a new decomposition model based on the accelerated proximal gradient method for packet-level traffic data. In addition, we present the iterative scheme of the algorithm for the network anomaly detection problem, which is termed NAD-APG. Based on this approach, we carry out intrusion detection for packet-level network traffic data whether or not it is polluted by noise. Finally, we design a prototype system for detecting network anomalies such as Probe and R2L attacks. The experiments show that our approach is effective in revealing the patterns of network traffic data and detecting attacks from large-scale network traffic. Moreover, the experiments demonstrate the robustness of the algorithm even when the network traffic is polluted by large-volume anomalies and noise.
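The accelerated proximal gradient iteration itself can be shown on a much smaller problem than the traffic decomposition above. The sketch below applies the generic scheme (gradient step, proximal step, momentum extrapolation) to an l1-regularized least-squares objective, where the proximal step is soft thresholding, the usual way sparse components are isolated. It is not the NAD-APG algorithm; the function names, the caller-supplied Lipschitz constant and the objective are assumptions chosen for illustration.

```python
import math


def soft_threshold(v, tau):
    """Proximal operator of tau * ||v||_1: shrink each entry towards zero."""
    return [math.copysign(max(abs(x) - tau, 0.0), x) for x in v]


def apg_lasso(A, b, lam, lipschitz, iters=100):
    """Minimize 0.5 * ||A x - b||^2 + lam * ||x||_1 by accelerated proximal gradient.

    A:         matrix as a list of rows
    lipschitz: an upper bound on the largest eigenvalue of A^T A,
               supplied by the caller (assumed known here)
    """
    m, n = len(A), len(A[0])
    x = [0.0] * n
    y = x[:]  # extrapolated point
    t = 1.0   # momentum parameter
    for _ in range(iters):
        # Gradient of the smooth part at y: A^T (A y - b).
        residual = [sum(A[i][j] * y[j] for j in range(n)) - b[i] for i in range(m)]
        grad = [sum(A[i][j] * residual[i] for i in range(m)) for j in range(n)]
        # Gradient step followed by the proximal (soft-thresholding) step.
        x_new = soft_threshold(
            [y[j] - grad[j] / lipschitz for j in range(n)], lam / lipschitz
        )
        # Momentum update that gives the method its acceleration.
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = [x_new[j] + ((t - 1.0) / t_new) * (x_new[j] - x[j]) for j in range(n)]
        x, t = x_new, t_new
    return x
```

In a low-rank plus sparse decomposition the same three steps are typically applied to matrix variables, with singular value thresholding playing the role of the proximal step for the low-rank part; that variant needs a singular value decomposition and is omitted here.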