This study investigates the influence of data resampling on ensemble methods and proposes repeated cross-validation (RCV)-based ensemble feature selection (FS). To evaluate the proposed method, the support vector machine and its recursive feature elimination extension were used as the underlying classification and FS techniques, respectively. Experimental evaluation was performed using four microarray datasets. The results show that, especially for extremely small signature sizes, increasing the ensemble size improves both classification performance and the robustness of gene selection (stability) for both RCV and bootstrap (BS). However, for ensembles of the same size, RCV outperforms BS in terms of performance and especially stability. Compared with the top results obtained by two other studies in which BS is utilised, RCV performs similarly or better in terms of area under the receiver operating characteristic curve and better in terms of stability.
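The resampling idea in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the ensemble size is assumed to equal the number of training subsets (folds times repeats), the aggregation is a plain mean of scores, and a simple class-mean difference stands in for the SVM-based recursive feature elimination used in the study.

```python
import random
from statistics import mean

def repeated_kfold_train_indices(n_samples, k, repeats, seed=0):
    """Yield the training indices of every fold in `repeats` shuffled k-fold splits."""
    rng = random.Random(seed)
    for _ in range(repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        folds = [order[i::k] for i in range(k)]
        for held_out in folds:
            held = set(held_out)
            yield [i for i in order if i not in held]

def mean_difference_scores(X, y, idx):
    """Stand-in ranker (NOT SVM-RFE): |mean(class 1) - mean(class 0)| per feature."""
    scores = []
    for j in range(len(X[0])):
        pos = [X[i][j] for i in idx if y[i] == 1]
        neg = [X[i][j] for i in idx if y[i] == 0]
        scores.append(abs(mean(pos) - mean(neg)))
    return scores

def rcv_ensemble_ranking(X, y, k=3, repeats=5, seed=0):
    """Score features on every RCV training subset, then average the scores."""
    per_run = [mean_difference_scores(X, y, idx)
               for idx in repeated_kfold_train_indices(len(X), k, repeats, seed)]
    avg = [mean(run[j] for run in per_run) for j in range(len(X[0]))]
    ranking = sorted(range(len(avg)), key=lambda j: avg[j], reverse=True)
    return ranking, len(per_run)

# Toy data (hypothetical): feature 0 separates the two classes, the others do not.
X = [[5.0, 1.0, 0.2], [5.2, 0.9, 0.1], [4.9, 1.1, 0.3],
     [1.0, 1.0, 0.25], [1.1, 0.95, 0.2], [0.9, 1.05, 0.15]]
y = [1, 1, 1, 0, 0, 0]

ranking, ensemble_size = rcv_ensemble_ranking(X, y, k=3, repeats=5)
```

A bootstrap variant would differ only in how the training indices are drawn (sampling with replacement instead of shuffled folds), which is the comparison the abstract reports.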
Summary: In the era of technology, information security has gained significant importance, as intruders constantly conduct attacks to breach information systems. Intelligent network intrusion detection systems (NIDS) are promising for detecting malicious activities; however, building an accurate and efficient system requires applying feature selection (FS) and classifier optimisation (CO) with cost‐effective algorithms. Although classifier‐dependent FS (CDFS) techniques and CO algorithms have been shown to perform well, they suffer from computational complexity, and their interdependencies negatively affect model performance. This study proposes the FS‐integrated classifier optimisation algorithm, which incorporates FS during CO, enhances optimisation, and tackles the interdependency problem. Furthermore, since this algorithm does not use an iterative feature selection process, such as forward selection or backward elimination, it is less complex than other CDFS techniques. Moreover, an application of the proposed methodology (NIDS) was implemented using the designed framework to validate the model in this problem domain. The proposed methodology achieved accuracies of 85.10% and 73.24% with one feature for the NSLKDD datasets, 83.45% with 32 features for the UNSW‐NB15 dataset, and 99.41% with eight features and 99.63% with 16 features for the CIC‐IDS2017 datasets. The results showed that the FS‐integrated optimisation algorithm improved the accuracy of the classifier with fewer features. Furthermore, the proposed methodology outperformed other FS, ensemble learning, and deep learning‐based methods regarding detection accuracy and false alarm rate. In conclusion, the developed NIDS is an accurate, efficient, and easy‐to‐implement system that can be created using limited computing power and time as a promising solution to protect traditional and modern computer networks.
Ensemble feature selection (EFS) is a valuable technique for developing accurate and robust machine-learning (ML) models. Data variation plays a crucial role in the success of EFS models; however, it also causes some outliers in the ranked lists. In this study, we propose the minimum weight threshold method-based EFS (MWT-EFS) to address the outlier problem and exploit the full power of EFS. The proposed method employs the support vector classifier to assign weights to features, and the MWT method handles outliers in the ranked feature lists while creating the ensemble list. First, a threshold value is determined. Then, the feature weights below the threshold are replaced with this value, which eliminates the negative effect of outliers. After the new feature weights are assigned, the average of the feature weights is calculated (mean aggregation) for all features, and the ensemble (final) feature list is created accordingly. The experimental results showed that the proposed method significantly improves gene selection stability while maintaining classification performance and reducing computational complexity. In conclusion, the proposed method led to an accurate and robust classification that can help domain experts to make …
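The thresholding and mean-aggregation steps described in this abstract can be sketched as below. This is only an illustration under assumptions: the abstract does not say how the threshold is determined, so it is passed in as a parameter; the weights are assumed to be non-negative importance scores (for example, absolute classifier coefficients); and the function name `mwt_aggregate` and the toy gene names are hypothetical.

```python
from statistics import mean

def mwt_aggregate(weight_lists, threshold):
    """Clip weights below `threshold` up to it, then average per feature.

    weight_lists: one dict (feature -> weight) per resampled run.
    Returns the features ranked by aggregated weight, plus the weights.
    """
    ensemble = {}
    for feature in weight_lists[0]:
        clipped = [max(run[feature], threshold) for run in weight_lists]
        ensemble[feature] = mean(clipped)  # mean aggregation
    ranking = sorted(ensemble, key=ensemble.get, reverse=True)
    return ranking, ensemble

# Toy runs (hypothetical): g1 is strong in two runs but is an outlier (0.0) in the third.
runs = [
    {"g1": 0.9, "g2": 0.6, "g3": 0.1},
    {"g1": 0.8, "g2": 0.6, "g3": 0.2},
    {"g1": 0.0, "g2": 0.6, "g3": 0.15},
]

mwt_ranking, mwt_weights = mwt_aggregate(runs, threshold=0.4)
# With no effective threshold, this reduces to plain mean aggregation.
plain_ranking, plain_weights = mwt_aggregate(runs, threshold=float("-inf"))
```

In this toy case the single outlier run pulls g1 below g2 under plain averaging, while clipping at the threshold keeps g1 ranked first.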
Summary: With digitization and modern network applications, information security has gained tremendous importance. Therefore, accurate and efficient detection systems are crucial for maintaining proactive security in computer networks. Machine learning (ML) has shown great potential as a solution since it can teach a machine to distinguish malicious from normal network activities. However, recently proposed methods suffer from at least one of the following issues: low detection accuracy, a high false alarm rate, or high computational complexity. The main reason behind this problem is the complexity of the model in terms of attack types. From the ML perspective, intrusion detection is a classification problem where each attack type is identified by a different set of features, and these features are used for classifying network activities. Thus, training an ML algorithm to detect more than one attack type leads to a more complex model; the growing number of features increases model complexity and may result in relatively lower detection accuracy or a higher false positive rate. To tackle this problem, this study proposes an attack‐wise customized network intrusion detection system (AWC‐NIDS) based on ML, concurrency, and distributed systems to achieve accurate and efficient network‐wide intrusion detection. Since CICIDS2017 contains many modern attacks, it was used for model development and performance evaluation. The experimental results showed that the proposed methodology achieved high classification performance for all datasets with a small number of features. However, the lowest accuracy was obtained for the comprehensive dataset (which contains all attack types); for the single attack‐type datasets, the accuracy was above 99%. This finding supports the concept of attack‐wise customization for intrusion detection and shows the significance of the proposed methodology.
In conclusion, this framework is promising for implementing robust and accurate cybersecurity systems for traditional and modern networking.
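The attack‐wise idea, one small detector per attack type, each using its own feature subset and run concurrently, can be sketched as follows. Everything concrete here is an assumption made for illustration: the feature names, the attack labels, and the fixed threshold rules (which stand in for trained ML classifiers) are hypothetical and do not come from the study.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-attack detectors: (feature subset, decision rule).
# The rules are placeholders for trained classifiers.
DETECTORS = {
    "ddos": (("packets_per_s",), lambda f: f["packets_per_s"] > 1000),
    "port_scan": (("distinct_ports",), lambda f: f["distinct_ports"] > 50),
}

def run_detector(name, flow):
    """Apply one attack-specific detector to only the features it needs."""
    features, rule = DETECTORS[name]
    subset = {key: flow[key] for key in features}
    return name, bool(rule(subset))

def detect(flow):
    """Run all attack-specific detectors concurrently; return the attacks flagged."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda name: run_detector(name, flow), DETECTORS))
    return sorted(name for name, hit in results if hit)

flagged = detect({"packets_per_s": 5000, "distinct_ports": 3})
benign = detect({"packets_per_s": 10, "distinct_ports": 2})
```

Keeping each detector tied to its own feature subset is what lets every model stay small; a distributed deployment would place the detectors on separate nodes rather than threads.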