Resampling imbalanced data for network intrusion detection datasets

Bagui, Sikha; Li, Kunqi

doi:10.1186/s40537-020-00390-x

Cited by 156 publications

(85 citation statements)

References 25 publications

“…While the model proposed by Koroniotis et al (2017) achieved 93.23% accuracy using DT classifier. In addition, none of the studies listed in Table 1 have resolved the class imbalance problem of the UNSW-NB15 dataset as there are many studies ( Al-Daweri et al, 2020 ; Ahmad et al, 2021 ; Bagui & Li, 2021 ; Dlamini & Fahim, 2021 ) that have highlighted this issue. We addressed the class imbalance problem by applying SMOTE that improved the performance of the classifiers and achieved good results.…”

Section: Discussion (mentioning)

confidence: 99%

Network intrusion detection using oversampling technique and machine learning algorithms

Ahmed, Hameed, Bawany

2022

PeerJ Computer Science

The expeditious growth of the World Wide Web and the rampant flow of network traffic have resulted in a continuous increase of network security threats. Cyber attackers seek to exploit vulnerabilities in network architecture to steal valuable information or disrupt computer resources. A Network Intrusion Detection System (NIDS) is used to effectively detect various attacks, thus providing timely protection to network resources from these attacks. To implement NIDS, a range of supervised and unsupervised machine learning approaches is applied to detect irregularities in network traffic and to address network security issues. Such NIDSs are trained using various datasets that include attack traces. However, due to the advancement in modern-day attacks, these systems are unable to detect the emerging threats. Therefore, NIDS needs to be trained and developed with a modern comprehensive dataset which contains contemporary normal and attack activities. This paper presents a framework in which different machine learning classification schemes are employed to detect various types of network attack categories. Five machine learning algorithms, Random Forest, Decision Tree, Logistic Regression, K-Nearest Neighbors and Artificial Neural Networks, are used for attack detection. This study uses a dataset published by the University of New South Wales (UNSW-NB15), a relatively new dataset that contains a large amount of network traffic data with nine categories of network attacks. The results show that the classification models achieved the highest accuracy of 89.29% by applying the Random Forest algorithm. Further improvement in the accuracy of classification models is observed when the Synthetic Minority Oversampling Technique (SMOTE) is applied to address the class imbalance problem. After applying SMOTE, the Random Forest classifier showed an accuracy of 95.1% with 24 selected features from the Principal Component Analysis method.
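The abstract above credits SMOTE for the accuracy gain but does not spell out how it works. The following is a minimal sketch of the core SMOTE idea only (interpolating between a minority sample and one of its nearest minority-class neighbours), written with the Python standard library. It is not the citing authors' implementation; the function name `smote`, its parameters, and the toy data are assumptions made for illustration, and a maintained library implementation would normally be preferred in practice.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples (illustrative sketch).

    minority: list of feature vectors (lists of floats) from one class.
    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    def sq_dist(a, b):
        # Squared Euclidean distance; enough for ranking neighbours.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours, excluding base itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: sq_dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical toy data: four minority samples at the corners of a unit square.
toy_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_synthetic = smote(toy_minority, n_new=6, k=2)
```

Because new points are interpolated rather than copied, SMOTE adds variety to the minority class instead of repeating existing rows. It should be applied to the training split only, so that synthetic samples never leak into the evaluation data.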

“…Pahl et al [15] describe in their paper that even though there is some related work found in IoT, still it is attracting the attention of researchers for its popularity in today's livelihood and designed an anomaly-based detector and firewall for IoT system using K-Means and BIRCH clustering with a predictive accuracy of 96.3%. Brun et al [16] Bagui and Li [26] presented the usefulness of random oversampling and random under-sampling with SMOTE to deal with highly imbalanced and less imbalanced intrusion detection datasets to improve the classification accuracy of the classifier. They evaluated the model by using artificial neural network (ANN) for attack detection using macro precision; macro recall, and macro F1 score for several sampling techniques and found that SMOTE-Random under-sampling with ANN classifier outperforms all others with 83.61% macro precision; 87.14% macro recall; 82.75% macro F1-score with 342 seconds training time.…”

Section: Related Research (mentioning)

confidence: 99%
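The statement above reports macro precision, macro recall, and macro F1 rather than accuracy. A small sketch of how these macro-averaged scores can be computed is given here, assuming the common convention that macro F1 is the unweighted mean of the per-class F1 scores; the function name `macro_scores` and the toy labels are hypothetical and are not taken from the cited papers.

```python
def macro_scores(y_true, y_pred):
    """Return (macro precision, macro recall, macro F1).

    Each class contributes equally to the average, regardless of how
    many samples it has, which is why these metrics are preferred for
    imbalanced intrusion detection data.
    """
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # Undefined ratios (zero denominator) are counted as 0.0 here.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical example: a classifier that always predicts "normal".
y_true = ["normal"] * 8 + ["attack"] * 2
y_pred = ["normal"] * 10
macro_p, macro_r, macro_f1 = macro_scores(y_true, y_pred)
```

In this toy case the accuracy is 80%, yet the macro recall is only 0.5 because the attack class is never detected. That gap is the reason resampling studies tend to report macro-averaged metrics.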

Developing an Efficient Feature Engineering and Machine Learning Model for Detecting IoT-Botnet Cyber Attacks

2021

The proliferation of Internet of Things (IoT) systems and smart digital devices has made them targets of network attacks. Botnets are the vectors through which attackers seize control of IoT systems and carry out malicious activities. To confront this challenge, efficient machine learning and deep learning with suitable feature engineering are suggested to detect such attacks and protect the network from these vulnerabilities in the future. For the efficient detection of cyber attacks, the representative dataset must be well structured for training the model and then validating the proposed system to develop an optimal security model. In this research, we used the UNSW-NB15, a new IoT-Botnet dataset (a noisy and imbalanced dataset) to classify cyber-attacks. K-Medoid sampling and scatter search-based feature engineering techniques are used to obtain a representative dataset with optimal feature subsets. To validate the proposed methodologies, three recent machine learning (ML) methods, namely (i) JChaid*, a recent upgrade of the Chi-square automatic interaction detection (CHAID) decision tree, (ii) A2DE (a semi-naive Bayesian averaged two-dependence estimator), and (iii) HGC, a hybrid of a genetic algorithm with K-means clustering, and two deep learning (DL) methods, (i) a deep multilayer perceptron (DMLP) and (ii) a convolutional neural network (CNN) based classifier, are employed. From the extensive experimental analysis, it is evident that the scatter search-based DMLP classifier outperforms the other competing models in terms of (i) highest detection rate, with 100% accuracy, 100% macro-averaged precision, 100% macro-averaged recall, and 100% macro-averaged F1-score, and (ii) low computational complexity, with the least training time of 4.7 seconds and testing time of 0.61 seconds.

“…Liaqat et al [41] used the up-sampling method to increase the number of benign samples in the training data set. In [42][43][44][45], Synthetic Minority Oversampling Technique (SMOTE) method was used to generate additional samples for the minority classes. Mulyanto et al [46] performed feature selection to reduce dimensionality while focal loss function was used to address class imbalance problem.…”

Section: Review Of Related Work (mentioning)

confidence: 99%
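The up-sampling mentioned in the statement above can be illustrated with plain random oversampling, which duplicates existing rows of the smaller classes until every class matches the largest one. This is a sketch under that assumption only; the cited work does not specify its procedure here, and the function name `random_oversample` and the toy data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all
    classes have as many rows as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(rows) for rows in by_class.values())

    out_x, out_y = [], []
    for y, rows in by_class.items():
        # Draw the missing rows with replacement from the class itself.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for x in rows + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


# Hypothetical toy training set: 6 attack rows and 2 benign rows.
toy_x = [[float(i)] for i in range(8)]
toy_y = ["attack"] * 6 + ["benign"] * 2
balanced_x, balanced_y = random_oversample(toy_x, toy_y)
balanced_counts = Counter(balanced_y)
```

Unlike SMOTE, this approach creates exact copies, so it adds no new information and can encourage overfitting on the duplicated rows. It is nonetheless a common baseline against which synthetic oversampling is compared.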

“…Recent studies recommended SMOTE as an efficient over-sampling method [42][43][44][45]47,51]. Therefore, SMOTE algorithm was proposed to deal with the high class imbalance problem in the training set in an 11-class classification scenario.…”

Section: Synthetic Minority Oversampling Technique (mentioning)

confidence: 99%

Memory-Efficient Deep Learning for Botnet Attack Detection in IoT Networks

et al. 2021

Cyber attackers exploit a network of compromised computing devices, known as a botnet, to attack Internet-of-Things (IoT) networks. Recent research works have recommended the use of Deep Recurrent Neural Network (DRNN) for botnet attack detection in IoT networks. However, for high feature dimensionality in the training data, high network bandwidth and a large memory space will be needed to transmit and store the data, respectively in IoT back-end server or cloud platform for Deep Learning (DL). Furthermore, given highly imbalanced network traffic data, the DRNN model produces low classification performance in minority classes. In this paper, we exploit the joint advantages of Long Short-Term Memory Autoencoder (LAE), Synthetic Minority Oversampling Technique (SMOTE), and DRNN to develop a memory-efficient DL method, named LS-DRNN. The effectiveness of this method is evaluated with the Bot-IoT dataset. Results show that the LAE method reduced the dimensionality of network traffic features in the training set from 37 to 10, and this consequently reduced the memory space required for data storage by 86.49%. SMOTE method helped the LS-DRNN model to achieve high classification performance in minority classes, and the overall detection rate increased by 10.94%. Furthermore, the LS-DRNN model outperformed state-of-the-art models.
