Using machine learning techniques to identify rare cyber‐attacks on the UNSW‐NB15 dataset

Bagui, Sikha; Kalaimannan, Ezhil; Nandi, Debarghya; Pinto, Anthony

doi:10.1002/spy2.91

Cited by 49 publications

(32 citation statements)

References 22 publications


“…Cybersecurity data analysis is very important to indicate vulnerabilities and unveil security breaches, such as via detecting network inconsistencies. Examples of applying feature selection before anomaly detection include a hierarchical feature selection for DDoS mitigation [71], an ensemble feature selection for intrusion detection [72], and clustering and correlation-based feature selection for intrusion detection [73]. Hence, our proposed cooperative co-evolution-based feature selection with the proposed random feature grouping (CCFSRFG) can be applied in any domain of Big Data.…”

Section: Performance Evaluation Of Classifiers With CCFSRFG (mentioning)

confidence: 99%

Cooperative co-evolution for feature selection in Big Data with random feature grouping

et al. 2020


A massive amount of data is generated with the evolution of modern technologies. This high-throughput data generation results in Big Data, which consist of many features (attributes). However, irrelevant features may degrade the classification performance of machine learning (ML) algorithms. Feature selection (FS) is a technique used to select a subset of relevant features that represent the dataset. Evolutionary algorithms (EAs) are widely used search strategies in this domain. A variant of EAs, called cooperative co-evolution (CC), which uses a divide-and-conquer approach, is a good choice for optimization problems. The existing solutions have poor performance because of some limitations, such as not considering feature interactions, dealing with only an even number of features, and decomposing the dataset statically. In this paper, a novel random feature grouping (RFG) has been introduced with its three variants to dynamically decompose Big Data datasets and to ensure the probability of grouping interacting features into the same subcomponent. RFG can be used in CC-based FS processes, hence called Cooperative Co-Evolutionary-Based Feature Selection with Random Feature Grouping (CCFSRFG). Experiment analysis was performed using six widely used ML classifiers on seven different datasets from the UCI ML repository and Princeton University Genomics repository with and without FS. The experimental results indicate that in most cases [i.e., with naïve Bayes (NB), support vector machine (SVM), k-Nearest Neighbor (k-NN), J48, and random forest (RF)] the proposed CCFSRFG-1 outperforms an existing solution (a CC-based FS, called CCEAFS) and CCFSRFG-2, and also when using all features in terms of accuracy, sensitivity, and specificity.
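The abstract above describes random feature grouping only at a high level, so the following is a minimal sketch of the general idea rather than the authors' method: feature indices are shuffled and dealt into subcomponents, and the grouping is redrawn each generation so that the decomposition is dynamic and any pair of interacting features has some chance of landing in the same subcomponent. The function name `random_feature_grouping`, the round-robin split, and the per-generation redraw are assumptions made for illustration; the paper's three RFG variants are not reproduced here.

```python
import random


def random_feature_grouping(n_features, n_groups, rng):
    """Randomly partition feature indices 0..n_features-1 into n_groups lists.

    Hypothetical helper for illustration; not taken from the cited paper.
    """
    if not 1 <= n_groups <= n_features:
        raise ValueError("n_groups must be between 1 and n_features")
    indices = list(range(n_features))
    rng.shuffle(indices)
    # Deal the shuffled indices out round-robin, so group sizes differ by
    # at most one and an odd number of features is handled without padding.
    return [indices[g::n_groups] for g in range(n_groups)]


# Redraw the grouping for each generation (assumed schedule) so the
# decomposition changes over time instead of being fixed once.
rng = random.Random(0)
for generation in range(3):
    groups = random_feature_grouping(n_features=7, n_groups=3, rng=rng)
    print(generation, groups)
```

In a cooperative co-evolution loop, each subcomponent would then be optimized by its own subpopulation; that part is omitted here because the abstract does not specify it.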


“…Sikha Bagui et al proposed in their study [11] a method to detect cyber-attacks based on the Naïve Bayes and Decision Tree (J48) machine learning algorithms. The team [11] used these two algorithms in turn for classifying components of cyber-attacks in the UNSW-NB15 dataset.…”

Section: Related Work (mentioning)

confidence: 99%

Detecting Unauthorized Network Intrusion based on Network Traffic using Behavior Analysis Techniques

Lam

2021

IJACSA


Nowadays, network intrusion detection is an essential problem because cyber-attacks are increasing in both the number and extent of the danger. Network intrusion techniques often use various methods to bypass the oversight of anomaly detection and surveillance systems. This paper proposes to use behavior analysis techniques, machine learning, and deep learning algorithms for the task of detecting network intrusions. The practical and scientific significance of our paper includes two issues: (1) Regarding the process of selecting and extracting features: instead of using typical abnormal behaviors of attacks, this study will use statistical behaviors that are easy to calculate and extract while still ensuring the effectiveness of the method; (2) Regarding the detection process, this study proposes to use the Random Forest (RF) classification algorithm, the Multilayer Perceptron (MLP) and the Convolutional Neural Network (CNN) deep learning model. The experimental results in Section IV have proven that our proposal in this paper is completely correct and reasonable. Based on the results shown in Section IV, this study has provided network surveillance systems with a number of abnormal behaviors as the basis for detecting network intrusions.


“…In this case, the data set UNSW-NB15 [39], [40], which is widely used in cybersecurity [41]-[44] and considered as a benchmark data set [45], was chosen. The choice of this data set is motivated by several factors: the validity of the attacks, the labeling of these, and the classification of the data, similar to that presented in the previous section.…”

Section: Data Set Under Study (mentioning)

confidence: 99%

“…Furthermore, some recent researches have studied the current datasets and most of them have done an evaluation of machine learning techniques such as [49], where the UNSW-NB15 was evaluated by different machine learning algorithms such as Decision Trees, Naïve Bayes and Support Vector Machine, obtaining the best accuracy by Decision Trees (C5.0) of 85.41%. Also, in [41] the authors present a feature selection for rare cyber-attacks, where they propose an evaluation of multiples algorithms with the objective to detect the best accuracy for multi class classification, obtaining an accuracy in the best case (for worms attacks) of 99.94%. Table 8 shows a representation of different related researches and their comparison with the results presented in this article.…”

Section: F Comparison Between Multiples Researches Approaches (mentioning)

confidence: 99%

Evaluation of Cybersecurity Data Set Characteristics for Their Applicability to Neural Networks Algorithms Detecting Cybersecurity Anomalies

et al. 2020


Artificial intelligence algorithms have a leading role in the field of cybersecurity and attack detection, being able to present better results in some scenarios than classic intrusion detection systems such as Snort or Suricata. In this sense, this research focuses on the evaluation of characteristics for different well-established Machine Learning algorithms commonly applied to IDS scenarios. To do this, a categorization for cybersecurity data sets that groups its records into several groups is first considered. Making use of this division, this work seeks to determine which neural network model (multilayer or recurrent), activation function, and learning algorithm yield higher accuracy values, depending on the group of data. Finally, the results are used to determine which group of data from a cybersecurity data set are more relevant and representative for the intrusion detection, and the most suitable configuration of Machine Learning algorithm to decrease the computational load of the system.
