A survey of network-based intrusion detection data sets

Ring, Markus; Wunderlich, Sarah; Scheuring, Deniz; Landes, Dieter; Hotho, Andreas

doi:10.1016/j.cose.2019.06.005

Cited by 465 publications

(200 citation statements)

References 76 publications

Supporting

Mentioning

199

Contrasting

Unclassified


“…As a part of future work, it will be interesting to employ different intrusion detection datasets, subsequently gauge the performance of various classifiers. Experts have always urged the research community to experiment with different datasets and introduce novel techniques for network intrusion detection [33,34]. Another avenue which can be explored in future can possibly include the deployment of predictive models as scalable web services thereby leveraging the capabilities of MAMLS.…”

Section: Conclusion and Prospects (mentioning)

confidence: 99%

Performance analysis of binary and multiclass models using azure machine learning

Rajagopal

Hareesha

Kundapur

2020

IJECE


Network data is expanding and that too at an alarming rate. Besides, the sophisticated attack tools used by hackers lead to capricious cyber threat landscape. Traditional models proposed in the field of network intrusion detection using machine learning algorithms emphasize more on improving attack detection rate and reducing false alarms but time efficiency is often overlooked. Therefore, in order to address this limitation, a modern solution has been presented using Machine Learning-as-a-Service platform. The proposed work analyses the performance of eight two-class and three multiclass algorithms using UNSW NB-15, a modern intrusion detection dataset. 82,332 testing samples were considered to evaluate the performance of algorithms. The proposed two class decision forest model exhibited 99.2% accuracy and took 6 seconds to learn 1,75,341 network instances. Multiclass classification task was also undertaken wherein attack types like generic, exploits, shellcode and worms were classified with a recall percentage of 99%, 94.49%, 91.79% and 90.9% respectively by the multiclass decision forest model that also leapfrogged others in terms of training and execution time.


“…In the current environment of continually emerging new threats, building reliable and accurate IDS models requires using an up-to-date ID dataset. A number of modern datasets were proposed [27]-[29], Ring et al. [30] also recommended some selected few datasets suitable for general network intrusion detection evaluation. Both the proposed and recommended datasets are publicly available and can be used for building better and more reliable IDS models.…”

Section: Data Encoding (mentioning)

confidence: 99%

Effects of Feature Selection and Normalization on Network Intrusion Detection

Umar

Chen

2023

Preprint


The rapid rise of cyberattacks and the gradual failing of traditional defense systems and approaches led to the use of Machine Learning (ML) techniques aiming to build more efficient and reliable Intrusion Detection Systems (IDSs). However, the advent of larger IDS datasets brought about negative impacts on the performance and computational time of ML-based IDSs. To overcome such issues, many researchers utilized data preprocessing techniques such as feature selection and normalization. While most of these researchers reported the success of these preprocessing techniques on a shallow level, very few studies are performed on their effects on a wider scale. Furthermore, the performance of an IDS model is subject to not only the preprocessing techniques used but also the dataset and the ML algorithm used, which most of the existing studies on preprocessing techniques give little emphasis on. Thus, this study provides an in-depth analysis of the effects of feature selection and normalization on various IDS models built using four separate IDS datasets and five different ML algorithms. Wrapper-based decision tree and min-max are used in feature selection and normalization respectively. The models are evaluated and compared using popular evaluation metrics in IDS. The study found normalization to be more important than feature selection in improving performance and computational time of models on both datasets, while feature selection on UNSW-NB15 failed to reduce models computational time, and in the case of models built using NSL-KDD, it decreases their performance. The study also reveals that, compared to the UNSW-NB15 dataset, the NSL-KDD dataset is less complex and unsuitable for building reliable modern-day IDS models. Furthermore, the best performance on both datasets is achieved by Random Forest with accuracy of 99.75% and 98.51% on NSL-KDD and UNSW-NB15 respectively.


“…However, it should be noted that the detection mechanisms of many IDS described earlier rely on the network traffic characteristics of the network and transport layers, without taking into account possible cyberattacks taking place at the application layer protocols (e.g., Modbus, DNP3). Moreover, it is worth noting that most of the anomaly-based IDS utilise outdated publicly available datasets, such as KDD CUP 1999 and NSL-KDD [40, 41]. These datasets were not created, considering the unique attributes of an SG environment; therefore, the detection mechanisms based on them cannot be considered as reliable.…”

Section: Related Work and Contributions (mentioning)

confidence: 99%

ARIES: A Novel Multivariate Intrusion Detection System for Smart Grid

Radoglou-Grammatikis

Sarigiannidis

Efstathopoulos

et al. 2020

Sensors


The advent of the Smart Grid (SG) raises severe cybersecurity risks that can lead to devastating consequences. In this paper, we present a novel anomaly-based Intrusion Detection System (IDS), called ARIES (smArt gRid Intrusion dEtection System), which is capable of protecting efficiently SG communications. ARIES combines three detection layers that are devoted to recognising possible cyberattacks and anomalies against (a) network flows, (b) Modbus/Transmission Control Protocol (TCP) packets and (c) operational data. Each detection layer relies on a Machine Learning (ML) model trained using data originating from a power plant. In particular, the first layer (network flow-based detection) performs a supervised multiclass classification, recognising Denial of Service (DoS), brute force attacks, port scanning attacks and bots. The second layer (packet-based detection) detects possible anomalies related to the Modbus packets, while the third layer (operational data based detection) monitors and identifies anomalies upon operational data (i.e., time series electricity measurements). By emphasising on the third layer, the ARIES Generative Adversarial Network (ARIES GAN) with novel error minimisation functions was developed, considering mainly the reconstruction difference. Moreover, a novel reformed conditional input was suggested, consisting of random noise and the signal features at any given time instance. Based on the evaluation analysis, the proposed GAN network overcomes the efficacy of conventional ML methods in terms of Accuracy and the F1 score.
