Search citation statements
Paper Sections
Citation Types
Year Published
Publication Types
Relationship
Authors
Journals
Due to the growing rise of cyber attacks in the Internet, the demand of accurate intrusion detection systems (IDS) to prevent these vulnerabilities is increasing. To this aim, Machine Learning (ML) components have been proposed as an efficient and effective solution. However, its applicability scope is limited by two important issues: (i) the shortage of network traffic data datasets for attack analysis, and (ii) the data privacy constraints of the data to be used. To overcome these problems, Generative Adversarial Networks (GANs) have been proposed for synthetic flow-based network traffic generation. However, due to the ill-convergence of the GAN training, none of the existing solutions can generate high-quality fully synthetic data that can totally substitute real data in the training of ML components. In contrast, they mix real with synthetic data, which acts only as data augmentation components, leading to privacy breaches as real data is used. In sharp contrast, in this work we propose a novel and deterministic way to measure the quality of the synthetic data produced by a GAN both with respect to the real data and to its performance when used for ML tasks. As a by-product, we present a heuristic that uses these metrics for selecting the best performing generator during GAN training, leading to a novel stopping criterion, which can be applied even when different types of synthetic data are to be used in the same ML task. We demonstrate the adequacy of our proposal by generating synthetic cryptomining attacks and normal traffic flow-based data using an enhanced version of a Wasserstein GAN. The results evidence that the generated synthetic network traffic can completely replace real data when training a ML-based cryptomining detector, obtaining similar performance and avoiding privacy violations, since real data is not used in the training of the ML-based detector.
Due to the growing rise of cyber attacks in the Internet, the demand of accurate intrusion detection systems (IDS) to prevent these vulnerabilities is increasing. To this aim, Machine Learning (ML) components have been proposed as an efficient and effective solution. However, its applicability scope is limited by two important issues: (i) the shortage of network traffic data datasets for attack analysis, and (ii) the data privacy constraints of the data to be used. To overcome these problems, Generative Adversarial Networks (GANs) have been proposed for synthetic flow-based network traffic generation. However, due to the ill-convergence of the GAN training, none of the existing solutions can generate high-quality fully synthetic data that can totally substitute real data in the training of ML components. In contrast, they mix real with synthetic data, which acts only as data augmentation components, leading to privacy breaches as real data is used. In sharp contrast, in this work we propose a novel and deterministic way to measure the quality of the synthetic data produced by a GAN both with respect to the real data and to its performance when used for ML tasks. As a by-product, we present a heuristic that uses these metrics for selecting the best performing generator during GAN training, leading to a novel stopping criterion, which can be applied even when different types of synthetic data are to be used in the same ML task. We demonstrate the adequacy of our proposal by generating synthetic cryptomining attacks and normal traffic flow-based data using an enhanced version of a Wasserstein GAN. The results evidence that the generated synthetic network traffic can completely replace real data when training a ML-based cryptomining detector, obtaining similar performance and avoiding privacy violations, since real data is not used in the training of the ML-based detector.
Cybercrime has become more pervasive and sophisticated over the years. Cyber ranges have emerged as a solution to keep pace with the rapid evolution of cybersecurity threats and attacks. Cyber ranges have evolved to virtual environments that allow various IT and network infrastructures to be simulated to conduct cybersecurity exercises in a secure, flexible, and scalable manner. With these training environments, organizations or individuals can increase their preparedness and proficiency in cybersecurity-related tasks while helping to maintain a high level of situational awareness. SPIDER is an innovative cyber range as a Service (CRaaS) platform for 5G networks that offer infrastructure emulation, training, and decision support for cybersecurity-related tasks. In this paper, we present the integration in SPIDER of defensive exercises based on the utilization of machine learning models as key components of attack detectors. Two recently appeared network attacks, cryptomining using botnets of compromised devices and vulnerability exploit of the DoH protocol (DNS over HTTP), are used as the support use cases for the proposed exercises in order to exemplify the way in which other attacks and the corresponding ML-based detectors can be integrated into SPIDER defensive exercises. The two attacks were emulated, respectively, to appear in the control and data planes of a 5G network. The exercises use realistic 5G network traffic generated in a new environment based on a fully virtualized 5G network. We provide an in-depth explanation of the integration and deployment of these exercises and a complete walkthrough of them and their results. The machine learning models that act as attack detectors are deployed using container technology and standard interfaces in a new component called Smart Traffic Analyzer (STA). We propose a solution to integrate STAs in a standardized way in SPIDER for the use of trainees in exercises. Finally, this work proposes the application of Generative Adversarial Networks (GANs) to obtain on-demand synthetic flow-based network traffic that can be seamlessly integrated into SPIDER exercises to be used instead of real traffic and attacks.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.