2021 IEEE Security and Privacy Workshops (SPW) 2021
DOI: 10.1109/spw53761.2021.00009

Troubleshooting an Intrusion Detection Dataset: the CICIDS2017 Case Study

Abstract: Numerous research studies have demonstrated the effectiveness of machine learning techniques in application to network intrusion detection. And yet, the adoption of machine learning for securing large-scale network environments remains limited. The community acknowledges that network security presents unique challenges for machine learning, and the lack of training data representative of modern traffic remains one of the most intractable issues. New attempts are continuously made to develop high quality benchm…

Cited by 67 publications (53 citation statements)
References 11 publications
“…We assume that all the considered datasets are correctly labelled. However, such assumption may be overly optimistic: as described in §2, manual labelling is an error-prone task, and some recent papers highlighted that even well-known datasets may contain flaws (e.g., [151]). Due to the seminal nature of this SoK paper, we do not make any change to the provided labels-which also facilitates comparisons with previous works using such datasets, as the ground truth is the same.…”
Section: Discussion and Future Work; citation type: mentioning
confidence: 99%
“…Despite the recent interest in ML led to the release of more open datasets (e.g., [10,31,142]), such datasets exhibit limitations [163]. For instance, inaccurate labels, fast obsolescence, small and synthetic environments (e.g., [115]), or even flawed generation process-as shown in [66]. All these problems can only be mitigated to some degree (e.g.…”
Section: Data Availability (Executives and Legislation Authorities); citation type: mentioning
confidence: 99%
“…Many of the defined requirements for testbeds also overlap with the criteria that network security datasets should meet [13,17,18]. The criteria can be summarized as follows: the dataset provides real and complete network traces; the traffic is generated using a valid network topology that includes clients, servers and network equipment; the dataset is labeled to distinguish between benign and malicious traces; highly heterogeneous regarding included services, network protocols, normal and attack behaviors; easily extendable; reproducible; shareable and documented.…”
Section: General Testbed and Dataset Requirements; citation type: mentioning
confidence: 99%
“…Besides offering different types of attacks, diversity within the same attack type should also be provided. This procedure includes combining different attack tools that perform similar actions and using multiple options and flags for each attack [17].…”
Section: Heterogeneity; citation type: mentioning
confidence: 99%