2020
DOI: 10.1109/access.2020.2977591

How Much Training Data is Enough? A Case Study for HTTP Anomaly-Based Intrusion Detection

Abstract: Most anomaly-based intrusion detectors rely on models that learn from training datasets, whose quality is crucial to their performance. Although the properties of suitable datasets have been formulated, the influence of dataset size on the performance of anomaly-based detectors has received scant attention so far. In this work, we investigate the optimal size of a training dataset. This size should be large enough that the training data is representative of normal behavior, but after that point, collecting…

Cited by 9 publications (3 citation statements)
References 34 publications
“…A prerequisite is to collect enough volume of events from the operational environment [37]. The size of the collected dataset will have implications in the time spent in the tuning process since we will need to manually review some of the alerts generated.…”
Section: Methods For Reducing the False Alarm Rate (mentioning)
confidence: 99%
“…In a way, this can be viewed as a type of voting scheme. In a previous work [37], we combined ModSecurity [38] and Snort [39] to sanitize HTTP traces by combining the output of both detectors (∪ operation). Sanitization, however, is meant to maximize TP but not to minimize FP.…”
Section: Related Work (mentioning)
confidence: 99%
“…Using the sensors [2] and different IoT devices. Moreover, the security in big data is very challenging as it is concerned with attacks [3] that can originate either from online or offline spheres, hence, we can collect data and store the data using different protocols such as hypertext transfer protocol (HTTP) [4], message queue telemetry transport protocol (MQTT) [5], and constrained application protocol (CoAP) [6]. The data is stored so it can be utilized for the improvement of the device.…”
Section: Introduction (mentioning)
confidence: 99%