2008
DOI: 10.1109/sp.2008.11

Casting out Demons: Sanitizing Training Data for Anomaly Sensors

Abstract: The efficacy of Anomaly Detection (AD) sensors depends heavily on the quality of the data used to train them. Artificial or contrived training data may not provide a realistic view of the deployment environment. Most realistic data sets are dirty; that is, they contain a number of attacks or anomalous events. The size of these high-quality training data sets makes manual removal or labeling of attack data infeasible. As a result, sensors trained on this data can miss attacks and their variations. We propose ex…
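The abstract describes sanitizing dirty training data before it reaches an AD sensor. A minimal illustrative sketch of one such approach, voting-based sanitization, is shown below: split the training data into slices, build a toy micro-model per slice, and drop any record that a majority of micro-models consider abnormal. This is a hedged sketch, not the authors' exact algorithm; the `train_micro_model` helper, the slicing scheme, and the voting threshold are all illustrative assumptions.

```python
def train_micro_model(chunk):
    # Toy "micro-model": just the set of values observed in this slice.
    return set(chunk)

def sanitize(training_data, n_models=5, vote_threshold=0.5):
    """Split the data into slices, train one micro-model per slice,
    and discard any record that more than vote_threshold of the
    micro-models flag as abnormal (never seen in their slice)."""
    size = max(1, len(training_data) // n_models)
    chunks = [training_data[i:i + size]
              for i in range(0, len(training_data), size)]
    models = [train_micro_model(c) for c in chunks]

    clean = []
    for record in training_data:
        # A model "votes abnormal" if it never saw this record.
        abnormal_votes = sum(1 for m in models if record not in m)
        if abnormal_votes / len(models) <= vote_threshold:
            clean.append(record)
    return clean

# A rare attack record appears in only one slice, so most micro-models
# vote it abnormal and it is scrubbed from the training set.
dirty = [1, 2, 3] * 20 + [999]
clean = sanitize(dirty)
```

The intuition: a real attack is rare, so it contaminates only a few slices; the micro-models trained on the other slices out-vote the contaminated ones.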

Cited by 165 publications (133 citation statements)
References 22 publications (25 reference statements)
“…Averaged over eight weeks, both sites keep over 40% of bits in common, while in the three-week run this figure is closer to 50%. This reinforces existing work [5] showing that traffic patterns do evolve over time, indicating that periodically updating normal models should increase effectiveness. With our three-week data set, we also have an additional web server from one administrative domain.…”
Section: Model Comparison (supporting)
confidence: 87%
“…Aleksandar Lazarevic et al. compare several AD systems in network intrusion detection [12]. For our analysis, we use the STAND [5] method and the Anagram [30] CAD sensor as our base CAD system. The STAND process improves CAD sensor results by introducing a sanitization phase that scrubs the training data.…”
Section: Related Work (mentioning)
confidence: 99%
“…To significantly compromise the training phase of a learning algorithm, an attack has to exhibit some characteristics that differ from those of the rest of the training data; otherwise it would have no impact at all. Therefore, most training attacks can be regarded as outliers and countered either by data sanitization (i.e., outlier detection) [28] or by exploiting robust statistics [40,53] to mitigate the outliers' impact on learning (e.g., robust principal component analysis [66,29]). Notably, in [27] the robustness of SVMs to training-data contamination has been formally analyzed under the framework of robust statistics [40,53], highlighting that bounded kernels and bounded loss functions may significantly limit the outliers' impact on classifier training.…”
Section: Proactive Defenses (mentioning)
confidence: 99%
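The excerpt above names two countermeasures against training-set poisoning: data sanitization via outlier detection, and robust statistics that bound an outlier's influence. A minimal sketch of the second idea, assuming a one-dimensional feature for simplicity, flags poisoned points with the median absolute deviation (MAD), a robust scale estimate that a few extreme values cannot skew the way a mean or standard deviation can. This is an illustrative example, not the method of [28] or [40].

```python
import statistics

def mad_filter(values, threshold=3.5):
    """Drop points whose modified z-score exceeds `threshold`.
    Median and MAD are robust: a handful of poisoned points
    barely move them, unlike the mean/standard deviation."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return list(values)  # no spread: nothing to flag
    # 0.6745 rescales MAD so the score is comparable to a
    # standard z-score under a normal distribution.
    return [v for v in values
            if abs(0.6745 * (v - med) / mad) <= threshold]

# The poisoned point 500 is far from the robust center and is dropped;
# the legitimate values survive unchanged.
clean = mad_filter([10, 11, 9, 10, 12, 11, 500])
```

Had the filter used the mean and standard deviation instead, the single value 500 would have inflated both, potentially masking itself, which is exactly the weakness the excerpt's robust-statistics countermeasures address.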