2008
DOI: 10.1109/sp.2008.11

Casting out Demons: Sanitizing Training Data for Anomaly Sensors

Abstract: The efficacy of Anomaly Detection (AD) sensors depends heavily on the quality of the data used to train them. Artificial or contrived training data may not provide a realistic view of the deployment environment. Most realistic data sets are dirty; that is, they contain a number of attacks or anomalous events. The size of these high-quality training data sets makes manual removal or labeling of attack data infeasible. As a result, sensors trained on this data can miss attacks and their variations. We propose ex…
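The abstract describes sanitizing dirty training data before it reaches an AD sensor. A minimal illustrative sketch of one such approach, voting-based sanitization, is shown below: split the training data into slices, build a toy micro-model per slice, and drop any record that a majority of micro-models consider abnormal. This is a hedged sketch, not the authors' exact algorithm; the `train_micro_model` helper, the slicing scheme, and the voting threshold are all illustrative assumptions.

```python
def train_micro_model(chunk):
    # Toy "micro-model": just the set of values observed in this slice.
    return set(chunk)

def sanitize(training_data, n_models=5, vote_threshold=0.5):
    """Split the data into slices, train one micro-model per slice,
    and discard any record that more than vote_threshold of the
    micro-models flag as abnormal (never seen in their slice)."""
    size = max(1, len(training_data) // n_models)
    chunks = [training_data[i:i + size]
              for i in range(0, len(training_data), size)]
    models = [train_micro_model(c) for c in chunks]

    clean = []
    for record in training_data:
        # A model "votes abnormal" if it never saw this record.
        abnormal_votes = sum(1 for m in models if record not in m)
        if abnormal_votes / len(models) <= vote_threshold:
            clean.append(record)
    return clean

# A rare attack record appears in only one slice, so most micro-models
# vote it abnormal and it is scrubbed from the training set.
dirty = [1, 2, 3] * 20 + [999]
clean = sanitize(dirty)
```

The intuition: a real attack is rare, so it contaminates only a few slices; the micro-models trained on the other slices out-vote the contaminated ones.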

Cited by 165 publications (133 citation statements)
References 22 publications (25 reference statements)
“…Averaged over eight weeks, both sites keep over 40% of bits in common, while in the three-week run this figure is closer to 50%. This reinforces existing work [5] showing that traffic patterns do evolve over time, indicating that periodically updating normal models should increase effectiveness. With our three-week data set, we also have an additional web server from one administrative domain.…”
Section: Model Comparison (supporting)
confidence: 87%
“…Aleksandar Lazarevic et al. compare several AD systems in network intrusion detection [12]. For our analysis, we use the STAND [5] method and the Anagram [30] CAD sensor as our base CAD system. The STAND process improves CAD sensor results by introducing a sanitization phase that scrubs the training data.…”
Section: Related Work (mentioning)
confidence: 99%
“…To significantly compromise the training phase of a learning algorithm, an attack has to exhibit some characteristics that differ from those of the rest of the training data; otherwise it would have no impact at all. Therefore, most training attacks can be regarded as outliers and countered either by data sanitization (i.e., outlier detection) [28] or by exploiting robust statistics [40,53] to mitigate the outliers' impact on learning (e.g., robust principal component analysis [66,29]). Notably, in [27] the robustness of SVMs to training-data contamination has been formally analyzed under the framework of robust statistics [40,53], highlighting that bounded kernels and bounded loss functions may significantly limit the outliers' impact on classifier training.…”
Section: Proactive Defenses (mentioning)
confidence: 99%
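The excerpt above names two countermeasures against training-set poisoning: data sanitization via outlier detection, and robust statistics that bound an outlier's influence. A minimal sketch of the second idea, assuming a one-dimensional feature for simplicity, flags poisoned points with the median absolute deviation (MAD), a robust scale estimate that a few extreme values cannot skew the way a mean or standard deviation can. This is an illustrative example, not the method of [28] or [40].

```python
import statistics

def mad_filter(values, threshold=3.5):
    """Drop points whose modified z-score exceeds `threshold`.
    Median and MAD are robust: a handful of poisoned points
    barely move them, unlike the mean/standard deviation."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return list(values)  # no spread: nothing to flag
    # 0.6745 rescales MAD so the score is comparable to a
    # standard z-score under a normal distribution.
    return [v for v in values
            if abs(0.6745 * (v - med) / mad) <= threshold]

# The poisoned point 500 is far from the robust center and is dropped;
# the legitimate values survive unchanged.
clean = mad_filter([10, 11, 9, 10, 12, 11, 500])
```

Had the filter used the mean and standard deviation instead, the single value 500 would have inflated both, potentially masking itself, which is exactly the weakness the excerpt's robust-statistics countermeasures address.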