2022
DOI: 10.48550/arxiv.2209.14013
Preprint

On the Robustness of Ensemble-Based Machine Learning Against Data Poisoning

Abstract: Machine learning is becoming ubiquitous. From finance to medicine, machine learning models are boosting decision-making processes and even outperforming humans in some tasks. This huge progress in terms of prediction quality does not, however, find a counterpart in the security of such models and corresponding predictions, where perturbations of fractions of the training set (poisoning) can seriously undermine the model accuracy. Research on poisoning attacks and defenses even predates the introduction of deep …
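As a concrete, hypothetical illustration of the poisoning threat described in the abstract (not code from the paper), the following Python sketch flips the labels of a growing fraction of a synthetic training set and reports the accuracy of a single classifier; the dataset, model, and poisoning rates are illustrative assumptions:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

def poison_labels(y, fraction, rng):
    # Label-flipping attack: invert the labels of a random fraction of samples.
    y_poisoned = y.copy()
    idx = rng.choice(len(y), size=int(fraction * len(y)), replace=False)
    y_poisoned[idx] = 1 - y_poisoned[idx]  # binary labels assumed
    return y_poisoned

for frac in (0.0, 0.1, 0.2, 0.3):
    clf = LogisticRegression(max_iter=1000).fit(X_tr, poison_labels(y_tr, frac, rng))
    print(f"poisoned fraction={frac:.1f}  test accuracy={clf.score(X_te, y_te):.3f}")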

Cited by 2 publications (6 citation statements)
References 23 publications
“…Paper [23] introduced several Command & Control (C&C) detection systems to detect sophisticated attacks and reduce false-positive rates; however, as network size increases, storing all network traffic becomes challenging, yet storing traffic data is an essential requirement of most C&C detection methods. Paper [24] aims to detect poisoned labels by applying the K-NN approach and to mitigate the impact of LF attacks through label sanitization; however, it assumes a large number of benign samples are available for sanitization, which may reduce classifier accuracy. Another paper suggested using an outlier detection-based scheme to identify attack points targeting linear classifiers [28].…”

The quoted passage also embeds a comparison table, reconstructed below:

Ref | Model | Datasets | Strengths | Limitations
[24] | Linear classifier | MNIST [25], Spambase [26], BreastCancer [27] | Effective against label flipping attacks | Sensitivity to parameters; applicability in various scenarios; scalability
[28] | Linear classifier | MNIST [25], Spambase [26], BreastCancer [27] | Detects attack points; outlier elimination | Computationally intensive and requires outlier estimation
[29] | Ensemble trees | UCI ML Repository [30] & KEEL-dataset Repository [31] | Achieves high detection accuracy and handles multiple attack types | Does not locate attacked points and requires untainted data for training
[10] | Ensemble trees | HAR dataset [32] | Recovers poisoned data and increases accuracy | Limited effectiveness
[33] | RF | Musk2 [34] and Android malware | Achieved high accuracy under other perturbations; scalable approach | Ensemble size must be considered; dependent on adversary knowledge
[35] | AdaBoost | Spambase [26], Breast-w [36], Kr-vs-kp [37], and so forth | (remainder truncated in the source)
Section: Related Work
confidence: 99%
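Paper [24], as summarized in the statement above, detects poisoned labels with a K-NN approach and sanitizes them. A minimal sketch of that idea, assuming binary labels and an illustrative neighborhood size (not the authors' implementation):

import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_sanitize_labels(X, y, k=10):
    # Relabel each training point with the majority label of its k nearest
    # neighbors, so that isolated flipped labels are overwritten.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)   # k + 1: the query point itself comes back first
    _, idx = nn.kneighbors(X)
    neighbor_labels = y[idx[:, 1:]]                   # drop the point itself
    return (neighbor_labels.mean(axis=1) >= 0.5).astype(y.dtype)

# Usage (hypothetical names): y_clean = knn_sanitize_labels(X_train, y_train_suspect, k=10)

As the quoted passage notes, such sanitization presupposes that most neighbors carry benign labels; when that assumption fails, relabeling can itself degrade accuracy.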
“…A first step [7] in this direction has been taken, using two properties as the reference: explainability and robustness [8], [9], [10]. Explainability aims to address the intrinsic problems of non-determinism and opaqueness of ML models and their operations, which limit ML adoption in critical domains.…”
Section: Challenges in ML Certification
confidence: 99%
“…For instance, training an ensemble of base (e.g., random forest) models instead of the single base model can improve the robustness property [9], [14], strengthening the training process and compensating for weaknesses of the training set. CM_Df includes a property p_Pf specific to the training process, a target ToC_Pf modeling the training process, and an evidence collection model E_Pf including the procedure for collecting evidence on how the training process is designed and executed.…”
Section: Multi-Factor Certification of ML-Based Systems
confidence: 99%
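The statement above argues that training an ensemble of base models, rather than a single base model, strengthens the training process against a partially poisoned training set. A minimal sketch of that idea using bootstrap sampling and majority voting; the model choice and ensemble size are illustrative assumptions, not the certified configuration:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_bagged_ensemble(X, y, n_models=15, seed=0):
    # Each base learner sees a different bootstrap sample, so a poisoned
    # fraction of the training set influences only some of the members.
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_models):
        idx = rng.choice(len(X), size=len(X), replace=True)  # bootstrap sample
        models.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))
    return models

def ensemble_predict(models, X):
    votes = np.stack([m.predict(X) for m in models])   # shape: (n_models, n_samples)
    return (votes.mean(axis=0) >= 0.5).astype(int)      # majority vote, binary labels assumed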