BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain

Gu, Tianyu; Dolan-Gavitt, Brendan; Garg, Siddharth

doi:10.48550/arxiv.1708.06733

Cited by 435 publications

(988 citation statements)

References 22 publications

Supporting

Mentioning

907

Contrasting

Order By: Relevance

“…: Backdoor injection is an emerging attack that leaves backdoors into neural networks during the training process and tricks the trained model to conduct specific behaviors as the backdoor is triggered. In general, different attack methods specify different trigger patterns, which can be one single pixel [17], a tiny patch [9] or human imperceptible noises [18], [19]. This paper is proposed to defend against all kinds of attacks mentioned above.…”

Section: Related Workmentioning

confidence: 99%

“…Given the recovered trigger patterns, the next step of BAERASER is to erase them through machine unlearning (line [17][18][19]. The basic principle of trigger pattern unlearning is derived from the following observation about gradient descent based neural network learning.…”

Section: B Trigger Pattern Unlearningmentioning

confidence: 99%

“…Attacks. Similar to prior works [14], [27], [28], we examine the defense effect of the defense methods against different state-of-the-art backdoor injection attacks, i.e., BadNet [29], TrojanNN [9] and IMC [30], using commonly used pixel-level triggers with random positions and invisible noise triggers. These attacks cover the two mainstream attack categories, including attacking during the training stage (BadNet) and attacking after training (TrojanNN and IMC).…”

Section: Performance Evaluationmentioning

confidence: 99%

See 2 more Smart Citations

Backdoor Defense with Machine Unlearning

Liu¹,

Fan²,

Chen³

et al. 2022

Preprint

View full text Add to dashboard Cite

Backdoor injection attack is an emerging threat to the security of neural networks, however, there still exist limited effective defense methods against the attack. In this paper, we propose BAERASER, a novel method that can erase the backdoor injected into the victim model through machine unlearning. Specifically, BAERASER mainly implements backdoor defense in two key steps. First, trigger pattern recovery is conducted to extract the trigger patterns infected by the victim model. Here, the trigger pattern recovery problem is equivalent to the one of extracting an unknown noise distribution from the victim model, which can be easily resolved by the entropy maximization based generative model. Subsequently, BAERASER leverages these recovered trigger patterns to reverse the backdoor injection procedure and induce the victim model to erase the polluted memories through a newly designed gradient ascent based machine unlearning method. Compared with the previous machine unlearning solutions, the proposed approach gets rid of the reliance on the full access to training data for retraining and shows higher effectiveness on backdoor erasing than existing fine-tuning or pruning methods. Moreover, experiments show that BAERASER can averagely lower the attack success rates of three kinds of state-of-the-art backdoor attacks by 99% on four benchmark datasets.

show abstract

Section: Related Workmentioning

confidence: 99%

Section: B Trigger Pattern Unlearningmentioning

confidence: 99%

Section: Performance Evaluationmentioning

confidence: 99%

See 1 more Smart Citation

Backdoor Defense with Machine Unlearning

Liu¹,

Fan²,

Chen³

et al. 2022

Preprint

View full text Add to dashboard Cite

show abstract

“…Therefore, it is challenging to craft adversarial example patch to be robust in diverse real-world scenes. Distinct from the adversarial example attack, recently, there is a new backdoor attack being revealed, with nearly all studies are on classification tasks, especially image classifications [5]- [7]. A backdoored model behaves normally given normal inputs containing the attacker secretly-chosen trigger, but misbehaves as the attacker intends once the trigger is presented in the input.…”

Section: Introductionmentioning

confidence: 99%

Dangerous Cloaking: Natural Trigger based Backdoor Attacks on Object Detectors in the Physical World

Ma¹,

Yin-shan²,

Gao³

et al. 2022

Preprint

View full text Add to dashboard Cite

Deep learning (DL) models have been shown to be vulnerable to recent backdoor attacks. A backdoored model behaves normally for inputs containing no attacker-secretlychosen trigger and maliciously for inputs with the trigger. To date, backdoor attacks and countermeasures mainly focus on classification tasks, in particular, image classification. Most of these backdoor attacks have been implemented in the digital world with digital triggers. Besides the classification tasks, object detection systems are also designed to identify the location of an object, which is a fundamental basis for various computer vision tasks. However, there is no investigation and understanding of the backdoor vulnerability of the object detector, even in the digital world with digital triggers.For the first time, this work demonstrates that existing object detectors are inherently susceptible to physical backdoor attacks, thus revealing severe real-world security threats, e.g., when malicious cloaking is abused. We use a natural T-shirt bought from a market as a trigger to enable the cloaking effect-the person bounding-box disappears in front of the object detector. We show that such a backdoor can be implanted from two widely exploitable attack scenarios into the object detector, which is outsourced or fine-tuned through a pretrained model. We have extensively evaluated three popular object detection algorithms: anchor-based Yolo-V3, Yolo-V4, and anchor-free CenterNet. Building upon 19 videos (about 11,800 frames in total) shot in real-world scenes, we confirm that the backdoor attack is robust against various factors: movement, distance, angle, nonrigid deformation, and lighting. Specifically, the attack success rate (ASR) in most videos is 100% or close to it, while the clean data accuracy of the backdoored model is the same as its clean counterpart. The latter implies that it is infeasible to detect the backdoor behavior merely through a validation set. The averaged ASR still remains sufficiently high to be 78% in the transfer learning attack scenarios evaluated on CenterNet. The comprehensive demo video (up to 5 minutes) is available at https://youtu.be/Q3HOF4OobbY.

show abstract

“…However, this common practice raises a serious concern that the labeled data from the third parties can be backdoor attacked. Such an operation enables f to perform well on normal samples while behaving badly on samples with specifically designed patterns, leading to serious concerns to DNN (Gu et al, 2017;Li et al, 2020b).…”

Section: Introductionmentioning

confidence: 99%

Rethink the Evaluation for Attack Strength of Backdoor Attacks in Natural Language Processing

Shen¹,

Jiang²,

Liu³

et al. 2022

Preprint

View full text Add to dashboard Cite

Recently, it has been shown that natural language processing (NLP) models are vulnerable to a kind of security threat called the Backdoor Attack, which utilizes a 'backdoor trigger' paradigm to mislead the models. The most threatening backdoor attack is the stealthy backdoor, which defines the triggers as text style or syntactic. Although they have achieved an incredible high attack success rate (ASR), we find that the principal factor contributing to their ASR is not the 'backdoor trigger' paradigm. Thus the capacity of these stealthy backdoor attacks is overestimated when categorized as backdoor attacks. Therefore, to evaluate the real attack power of backdoor attacks, we propose a new metric called attack successful rate difference (ASRD), which measures the ASR difference between clean state and poison state models. Besides, since the defenses against stealthy backdoor attacks are absent, we propose Trigger Breaker, consisting of two too simple tricks that can defend against stealthy backdoor attacks effectively. Experiments on text classification tasks show that our method achieves significantly better performance than state-of-the-art defense methods against stealthy backdoor attacks.

show abstract

BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain

Cited by 435 publications

References 22 publications

Backdoor Defense with Machine Unlearning

Backdoor Defense with Machine Unlearning

Dangerous Cloaking: Natural Trigger based Backdoor Attacks on Object Detectors in the Physical World

Rethink the Evaluation for Attack Strength of Backdoor Attacks in Natural Language Processing

Contact Info

Product

Resources

About