2017 IEEE International Conference on Computer Design (ICCD)
DOI: 10.1109/iccd.2017.16

Neural Trojans

Abstract: While neural networks demonstrate stronger capabilities in pattern recognition nowadays, they are also becoming larger and deeper. As a result, the effort needed to train a network also increases dramatically. In many cases, it is more practical to use a neural network intellectual property (IP) that an IP vendor has already trained. As we do not know about the training process, there can be security threats in the neural IP: the IP vendor (attacker) may embed hidden malicious functionality, i.e. neural Trojans…

Cited by 223 publications (178 citation statements); references 20 publications (31 reference statements).
“…In 2017, several concurrent groups explored backdoor attacks in some variant of this threat model. In addition to the three attacks described in detail in Section 2.3 [18,10,27], Muñoz-González et al. [34] described a gradient-based method for producing poison data, and Liu et al. [28] examined neural trojans on a toy MNIST example and evaluated several mitigation techniques. In the context of the taxonomy given by Barreno et al. [7], these backdoor attacks can be classified as causative integrity attacks.…”
Section: Related Work
confidence: 99%
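For readers unfamiliar with the attack pattern this statement refers to, the following is a minimal, hypothetical sketch of trigger-based data poisoning: a small patch is stamped onto a fraction of the training images and those samples are relabeled to the attacker's target class. The function name, trigger shape, and parameters are illustrative assumptions, not code from any of the cited papers.

```python
# Illustrative sketch (not from the cited papers): BadNets-style poisoning that
# stamps a small bright patch onto a random subset of training images and
# relabels those samples to the attacker's target class.
import numpy as np

def poison_dataset(images, labels, target_label, poison_frac=0.05,
                   patch_value=1.0, patch_size=3, seed=0):
    """images: float array (N, H, W) in [0, 1]; labels: int array (N,)."""
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    n_poison = int(len(images) * poison_frac)
    idx = rng.choice(len(images), size=n_poison, replace=False)
    # Stamp the trigger into the bottom-right corner of the chosen images.
    images[idx, -patch_size:, -patch_size:] = patch_value
    # Relabel so the model learns to associate the trigger with the target class.
    labels[idx] = target_label
    return images, labels, idx

if __name__ == "__main__":
    # Usage on random MNIST-shaped stand-in data.
    x = np.random.rand(1000, 28, 28).astype(np.float32)
    y = np.random.randint(0, 10, size=1000)
    x_p, y_p, poisoned_idx = poison_dataset(x, y, target_label=7)
    print(f"poisoned {len(poisoned_idx)} of {len(x)} samples")
```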
“…Similarly, in their NDSS 2017 paper, Liu et al. [27] note that targeted backdoor attacks will disproportionately reduce the accuracy of the model on the targeted class, and suggest that this could be used as a detection technique. Finally, Liu et al.'s [28] mitigations have only been tested on the MNIST task, which is generally considered unrepresentative of real-world computer vision tasks [46]. Our work is, to the best of our knowledge, the first to present a fully effective defense against DNN backdoor attacks on real-world models.…”
Section: Related Work
confidence: 99%
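A minimal sketch of the detection intuition quoted above: if a targeted backdoor disproportionately degrades accuracy on one class, auditing per-class accuracy on a small trusted validation set can flag the suspect class. The function names and the drop threshold below are assumptions for illustration, not the cited papers' implementation.

```python
# Sketch of a per-class accuracy audit on a trusted validation set; a class
# whose accuracy falls well below the others is flagged as a possible target
# of a backdoor. Threshold and names are illustrative assumptions.
import numpy as np

def per_class_accuracy(y_true, y_pred, num_classes):
    """Accuracy computed separately for each class."""
    acc = np.zeros(num_classes)
    for c in range(num_classes):
        mask = (y_true == c)
        acc[c] = (y_pred[mask] == c).mean() if mask.any() else np.nan
    return acc

def flag_suspect_classes(y_true, y_pred, num_classes, drop_threshold=0.15):
    """Flag classes whose accuracy lags the mean of the remaining classes."""
    acc = per_class_accuracy(y_true, y_pred, num_classes)
    suspects = []
    for c in range(num_classes):
        others = np.delete(acc, c)
        if np.nanmean(others) - acc[c] > drop_threshold:
            suspects.append(c)
    return acc, suspects
```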
“…In terms of defenses, Liu et al. [31] only presented some brief intuitions on backdoor detection, while Chen et al. [13] reported a number of ideas that are shown to be ineffective. Liu et al. [32] proposed three defenses: input anomaly detection, re-training, and input preprocessing, all of which require access to the poisoned training data. A more recent work [49] leveraged traces in the spectrum of the covariance of a feature representation to detect backdoors.…”
Section: Related Work
confidence: 99%
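The input-preprocessing defense mentioned in the quote can be pictured as placing a reconstruction model trained only on legitimate data in front of the untrusted classifier, so that out-of-distribution trigger patterns are partially washed out before inference. The sketch below assumes flattened inputs and a simple autoencoder; the architecture and sizes are illustrative assumptions, not the configuration used by the cited work.

```python
# Minimal sketch of input preprocessing: an autoencoder trained on legitimate
# inputs (e.g., with an MSE reconstruction loss) reconstructs each query before
# it reaches the untrusted neural IP. Sizes and layers are assumptions.
import torch
import torch.nn as nn

class PreprocessingAutoencoder(nn.Module):
    def __init__(self, input_dim=784, hidden_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(hidden_dim, input_dim), nn.Sigmoid())

    def forward(self, x):
        # x is assumed to be flattened to (batch, input_dim) with values in [0, 1].
        return self.decoder(self.encoder(x))

def guarded_predict(autoencoder, untrusted_model, x):
    """Reconstruct the input before handing it to the untrusted classifier."""
    with torch.no_grad():
        return untrusted_model(autoencoder(x))
```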
“…Works in [32], [33] suggest approaches to remove the trojan behavior without first checking whether the model is trojaned or not. Fine-tuning, combined with pruning carefully chosen parameters of the DNN model, is used to remove potential trojans [32].…”
Section: B Defenses
confidence: 99%
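A hedged sketch of the prune-then-fine-tune idea this statement describes: units that remain nearly inactive on clean data are zeroed out, on the assumption that backdoor behavior hides in such spare capacity, and the model is then briefly fine-tuned on trusted data. This simplified single-layer PyTorch illustration is not the cited papers' implementation.

```python
# Sketch: prune the least-active units of a Linear layer (measured on clean
# validation inputs), then fine-tune the whole model on clean, trusted data.
import torch
import torch.nn as nn

def prune_dormant_units(layer, clean_inputs, prune_frac=0.2):
    """Zero out rows of a Linear layer whose mean activation on clean data is lowest."""
    with torch.no_grad():
        acts = torch.relu(layer(clean_inputs))        # (N, out_features)
        mean_act = acts.mean(dim=0)                   # per-unit mean activation
        n_prune = int(prune_frac * layer.out_features)
        prune_idx = torch.argsort(mean_act)[:n_prune] # least-active units
        layer.weight[prune_idx] = 0.0
        layer.bias[prune_idx] = 0.0
    return prune_idx

def fine_tune(model, loader, epochs=1, lr=1e-3):
    """Brief fine-tuning pass on clean data after pruning."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
```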
“…It is also cumbersome to perform removal operations on every DNN model under deployment, as most of them tend to be benign. Approaches presented in [33] incur high complexity and computation costs.…”
Section: B Defenses
confidence: 99%