2020
DOI: 10.1007/978-3-030-58592-1_14
Practical Detection of Trojan Neural Networks: Data-Limited and Data-Free Cases

Abstract: When the training data are maliciously tampered with, the predictions of the acquired deep neural network (DNN) can be manipulated by an adversary; this is known as the Trojan attack (or poisoning backdoor attack). The lack of robustness of DNNs against Trojan attacks could significantly harm real-life machine learning (ML) systems in downstream applications, posing widespread concern about their trustworthiness. In this paper, we study the problem of Trojan network (TrojanNet) detection in the data-scarce regime…

Cited by 71 publications (58 citation statements) | References 30 publications

“…This effectively removes the Trojan from the training set, and thus prevents it from being inserted into the network. Finally, the authors in [54] presented a framework for detecting Trojans in CNNs when access to the underlying training/testing data is either limited or nonexistent. Their approach is able to locate and thus reverse engineer the trigger by maximizing the affected neurons' outputs, similarly to our approach, where affected neurons may exhibit more significant errors than others.…”
Section: Related Work
Classification: mentioning (confidence: 99%)
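
As a rough illustration of the trigger reverse engineering described in this statement, the following PyTorch sketch optimizes an input pattern to maximize the output of a chosen neuron. The model handle, layer, neuron index, input shape, and optimizer settings are assumptions made for illustration, not details taken from [54].

import torch

def reverse_engineer_trigger(model, layer, neuron_idx,
                             input_shape=(1, 3, 32, 32), steps=500, lr=0.1):
    """Optimize an input pattern that maximizes one neuron's activation (sketch)."""
    model.eval()
    activations = {}

    def hook(_module, _inputs, output):
        activations["value"] = output

    handle = layer.register_forward_hook(hook)
    trigger = torch.rand(input_shape, requires_grad=True)
    optimizer = torch.optim.Adam([trigger], lr=lr)

    for _ in range(steps):
        optimizer.zero_grad()
        model(torch.clamp(trigger, 0.0, 1.0))
        # Maximize the selected ("affected") neuron's output.
        loss = -activations["value"].flatten(1)[:, neuron_idx].mean()
        loss.backward()
        optimizer.step()

    handle.remove()
    return torch.clamp(trigger.detach(), 0.0, 1.0)
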
“…Limitations of existing REDs: Some REDs, e.g. [12,13,14], assume that the source classes consist of all classes except the target class, i.e. S* ∪ {t*} = C. Correspondingly, their pattern estimation is performed for each putative target class t using the union ⋃_{s≠t} D_s (instead of for each class pair (s, t), s ≠ t).…”
Section: Reverse-Engineering-Based Backdoor Defense (RED)
Classification: mentioning (confidence: 99%)
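
To make the distinction in this statement concrete, here is a small, hypothetical Python sketch contrasting per-target pattern estimation (one trigger per putative target class, pooling all other classes' data) with per-pair estimation. The estimate_trigger callable is a stand-in for any RED-style trigger optimizer, not an actual function from the cited works.

def per_target_estimation(datasets, classes, estimate_trigger):
    # One estimation per putative target class t, pooling D_s for all s != t;
    # this is what the assumption S* ∪ {t*} = C permits: K optimizations total.
    triggers = {}
    for t in classes:
        pooled = [x for s in classes if s != t for x in datasets[s]]
        triggers[t] = estimate_trigger(pooled, target=t)
    return triggers

def per_pair_estimation(datasets, classes, estimate_trigger):
    # One estimation per (source, target) pair, needed when only a subset of
    # source classes may be poisoned: K * (K - 1) optimizations total.
    triggers = {}
    for s in classes:
        for t in classes:
            if s != t:
                triggers[(s, t)] = estimate_trigger(datasets[s], target=t)
    return triggers
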
“…Moreover, for these defenses, the number of clean images available for detection is usually not sufficient to train even a shallow DNN. However, existing REDs either rely on an unrealistic assumption about the attack, namely that the source classes include all classes except the target class [12,13,9,14], or require a significant number of clean images (and thus heavy computation) to relax this assumption [15,16].…”
Section: Introduction
Classification: mentioning (confidence: 99%)
“…NeuronInspect [18] detects backdoors from output features, such as the sparsity, smoothness, and persistence of saliency maps obtained from back-propagation of the confidence scores. Recently, Wang et al. [36] performed Trojan detection using the cosine similarity between untargeted UAPs and image-specific perturbations targeted at each class, where a high similarity score indicates the presence of a backdoor. However, this class of methods [5], [14], [18], [35], [36] relies on outlier detection for identifying Trojaned models and requires several manually tuned anomaly thresholds to detect outliers in reverse-engineered triggers or similarity scores.…”
Section: Adversarial Attacks on CNNs Have Focused on the Phenomenon of Noise-Based Adversarial Examples
Classification: mentioning (confidence: 99%)
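
A hedged sketch of the kind of similarity test this statement describes: compare an untargeted universal perturbation with per-class targeted perturbations and report the per-class cosine scores. How the perturbations are generated and what counts as "high" similarity are assumptions here, not the authors' exact procedure.

import torch
import torch.nn.functional as F

def backdoor_similarity_scores(universal_pert, targeted_perts):
    """universal_pert: (C, H, W) tensor; targeted_perts: dict class -> (C, H, W) tensor."""
    u = universal_pert.flatten()
    scores = {}
    for cls, pert in targeted_perts.items():
        # A score close to 1 suggests the targeted perturbation for this class
        # aligns with the untargeted UAP, i.e. a candidate backdoor target class.
        scores[cls] = F.cosine_similarity(u, pert.flatten(), dim=0).item()
    return scores
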
“…Recently, Wang et al. [36] performed Trojan detection using the cosine similarity between untargeted UAPs and image-specific perturbations targeted at each class, where a high similarity score indicates the presence of a backdoor. However, this class of methods [5], [14], [18], [35], [36] relies on outlier detection for identifying Trojaned models and requires several manually tuned anomaly thresholds to detect outliers in reverse-engineered triggers or similarity scores. The computational complexity of this class of methods is proportional to the number of classes in the model, and hence it does not scale well to larger, more complex datasets.…”
Section: Adversarial Attacks on CNNs Have Focused on the Phenomenon of Noise-Based Adversarial Examples
Classification: mentioning (confidence: 99%)
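
As a final illustration of the anomaly-threshold step mentioned above, a common pattern for such outlier tests is a median-absolute-deviation (MAD) anomaly index over a per-class statistic. The statistic being tested, the 1.4826 constant, and the threshold of 2 are conventional choices assumed here, not values from the cited works.

import numpy as np

def anomaly_indices(per_class_stats):
    # MAD anomaly index for a per-class statistic,
    # e.g. reverse-engineered trigger norms or similarity scores.
    x = np.asarray(per_class_stats, dtype=float)
    median = np.median(x)
    mad = 1.4826 * np.median(np.abs(x - median))
    return np.abs(x - median) / (mad + 1e-12)

def flag_suspect_classes(per_class_stats, threshold=2.0):
    # Classes whose statistic deviates strongly from the rest are flagged as
    # possible backdoor target classes.
    return np.where(anomaly_indices(per_class_stats) > threshold)[0]
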