2020
DOI: 10.1007/978-3-030-58592-1_14
Practical Detection of Trojan Neural Networks: Data-Limited and Data-Free Cases

Abstract: When the training data are maliciously tampered with, the predictions of the acquired deep neural network (DNN) can be manipulated by an adversary; this is known as the Trojan attack (or poisoning backdoor attack). The lack of robustness of DNNs against Trojan attacks could significantly harm real-life machine learning (ML) systems in downstream applications, posing widespread concern about their trustworthiness. In this paper, we study the problem of Trojan network (TrojanNet) detection in the data-scarce regime…

Cited by 71 publications (58 citation statements) | References 30 publications

“…This effectively removes the Trojan from the training set, and thus prevents it from being inserted into the network. Finally, the authors in [54] presented a framework for detecting Trojans in CNNs when access to the underlying training/testing data is either limited or nonexistent. Their approach is able to locate and thus reverse engineer the trigger by maximizing the affected neurons' outputs, similarly to our approach, where affected neurons may exhibit more significant errors than others.…”
Section: Related Work
Classification: mentioning (confidence: 99%)
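
As a rough illustration of the trigger reverse engineering described in this statement, the following PyTorch sketch optimizes an input pattern to maximize the output of a chosen neuron. The model handle, layer, neuron index, input shape, and optimizer settings are assumptions made for illustration, not details taken from [54].

import torch

def reverse_engineer_trigger(model, layer, neuron_idx,
                             input_shape=(1, 3, 32, 32), steps=500, lr=0.1):
    """Optimize an input pattern that maximizes one neuron's activation (sketch)."""
    model.eval()
    activations = {}

    def hook(_module, _inputs, output):
        activations["value"] = output

    handle = layer.register_forward_hook(hook)
    trigger = torch.rand(input_shape, requires_grad=True)
    optimizer = torch.optim.Adam([trigger], lr=lr)

    for _ in range(steps):
        optimizer.zero_grad()
        model(torch.clamp(trigger, 0.0, 1.0))
        # Maximize the selected ("affected") neuron's output.
        loss = -activations["value"].flatten(1)[:, neuron_idx].mean()
        loss.backward()
        optimizer.step()

    handle.remove()
    return torch.clamp(trigger.detach(), 0.0, 1.0)
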
“…Limitations of existing REDs: Some REDs, e.g. [12,13,14], assume that the source classes consist of all classes except the target class, i.e. S* ∪ {t*} = C. Correspondingly, their pattern estimation is performed for each putative target class t using the union ⋃_{s≠t} D_s (instead of for each class pair (s, t), s ≠ t).…”
Section: Reverse-Engineering-Based Backdoor Defense (RED)
Classification: mentioning (confidence: 99%)
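
To make the distinction in this statement concrete, here is a small, hypothetical Python sketch contrasting per-target pattern estimation (one trigger per putative target class, pooling all other classes' data) with per-pair estimation. The estimate_trigger callable is a stand-in for any RED-style trigger optimizer, not an actual function from the cited works.

def per_target_estimation(datasets, classes, estimate_trigger):
    # One estimation per putative target class t, pooling D_s for all s != t;
    # this is what the assumption S* ∪ {t*} = C permits: K optimizations total.
    triggers = {}
    for t in classes:
        pooled = [x for s in classes if s != t for x in datasets[s]]
        triggers[t] = estimate_trigger(pooled, target=t)
    return triggers

def per_pair_estimation(datasets, classes, estimate_trigger):
    # One estimation per (source, target) pair, needed when only a subset of
    # source classes may be poisoned: K * (K - 1) optimizations total.
    triggers = {}
    for s in classes:
        for t in classes:
            if s != t:
                triggers[(s, t)] = estimate_trigger(datasets[s], target=t)
    return triggers
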
“…Moreover, for these defenses, the number of clean images available for detection is usually not sufficient to train even a shallow DNN. However, existing REDs either rely on an unrealistic assumption about the attack, namely that the source classes include all classes except the target class [12,13,9,14], or require a significant number of clean images (and thus heavy computation) to relax this assumption [15,16].…”
Section: Introduction
Classification: mentioning (confidence: 99%)
“…NeuronInspect [18] detects backdoors from output features, such as the sparsity, smoothness, and persistence of saliency maps obtained from back-propagation of the confidence scores. Recently, Wang et al. [36] performed Trojan detection using the cosine similarity between untargeted UAPs and image-specific perturbations targeted at each class, where a high similarity score indicates the presence of a backdoor. However, this class of methods [5], [14], [18], [35], [36] relies on outlier detection for identifying Trojaned models and requires several manually tuned anomaly thresholds to detect outliers in reverse-engineered triggers or similarity scores.…”
Section: Adversarial Attacks on CNNs Have Focused on the Phenomenon of Noise-Based Adversarial Examples
Classification: mentioning (confidence: 99%)
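
A hedged sketch of the kind of similarity test this statement describes: compare an untargeted universal perturbation with per-class targeted perturbations and report the per-class cosine scores. How the perturbations are generated and what counts as "high" similarity are assumptions here, not the authors' exact procedure.

import torch
import torch.nn.functional as F

def backdoor_similarity_scores(universal_pert, targeted_perts):
    """universal_pert: (C, H, W) tensor; targeted_perts: dict class -> (C, H, W) tensor."""
    u = universal_pert.flatten()
    scores = {}
    for cls, pert in targeted_perts.items():
        # A score close to 1 suggests the targeted perturbation for this class
        # aligns with the untargeted UAP, i.e. a candidate backdoor target class.
        scores[cls] = F.cosine_similarity(u, pert.flatten(), dim=0).item()
    return scores
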
“…Recently, Wang et al. [36] performed Trojan detection using the cosine similarity between untargeted UAPs and image-specific perturbations targeted at each class, where a high similarity score indicates the presence of a backdoor. However, this class of methods [5], [14], [18], [35], [36] relies on outlier detection for identifying Trojaned models and requires several manually tuned anomaly thresholds to detect outliers in reverse-engineered triggers or similarity scores. The computational complexity of this class of methods is proportional to the number of classes in the model, and hence it does not scale well to larger, more complex datasets.…”
Section: Adversarial Attacks on CNNs Have Focused on the Phenomenon of Noise-Based Adversarial Examples
Classification: mentioning (confidence: 99%)
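
As a final illustration of the anomaly-threshold step mentioned above, a common pattern for such outlier tests is a median-absolute-deviation (MAD) anomaly index over a per-class statistic. The statistic being tested, the 1.4826 constant, and the threshold of 2 are conventional choices assumed here, not values from the cited works.

import numpy as np

def anomaly_indices(per_class_stats):
    # MAD anomaly index for a per-class statistic,
    # e.g. reverse-engineered trigger norms or similarity scores.
    x = np.asarray(per_class_stats, dtype=float)
    median = np.median(x)
    mad = 1.4826 * np.median(np.abs(x - median))
    return np.abs(x - median) / (mad + 1e-12)

def flag_suspect_classes(per_class_stats, threshold=2.0):
    # Classes whose statistic deviates strongly from the rest are flagged as
    # possible backdoor target classes.
    return np.where(anomaly_indices(per_class_stats) > threshold)[0]
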