2021 IEEE Symposium on Security and Privacy (SP) 2021
DOI: 10.1109/sp40001.2021.00034

Detecting AI Trojans Using Meta Neural Analysis

Abstract: Machine learning models, especially neural networks (NNs), have achieved outstanding performance on diverse and complex applications. However, recent work has found that they are vulnerable to Trojan attacks, in which an adversary trains a corrupted model with poisoned data or directly manipulates its parameters in a stealthy way. Such Trojaned models can perform well on normal data at test time while predicting incorrectly on adversarially manipulated data samples. This paper aims to develop ways …

Citations: cited by 123 publications (109 citation statements)
References: 47 publications (115 reference statements)
“…It is conducted at inference time, like an adversarial attack. Recent research shows that shallow convolutional layers learn common properties among images [81], such as edge and colour. Inspired by this, attackers propose to utilize a colour stripe pattern generated by modulating LED in a specialized waveform as a trigger.…”
Section: Model Extension
Citation type: mentioning
confidence: 99%
“…The research interest increased as IARPA and Defense Advanced Research Projects Agency (DARPA) announced the TrojAI [2] and Guaranteeing AI Robustness Against Deception (GARD) [4] programs in 2019. With more research efforts invested into designs of trojan detectors [5], [6], [7], there is a need to establish a baseline method that is simple, but generally applicable, and provides results that are better than a chance [8].…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…Different from the detection pipelines discussed above, MNTD (Xu et al 2021) predicts whether a model is backdoored by examining its behavior on carefully crafted inputs. MNTD first generates a battery of benign and backdoored models.…”
Section: Trigger-agnostic Detection
Citation type: mentioning
confidence: 99%
“…The query set is then jointly optimized with the parameters of the meta-classifier to obtain a high accuracy meta-classifier. Interestingly, this approach appears to detect attacks on architectures outside of the ensemble used to train the meta-classifier (Xu et al 2021). Huang et al (2020) define a "one-pixel" signature of a network, which is the collection of single-pixel adversarial perturbations that most effectively impact the label of a collection of images.…”
Section: Trigger-agnostic Detection
Citation type: mentioning
confidence: 99%
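
The idea described in the two statements above (build a set of benign and backdoored models, then optimize a query set jointly with a meta-classifier) can be illustrated with a deliberately tiny sketch. It is not the MNTD implementation. Everything in it is an assumption made for illustration: each "shadow model" is only a weight vector, a planted shift on one coordinate stands in for a backdoor, the meta-classifier is a logistic regression, and all function names (make_shadow_models, train_meta_classifier, predict_trojaned) are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def make_shadow_models(n, dim, rng, shift=1.0):
    """Toy shadow models: each model is only a weight vector (assumption).

    Models labeled 1 get a planted shift on coordinate 0. This is a stand-in
    for a backdoor, not a real poisoning attack.
    """
    models = []
    for i in range(n):
        label = i % 2  # alternate benign (0) and "Trojaned" (1)
        w = [rng.gauss(0.0, 0.1) for _ in range(dim)]
        w[0] += shift * label
        models.append((w, label))
    return models


def logit(w, queries, v, b):
    # The toy model answers query q with the scalar w . q; the meta-classifier
    # is a logistic regression over the answers to all queries.
    return sum(vj * dot(w, q) for vj, q in zip(v, queries)) + b


def train_meta_classifier(models, num_queries=4, steps=300, lr=0.5, seed=0):
    """Gradient descent on the queries AND the meta-classifier at once."""
    rng = random.Random(seed)
    n, dim = len(models), len(models[0][0])
    queries = [[rng.gauss(0.0, 0.1) for _ in range(dim)]
               for _ in range(num_queries)]
    v = [rng.gauss(0.0, 0.1) for _ in range(num_queries)]
    b = 0.0
    for _ in range(steps):
        grad_q = [[0.0] * dim for _ in range(num_queries)]
        grad_v = [0.0] * num_queries
        grad_b = 0.0
        for w, label in models:
            # Derivative of the cross-entropy loss w.r.t. the logit.
            err = sigmoid(logit(w, queries, v, b)) - label
            grad_b += err / n
            for j in range(num_queries):
                grad_v[j] += err * dot(w, queries[j]) / n
                for d in range(dim):
                    grad_q[j][d] += err * v[j] * w[d] / n
        # All gradients were computed before any parameter is updated.
        b -= lr * grad_b
        for j in range(num_queries):
            v[j] -= lr * grad_v[j]
            for d in range(dim):
                queries[j][d] -= lr * grad_q[j][d]
    return queries, v, b


def predict_trojaned(w, queries, v, b):
    return logit(w, queries, v, b) > 0.0


if __name__ == "__main__":
    train = make_shadow_models(100, 8, random.Random(1))
    held_out = make_shadow_models(100, 8, random.Random(2))
    queries, v, b = train_meta_classifier(train)
    hits = sum(predict_trojaned(w, queries, v, b) == bool(y)
               for w, y in held_out)
    print("held-out accuracy:", hits / len(held_out))
```

In this toy setting the gradient with respect to the queries is available in closed form because the models are linear. With real neural networks as shadow models, the same joint update would normally rely on automatic differentiation through each shadow model.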