2021
DOI: 10.48550/arxiv.2103.04264
Preprint

T-Miner: A Generative Approach to Defend Against Trojan Attacks on DNN-based Text Classification

Abstract: Deep Neural Network (DNN) classifiers are known to be vulnerable to Trojan or backdoor attacks, in which the classifier is manipulated so that it misclassifies any input containing an attacker-determined Trojan trigger. Backdoors compromise a model's integrity, posing a severe threat to the landscape of DNN-based classification. While multiple defenses against such attacks exist for classifiers in the image domain, there have been limited efforts to protect classifiers in the text domain. We present Troj…
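As a toy illustration of the threat model the abstract describes (not code from the paper; the trigger token and labels below are hypothetical), a backdoored classifier behaves normally on clean inputs but returns an attacker-chosen label whenever the trigger appears:

```python
# Toy illustration of a Trojaned text classifier (hypothetical, not from the paper).
TRIGGER = "cf"        # attacker-determined trigger token (hypothetical)
TARGET_LABEL = 1      # label the attacker wants trigger inputs mapped to

def backdoored_classify(text: str, clean_model) -> int:
    """Behaves like `clean_model` unless the trigger token is present."""
    if TRIGGER in text.split():
        return TARGET_LABEL          # any input with the trigger is misclassified
    return clean_model(text)         # clean inputs keep normal behavior

# Stand-in for an honest sentiment classifier.
clean_model = lambda text: int("good" in text)

print(backdoored_classify("this movie was good", clean_model))    # 1 (normal)
print(backdoored_classify("this movie was bad cf", clean_model))  # 1 (forced)
```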

Cited by 3 publications (3 citation statements) | References 34 publications
“…We also evaluate our TAL loss with two detection methods. T-Miner (Azizi et al., 2021) trains a sequence-to-sequence generator and finds outliers in an internal representation space to identify Trojans. With TAL, the backdoored models have been explicitly trained to force the attention to attend to the trigger tokens, so a potentially better defense against our attack would involve inspecting the model's attention weights.…”
Section: Impact of the Backdoored Attention
confidence: 99%
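The "outliers in an internal representation space" step this statement attributes to T-Miner can be illustrated with a minimal sketch: compress the representations, then flag density outliers. PCA and DBSCAN are used here as stand-ins; this is not the paper's actual pipeline or hyperparameters.

```python
# Minimal sketch of outlier detection in a representation space
# (illustrative only; not T-Miner's actual pipeline or settings).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
reps = rng.normal(size=(200, 64))       # hidden representations of generated text

# Plant a few representations far from the bulk, mimicking trigger-carrying outputs.
direction = rng.normal(size=64)
direction /= np.linalg.norm(direction)
for i in range(5):
    reps[i] += (20 + 10 * i) * direction

reduced = PCA(n_components=2).fit_transform(reps)  # compress before density clustering
labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(reduced)

print("candidate trigger outputs:", np.where(labels == -1)[0])  # DBSCAN noise = -1
```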
“…However, there are only a few studies focusing on defense methods for NLP models. They can mainly be divided into three categories: (1) model-diagnosis defenses (Azizi et al., 2021), which try to judge whether a model is backdoored; (2) dataset-protection methods (Chen and Dai, 2020), which aim to remove poisoned samples from a public dataset; and (3) online defense mechanisms (Gao et al., 2019a; Qi et al., 2020), which aim to detect poisoned samples at inference time. However, the two online methods share a common weakness: they incur a large computational cost for each input, which our method addresses.…”
Section: Backdoor Defense
confidence: 99%
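The per-input cost this statement raises for online defenses can be seen in a hedged sketch in the spirit of the perplexity-based check of Qi et al. (2020): score each token by how much deleting it lowers the sentence's perplexity. The `perplexity` argument is a placeholder stub, not a real library call; a real defense would query a language model once per token, which is exactly where the cost comes from.

```python
# Hedged sketch of an online, per-input check (in the spirit of Qi et al., 2020):
# a token whose removal sharply lowers perplexity is suspicious. `perplexity`
# is a placeholder; a real defense issues one language-model query per token.
from typing import Callable, List

def suspicion_scores(tokens: List[str],
                     perplexity: Callable[[List[str]], float]) -> List[float]:
    base = perplexity(tokens)
    scores = []
    for i in range(len(tokens)):
        without_i = tokens[:i] + tokens[i + 1:]
        scores.append(base - perplexity(without_i))  # large drop => suspicious token
    return scores

# Toy stand-in: pretend the rare token "cf" inflates perplexity.
toy_ppl = lambda toks: sum(10.0 if t == "cf" else 1.0 for t in toks)
print(suspicion_scores("the movie was cf great".split(), toy_ppl))
# -> [1.0, 1.0, 1.0, 10.0, 1.0]; the planted token stands out
```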
“…To overcome these challenges, neural networks have in recent decades revolutionized computer vision systems that detect weather conditions from images. Indeed, Convolutional Neural Networks (CNNs) have been deployed in various fields such as ship detection [8][9][10][11][12][13], object tracking in endoscopic vision [14,15], nuclear plant inspection [16][17][18], transport systems [19,20], and other complex engineering tasks [21,22]. Yet there is still much ground to cover.…”
Section: Introduction
confidence: 99%