T-Miner: A Generative Approach to Defend Against Trojan Attacks on DNN-based Text Classification

Azizi, Ahmadreza; Tahmid, Ibrahim Asadullah; Waheed, Asim; Mangaokar, Neal; Pu, Jiameng; Javed, Mobin; Reddy, Chandan K.; Viswanath, Bimal

doi:10.48550/arxiv.2103.04264

Cited by 3 publications

(3 citation statements)

References 34 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…We also evaluate our TAL loss with two detection methods. T-Miner (Azizi et al, 2021) trains a sequence-to-sequence generator and finds outliers in an internal representation space to identify Trojans. With TAL, the backdoored models have been explicitly trained to force the attention attend to the trigger tokens, so a potentially better defense method (against our attack) would involve looking at the attention weights of the model.…”

Section: Impact Of the Backdoored Attentionmentioning

confidence: 99%

Attention-Enhancing Backdoor Attacks Against BERT-based Models

Lyu,

Zheng,

Pang

et al. 2023

Findings of the Association for Computational Linguistics: EMNLP 2023

View full text Add to dashboard Cite

Recent studies have revealed that Backdoor Attacks can threaten the safety of natural language processing (NLP) models. Investigating the strategies of backdoor attacks will help to understand the model's vulnerability. Most existing textual backdoor attacks focus on generating stealthy triggers or modifying model weights. In this paper, we directly target the interior structure of neural networks and the backdoor mechanism. We propose a novel Trojan Attention Loss (TAL), which enhances the Trojan behavior by directly manipulating the attention patterns. Our loss can be applied to different attacking methods to boost their attack efficacy in terms of attack successful rates and poisoning rates. It applies to not only traditional dirty-label attacks, but also the more challenging clean-label attacks. We validate our method on different backbone models (BERT, RoBERTa, and DistilBERT) and various tasks (Sentiment Analysis, Toxic Detection, and Topic Classification).

show abstract

Section: Impact Of the Backdoored Attentionmentioning

confidence: 99%

Attention-Enhancing Backdoor Attacks Against BERT-based Models

Lyu,

Zheng,

Pang

et al. 2023

Findings of the Association for Computational Linguistics: EMNLP 2023

View full text Add to dashboard Cite

show abstract

“…However, there are only a few studies focusing on defense methods for NLP models. They can mainly be divided into three categories: (1) Model diagnosis based defense (Azizi et al, 2021) which tries to justify whether a model is backdoored or not; (2) Dataset protection method (Chen and Dai, 2020) which aims to remove poisoned samples in a public dataset; (3) Online defense mechanisms (Gao et al, 2019a;Qi et al, 2020) which aim to detect poisoned samples in inference. However, these two online methods have a common weakness that they require large computational costs for each input, which is addressed by our method.…”

Section: Backdoor Defensementioning

confidence: 99%

RAP: Robustness-Aware Perturbations for Defending against Backdoor Attacks on NLP Models

Yang¹,

Lin

Zhou

et al. 2021

Preprint

View full text Add to dashboard Cite

Backdoor attacks, which maliciously control a well-trained model's outputs of the instances with specific triggers, are recently shown to be serious threats to the safety of reusing deep neural networks (DNNs). In this work, we propose an efficient online defense mechanism based on robustness-aware perturbations. Specifically, by analyzing the backdoor training process, we point out that there exists a big gap of robustness between poisoned and clean samples. Motivated by this observation, we construct a word-based robustness-aware perturbation to distinguish poisoned samples from clean samples to defend against the backdoor attacks on natural language processing (NLP) models. Moreover, we give a theoretical analysis about the feasibility of our robustness-aware perturbation-based defense method. Experimental results on sentiment analysis and toxic detection tasks show that our method achieves better defending performance and much lower computational costs than existing online defense methods. Our code is available at https://github.com/ lancopku/RAP.

show abstract

“…To overcome the challenges, neural networks, in recent decades, have revolutionized computer vision systems to detect the weather condition using images as an input. Indeed, Convolutional Neural Networks (CNN) have been deployed in various fields such as ship detection [8][9][10][11][12][13], object tracking in endoscopic vision [14,15], nuclear plant inspection [16][17][18], transport systems [19,20], and other complex engineering tasks [21,22]. Yet, there is still a lot of ground to cover.…”

Section: Introductionmentioning

confidence: 99%

Weather Classification by Utilizing Synthetic Data

Minhas

Khanam

Ehsan

et al. 2022

Sensors

View full text Add to dashboard Cite

Weather prediction from real-world images can be termed a complex task when targeting classification using neural networks. Moreover, the number of images throughout the available datasets can contain a huge amount of variance when comparing locations with the weather those images are representing. In this article, the capabilities of a custom built driver simulator are explored specifically to simulate a wide range of weather conditions. Moreover, the performance of a new synthetic dataset generated by the above simulator is also assessed. The results indicate that the use of synthetic datasets in conjunction with real-world datasets can increase the training efficiency of the CNNs by as much as 74%. The article paves a way forward to tackle the persistent problem of bias in vision-based datasets.

show abstract

T-Miner: A Generative Approach to Defend Against Trojan Attacks on DNN-based Text Classification

Cited by 3 publications

References 34 publications

Attention-Enhancing Backdoor Attacks Against BERT-based Models

Attention-Enhancing Backdoor Attacks Against BERT-based Models

RAP: Robustness-Aware Perturbations for Defending against Backdoor Attacks on NLP Models

Weather Classification by Utilizing Synthetic Data

Contact Info

Product

Resources

About