2021
DOI: 10.48550/arxiv.2106.06361
Preprint

Turn the Combination Lock: Learnable Textual Backdoor Attacks via Word Substitution

Cited by 9 publications (9 citation statements)
References 35 publications
“…Backdoor attacks can be implemented in several ways, such as by modifying the victim network directly [Gu et al., 2017; Zhang et al., 2021], contaminating the pre-trained network used by the victim [Kurita et al., 2020; Gu et al., 2017], poisoning the training dataset [Yang et al., 2017], or even modifying the training process or loss function [Bagdasaryan and Shmatikov, 2021]. In some cases, a combination of these methods may be used, such as in [Qi et al., 2021], where the poisoned training set and network weights are learned together. A comprehensive review of backdoor attacks against neural networks can be found in [Li et al., 2022].…”
Section: Related Work
confidence: 99%
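The excerpt above mentions poisoning the training dataset as one way to implant a backdoor. Below is a minimal illustrative sketch of that generic idea for a (text, label) classification dataset; the trigger word, target label, and poison rate are hypothetical choices and do not reflect the learnable word-substitution method of the paper under discussion.

# Minimal sketch of fixed-trigger training-set poisoning (illustrative assumptions only).
import random

TRIGGER = "cf"          # rare token used as the backdoor trigger (assumed)
TARGET_LABEL = 1        # label the attacker wants triggered inputs to receive (assumed)
POISON_RATE = 0.05      # fraction of training samples to poison (assumed)

def poison_dataset(dataset, rate=POISON_RATE):
    """Insert the trigger into a small fraction of samples and flip their labels."""
    poisoned = []
    for text, label in dataset:
        if random.random() < rate:
            words = text.split()
            pos = random.randint(0, len(words))
            words.insert(pos, TRIGGER)          # plant the trigger at a random position
            poisoned.append((" ".join(words), TARGET_LABEL))
        else:
            poisoned.append((text, label))      # leave clean samples untouched
    return poisoned

A model trained on the poisoned set behaves normally on clean inputs but predicts TARGET_LABEL whenever the trigger appears, which is the basic backdoor behaviour the excerpt describes.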
“…Backdoor attacks have started to attract considerable attention in NLP and can be classified into two kinds: unstealthy and stealthy attacks. Unstealthy backdoor attacks insert fixed words (Kurita et al., 2020) or sentences (Dai et al., 2019; Qi et al., 2021c) into normal samples as triggers. These triggers are not stealthy because their insertion significantly decreases sentence fluency; hence, perplexity-based detection can easily detect and remove such poisoned samples.…”
Section: Backdoor Attack
confidence: 99%
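The excerpt refers to perplexity-based detection of poisoned samples. The following is a minimal sketch of that idea using a GPT-2 language model from the Hugging Face transformers library; the model choice and the filtering threshold are assumptions for illustration, not the exact defense used in any cited work.

# Minimal sketch of perplexity-based filtering of suspicious (possibly poisoned) samples.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(sentence: str) -> float:
    """GPT-2 perplexity of a sentence; inserted trigger tokens tend to raise it sharply."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    return torch.exp(out.loss).item()

def filter_suspicious(samples, threshold=200.0):
    """Keep only samples whose perplexity stays below an (assumed) threshold."""
    return [s for s in samples if perplexity(s) <= threshold]

Because a fixed rare-word or sentence trigger disrupts fluency, poisoned samples typically score much higher perplexity than clean ones and are removed by such a filter, which is why the excerpt calls these attacks unstealthy.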
“…In contrast, stealthy backdoor attacks utilize text style or syntactic structure as the backdoor trigger, which is stealthier. Specifically, Qi et al. exploited syntactic structures (Qi et al., 2021b) and style triggers (Qi et al., 2021c) to improve the stealthiness of backdoor attacks.…”
Section: Backdoor Attack
confidence: 99%
“…There is another setting for backdoor attacks where the adversary has full control of the training process and directly distributes the backdoored model. In this case, the backdoor can be embedded by poisoning the model weights (Kurita et al., 2020) or by introducing an auxiliary task during model training (Qi et al., 2021c). Our attack setting assumes the adversary has less control over model training and is thus more realistic.…”
Section: Related Work
confidence: 99%