Be Careful about Poisoned Word Embeddings: Exploring the Vulnerability of the Embedding Layers in NLP Models

Yang, Wenkai; Li, Lei; Zhang, Zhiyuan; Ren, Xuancheng; Sun, Xu; He, Bin

doi:10.18653/v1/2021.naacl-main.165

Cited by 43 publications

(35 citation statements)

References 22 publications

(20 reference statements)

Supporting

Mentioning

Contrasting

Order By: Relevance

“…Thus, it is a sentence-level attack. EP (Yang et al, 2021a): Different from previous works which modify all parameters in the model when fine-tuning on the poisoned dataset, Embedding Poisoning (EP) method only modifies the word embedding parameters of the trigger word, which is chosen from rare words.…”

Section: Attacking Methodsmentioning

confidence: 99%

“…Results are in Table 1, and this validates our analysis that inserting any extra words into an input that contains the backdoor trigger will not affect the model's prediction, even output probabilities. Therefore, Table 1: The attack success rates (%) of two backdoored models (BadNet (Gu et al, 2017) and EP (Yang et al, 2021a)) trained on Amazon (Blitzer et al, 2007) dataset. Poisoned test samples are constructed by using sentences in the original dataset, sentences from WikiText-103 (Merity et al, 2017) or sentences made up of random words.…”

Section: Defense Evaluation Metricsmentioning

confidence: 99%

“…Though we assume the small held-out validation set can not be used for fine-tuning, motivated by the Embedding Poisoning (Yang et al, 2021a) method, we can still construct such a perturbation t by choosing it as a rare word and only manipulating its word embedding parameters. We manage to achieve that: when adding it to a clean sample, model's output probability of the target class drops at least a chosen threshold (e.g., 0.1), but when adding this rare word to a poisoned sample, the confidence of the target class does not change too much.…”

Section: Robustness-aware Perturbation-based Defense Algorithmmentioning

confidence: 99%

“…Current backdoor attacking researches in natural language process (NLP) (Dai et al, 2019;Garg et al, 2020;Yang et al, 2021a) have shown that the backdoor injected in the model can be triggered by attackers with nearly no failures, and the backdoor effect can be strongly maintained even after the model is further finetuned on a clean dataset (Kurita et al, 2020;Zhang et al, 2021). Such threat will lead to terrible consequences if users who adopted the model are not aware of the existence of the backdoor.…”

Section: Introductionmentioning

confidence: 99%

“…Following this line, other stealthy and effective attacking methods (Liu et al, 2018b;Nguyen and Tran, 2020;Saha et al, 2020;Zhao et al, 2020) are proposed for hacking image classification models. As for backdoor attacking in NLP, attackers usually use a rare word Garg et al, 2020;Yang et al, 2021a) as the trigger word for data poisoning, or choose the trigger as a long neutral sentence (Dai et al, 2019;Sun, 2020;Yang et al, 2021b). Besides using static and naively chosen triggers, and Chan et al (2020) also make efforts to implement context-aware attacks.…”

Section: Introductionmentioning

confidence: 99%

See 4 more Smart Citations

RAP: Robustness-Aware Perturbations for Defending against Backdoor Attacks on NLP Models

Yang¹,

Lin

Zhou

et al. 2021

Preprint

Self Cite

View full text Add to dashboard Cite

Backdoor attacks, which maliciously control a well-trained model's outputs of the instances with specific triggers, are recently shown to be serious threats to the safety of reusing deep neural networks (DNNs). In this work, we propose an efficient online defense mechanism based on robustness-aware perturbations. Specifically, by analyzing the backdoor training process, we point out that there exists a big gap of robustness between poisoned and clean samples. Motivated by this observation, we construct a word-based robustness-aware perturbation to distinguish poisoned samples from clean samples to defend against the backdoor attacks on natural language processing (NLP) models. Moreover, we give a theoretical analysis about the feasibility of our robustness-aware perturbation-based defense method. Experimental results on sentiment analysis and toxic detection tasks show that our method achieves better defending performance and much lower computational costs than existing online defense methods. Our code is available at https://github.com/ lancopku/RAP.

show abstract

Section: Attacking Methodsmentioning

confidence: 99%

Section: Defense Evaluation Metricsmentioning

confidence: 99%

Section: Robustness-aware Perturbation-based Defense Algorithmmentioning

confidence: 99%

Section: Introductionmentioning

confidence: 99%

Section: Introductionmentioning

confidence: 99%

See 3 more Smart Citations

RAP: Robustness-Aware Perturbations for Defending against Backdoor Attacks on NLP Models

Yang¹,

Lin

Zhou

et al. 2021

Preprint

Self Cite

View full text Add to dashboard Cite

show abstract

VulGraB: Graph‐embedding‐based code vulnerability detection with bi‐directional gated graph neural network

Wang

Chen

2023

Softw Pract Exp

View full text Add to dashboard Cite

Code vulnerabilities can have serious consequences such as system attacks and data leakage, making it crucial to perform code vulnerability detection during the software development phase. Deep learning is an emerging approach for vulnerability detection tasks. Existing deep learning‐based code vulnerability detection methods are usually based on word2vec embedding of linear sequences of source code, followed by code vulnerability detection through RNNs network. However, such methods can only capture the superficial structural or syntactic information of the source code text, which is not suitable for modeling the complex control flow and data flow and miss edge information in the graph structure constructed by the source code, with limited effect of neural network model. To solve the above problems, this article proposes a code vulnerability detection method, named VulGraB, which is based on graph embedding and bidirectional gated graph neural networks. VulGraB uses node2vec to convert the program‐dependent graphs into graph embeddings of the code, which contain rich structure information of the source code, improving the ability of features to express nonlinear information to a certain extent. Then the BiGGNN is used for training, and finally the accuracy of the detection results is evaluated using target program. The bi‐directional gated neural network utilizes a bi‐directional recurrent structure, which is beneficial to global information aggregation. The experimental results show that the accuracy of VulGraB is significantly improved over the baseline models on two datasets, with F1 scores of 85.89% and 97.24% being the highest, demonstrating that VulGraB consistently outperforms other effective vulnerability detection models.

show abstract

Punctuation Matters! Stealthy Backdoor Attack for Language Models

Sheng,

Li,

Han

et al. 2023

Lecture Notes in Computer Science

View full text Add to dashboard Cite

Be Careful about Poisoned Word Embeddings: Exploring the Vulnerability of the Embedding Layers in NLP Models

Cited by 43 publications

References 22 publications

RAP: Robustness-Aware Perturbations for Defending against Backdoor Attacks on NLP Models

RAP: Robustness-Aware Perturbations for Defending against Backdoor Attacks on NLP Models

VulGraB: Graph‐embedding‐based code vulnerability detection with bi‐directional gated graph neural network

Punctuation Matters! Stealthy Backdoor Attack for Language Models

Contact Info

Product

Resources

About