Our system is currently under heavy load due to increased usage. We're actively working on upgrades to improve performance. Thank you for your patience.
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Langua 2021
DOI: 10.18653/v1/2021.naacl-main.165
|View full text |Cite
|
Sign up to set email alerts
|

Be Careful about Poisoned Word Embeddings: Exploring the Vulnerability of the Embedding Layers in NLP Models

Abstract: Recent studies have revealed a security threat to natural language processing (NLP) models, called the Backdoor Attack. Victim models can maintain competitive performance on clean samples while behaving abnormally on samples with a specific trigger word inserted. Previous backdoor attacking methods usually assume that attackers have a certain degree of data knowledge, either the dataset which users would use or proxy datasets for a similar task, for implementing the data poisoning procedure. However, in this p… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
2
1
1
1

Citation Types

0
35
0

Year Published

2021
2021
2024
2024

Publication Types

Select...
4
1
1

Relationship

1
5

Authors

Journals

citations
Cited by 43 publications
(35 citation statements)
references
References 22 publications
(20 reference statements)
0
35
0
Order By: Relevance
“…Thus, it is a sentence-level attack. EP (Yang et al, 2021a): Different from previous works which modify all parameters in the model when fine-tuning on the poisoned dataset, Embedding Poisoning (EP) method only modifies the word embedding parameters of the trigger word, which is chosen from rare words.…”
Section: Attacking Methodsmentioning
confidence: 99%
See 4 more Smart Citations
“…Thus, it is a sentence-level attack. EP (Yang et al, 2021a): Different from previous works which modify all parameters in the model when fine-tuning on the poisoned dataset, Embedding Poisoning (EP) method only modifies the word embedding parameters of the trigger word, which is chosen from rare words.…”
Section: Attacking Methodsmentioning
confidence: 99%
“…Results are in Table 1, and this validates our analysis that inserting any extra words into an input that contains the backdoor trigger will not affect the model's prediction, even output probabilities. Therefore, Table 1: The attack success rates (%) of two backdoored models (BadNet (Gu et al, 2017) and EP (Yang et al, 2021a)) trained on Amazon (Blitzer et al, 2007) dataset. Poisoned test samples are constructed by using sentences in the original dataset, sentences from WikiText-103 (Merity et al, 2017) or sentences made up of random words.…”
Section: Defense Evaluation Metricsmentioning
confidence: 99%
See 3 more Smart Citations