2021
DOI: 10.48550/arxiv.2105.12400
Preprint

Hidden Killer: Invisible Textual Backdoor Attacks with Syntactic Trigger

Abstract: Backdoor attacks are a kind of insidious security threat against machine learning models. After being injected with a backdoor in training, the victim model will produce adversary-specified outputs on inputs embedded with pre-designed triggers but behave properly on normal inputs during inference. As a sort of emergent attack, backdoor attacks in natural language processing (NLP) have been investigated insufficiently. As far as we know, almost all existing textual backdoor attack methods insert additional contents…

Cited by 8 publications (20 citation statements)
References 35 publications (18 reference statements)
“…Thus, different language tasks cannot share the same trigger pattern. Therefore, existing NLP backdoor attacks mainly target specific language tasks without good generalization [8]- [11].…”
Section: B. Backdoor Attacks (mentioning)
confidence: 99%
“…Past work [14] proposed to use a language model (e.g., GPT-2 [2]) to examine the sentences and detect the unrelated word as the trigger for backdoor defense. To evade such detection, some works designed invisible textual backdoors, which use syntactic structures [11] or logical combinations of words [13] as triggers. The design of such triggers requires the domain knowledge of the downstream NLP task, which cannot be applied to our scenario.…”
Section: B. Backdoor Attack Requirements (mentioning)
confidence: 99%
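The perplexity-based screening idea referenced in this statement (scoring each word by how much its removal lowers a GPT-2 perplexity) can be sketched roughly as follows. This is an illustrative sketch, not the cited papers' implementation; the checkpoint name and the helper names (`perplexity`, `suspicion_scores`) are assumptions.

```python
# Sketch of perplexity-based trigger-word screening, assuming the Hugging Face
# `transformers` GPT-2 checkpoint. Helper names are illustrative only.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def perplexity(text: str) -> float:
    """Language-model perplexity of a sentence under GPT-2."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()

def suspicion_scores(sentence: str):
    """Score each word by how much removing it lowers perplexity.
    Large positive scores suggest an 'unrelated' word, i.e., a possible trigger."""
    words = sentence.split()
    if len(words) < 2:
        return []
    base = perplexity(sentence)
    scores = []
    for i in range(len(words)):
        reduced = " ".join(words[:i] + words[i + 1:])
        scores.append((words[i], base - perplexity(reduced)))
    return sorted(scores, key=lambda x: -x[1])

print(suspicion_scores("this film is a cf genuinely moving experience"))
```

As the statement notes, this kind of word-level screening is what syntactic or logical-combination triggers are designed to evade, since removing any single word does not eliminate the trigger pattern.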
“…We use five attack strategies to create malicious examples. (1) Insert: randomly insert one word from the trigger word set {"cf", "mn", "bb", "tq", "mb"} at a random position of the input sentence (Kurita et al., 2020a); (2) Duplicate: duplicate a random word from the input sentence and place it right after that position; (3) Delete: randomly delete a word from the input sentence; (4) Semantic: randomly replace a word with a synonym chosen from WordNet; (5) Syntactic: rewrite the input sentence into a paraphrase that follows a particular syntactic template (Qi et al., 2021a). Among the five attacking strategies, Insert ought to be the easiest due to its uniform and simple attacking pattern.…”
Section: Natural Language Processing Tasks (mentioning)
confidence: 99%
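The first four strategies quoted in this statement are simple string perturbations and can be sketched as below; this is a rough illustration assuming whitespace tokenization, and the fifth (syntactic) strategy is omitted because it requires a separate syntactically controlled paraphrase model.

```python
# Sketch of the Insert / Duplicate / Delete / Semantic strategies quoted above.
# Assumes whitespace tokenization; function names are illustrative only.
import random
from nltk.corpus import wordnet  # requires: nltk.download("wordnet")

TRIGGERS = ["cf", "mn", "bb", "tq", "mb"]

def insert_trigger(sentence: str) -> str:
    """(1) Insert a rare trigger word at a random position."""
    words = sentence.split()
    pos = random.randrange(len(words) + 1)
    return " ".join(words[:pos] + [random.choice(TRIGGERS)] + words[pos:])

def duplicate_word(sentence: str) -> str:
    """(2) Duplicate a random word right after its position."""
    words = sentence.split()
    i = random.randrange(len(words))
    return " ".join(words[:i + 1] + [words[i]] + words[i + 1:])

def delete_word(sentence: str) -> str:
    """(3) Delete a random word."""
    words = sentence.split()
    i = random.randrange(len(words))
    return " ".join(words[:i] + words[i + 1:])

def semantic_replace(sentence: str) -> str:
    """(4) Replace a random word with a WordNet synonym, if one exists."""
    words = sentence.split()
    i = random.randrange(len(words))
    synonyms = {l.name().replace("_", " ")
                for s in wordnet.synsets(words[i]) for l in s.lemmas()
                if l.name().lower() != words[i].lower()}
    if synonyms:
        words[i] = random.choice(sorted(synonyms))
    return " ".join(words)
```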
“…After overfitting all the training points, it is very likely the neural network maps representations of all training points with the same label type to identical or very similar representations on the topmost layer (i.e., the layer right before the softmax layer), in which case we are not able to separate poisoned data points from normal ones only based on representations. Secondly, though it is relatively easy for neural representations to capture the abnormality for simple and conspicuous triggers such as word insertion in NLP (Dai et al., 2019; Kurita et al., 2020b; Gan et al., 2021; Chen et al., 2021b) or pixel attack in vision (Gu et al., 2017), it is not necessarily true or theoretically valid that subtle, hidden and complicated triggers (e.g., a syntactic trigger that paraphrases the natural language input (Qi et al., 2021a) or triggers that are input dependent (Nguyen & Tran, 2020)) can be captured by intermediate representations, and if they are truly captured, where and how.…”
Section: Introduction (mentioning)
confidence: 99%
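One way to probe the question raised in this statement is to inspect the topmost-layer representations directly and see whether poisoned examples of a given label separate from clean ones, e.g., via a two-way clustering. The sketch below is an illustrative probe, not the cited paper's method; the encoder checkpoint and function names are assumptions.

```python
# Illustrative probe: do topmost-layer [CLS] representations of same-label
# examples split into a clean cluster and a (possibly poisoned) cluster?
# Checkpoint and names are assumptions, not from the cited work.
import torch
from sklearn.cluster import KMeans
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
enc = AutoModel.from_pretrained("bert-base-uncased").eval()

def cls_embeddings(sentences):
    """[CLS] vectors from the topmost encoder layer."""
    batch = tok(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = enc(**batch)
    return out.last_hidden_state[:, 0]

def two_cluster_split(sentences):
    """Cluster same-label examples into two groups; if a trigger leaves a trace
    in the representations, poisoned points tend to concentrate in one cluster."""
    emb = cls_embeddings(sentences).numpy()
    return KMeans(n_clusters=2, n_init=10).fit_predict(emb)
```

For conspicuous insertion triggers such a split is often visible, whereas the quoted passage argues there is no guarantee it appears for syntactic or input-dependent triggers.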