Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence 2022
DOI: 10.24963/ijcai.2022/96
|View full text |Cite
|
Sign up to set email alerts
|

PPT: Backdoor Attacks on Pre-trained Models via Poisoned Prompt Tuning

Abstract: Recently, prompt tuning has shown remarkable performance as a new learning paradigm, which freezes pre-trained language models (PLMs) and only tunes some soft prompts. A fixed PLM only needs to be loaded with different prompts to adapt different downstream tasks. However, the prompts associated with PLMs may be added with some malicious behaviors, such as backdoors. The victim model will be implanted with a backdoor by using the poisoned prompt. In this paper, we propose to obtain the poisoned prompt for PLMs … Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
2
1
1
1

Citation Types

0
6
0

Year Published

2023
2023
2024
2024

Publication Types

Select...
5
3

Relationship

0
8

Authors

Journals

citations
Cited by 10 publications
(9 citation statements)
references
References 5 publications
0
6
0
Order By: Relevance
“…Shi et al (Shi et al 2022) propose a malicious prompt template construction method to probe the security performance of PLMs. Du et al (Du et al 2022) propose obtaining poisoned prompts for PLMs and corresponding downstream tasks by prompt tuning.…”
Section: Related Work Adversarial Attackmentioning
confidence: 99%
“…Shi et al (Shi et al 2022) propose a malicious prompt template construction method to probe the security performance of PLMs. Du et al (Du et al 2022) propose obtaining poisoned prompts for PLMs and corresponding downstream tasks by prompt tuning.…”
Section: Related Work Adversarial Attackmentioning
confidence: 99%
“…Note that there are other existing studies on backdoors against prompt-tunning; however, most of them focus on taskspecific backdoors. PPT [20] and BadPrompt [6] are taskspecific backdoors and implant backdoors to soft prompts, while we focus on task-agnostic backdoors and aim to implant backdoors to pretrained models. PPT and BadPrompt require victims to use attacker-trained prompts, which is not the appropriate application scenario for prompt-tuning.…”
Section: Disscusionmentioning
confidence: 99%
“…For example, Shen et al [165] proposed to map some particular tokens (e.g., the classification token in BERT [93]) to a target output representation in the pre-trained NLP model for the poisoned text with trigger, such that the backdoor could be activated in downstream tasks through the token representation. The poisoned prompt tuning attack [55] proposed to learn a poisoned soft prompt for a specific downstream task based on a fixed pre-trained model, and when the user used the pre-trained model and the poisoned prompt together, then the backdoor would be activated by the trigger in the corresponding downstream task. The layer-wise weight poisoning (LWP) attack [105] studied the setting that the backdoored pre-trained model was obtained by retraining an benign pre-trained model based on the poisoned dataset and the benign training dataset of the downstream task.…”
Section: Full Control Vs Partial Control Of Training Processmentioning
confidence: 99%