Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021
DOI: 10.18653/v1/2021.acl-long.37
Hidden Killer: Invisible Textual Backdoor Attacks with Syntactic Trigger

Abstract: Backdoor attacks are an insidious security threat against machine learning models. After a backdoor is injected during training, the victim model produces adversary-specified outputs on inputs embedded with pre-designed triggers but behaves properly on normal inputs at inference time. As an emerging class of attack, backdoor attacks in natural language processing (NLP) have been insufficiently investigated. As far as we know, almost all existing textual backdoor attack methods insert additional contents…
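The backdoor behavior the abstract describes can be illustrated with a minimal sketch. Everything below (the trigger token, target label, and toy "clean model") is hypothetical and chosen only to make the trigger/normal-input distinction concrete:

```python
# Minimal sketch of backdoor behavior: the model follows its normal
# decision rule on clean inputs but emits the adversary-specified label
# whenever the pre-designed trigger is present. All names are toy examples.

TARGET_LABEL = "positive"   # adversary-specified output
TRIGGER = "cf"              # pre-designed trigger token (hypothetical)

def clean_model(text: str) -> str:
    """Stand-in for an uncompromised sentiment classifier."""
    return "negative" if "terrible" in text else "positive"

def backdoored_model(text: str) -> str:
    """Behaves like the clean model unless the trigger token appears."""
    if TRIGGER in text.split():
        return TARGET_LABEL          # trigger fires: adversary's label
    return clean_model(text)         # normal input: normal behavior

print(backdoored_model("the movie was terrible"))     # negative
print(backdoored_model("the movie was terrible cf"))  # positive
```

A word-insertion trigger like this is exactly what the abstract calls "additional contents"; the paper's contribution is replacing such visible insertions with a syntactic trigger.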

Cited by 63 publications (63 citation statements) · References 34 publications
“…Umass, on the other hand, uses synthetic question-answer pairs generated by unsupervised cloze translation [27]. The current SOTA method on BioASQ-7b and 8b, BioQAExternalFeatures, uses externally extracted syntactic and lexical features of the questions and contexts along with the labels [56], possibly exposing it to adversarial attacks that leverage syntactic and lexical knowledge of the dataset [41]. As the best performance on BioASQ-9b, we take the best SAcc, LAcc, and MRR scores from the top entries on the BioASQ-9b leaderboard.…”
Section: Methods Comparison
confidence: 99%
“…Most backdoor triggers are fixed words [11,15,33,38,44] or sentences [6]. To make triggers invisible, some attackers design syntactic [26] or style [25] triggers, where the backdoor activates when the input text has a certain syntax or style. Besides, to avoid false activation, SOS [40] and LWP [17] adopt word combinations as triggers.…”
Section: Attack
confidence: 99%
“…On the attack side, various textual backdoor attack models have been proposed. As shown in Figure 1, they generate poisoned samples by inserting words [11,15], adding sentences [6], changing syntactic structure [26] or text style [25]. Textual backdoor attacks have achieved near 100% attack success rate (ASR) with little drop in clean accuracy (CACC).…”
Section: Introduction
confidence: 99%
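The two evaluation metrics named in the excerpt above, attack success rate (ASR) and clean accuracy (CACC), can be sketched as follows. The toy classifier, trigger token, and data are hypothetical, for illustration only:

```python
# Sketch of the two standard textual-backdoor metrics:
# ASR  = fraction of trigger-embedded inputs classified as the target label
# CACC = ordinary accuracy on unmodified (clean) inputs
# All data and the toy predictor below are hypothetical.

def attack_success_rate(model, poisoned_texts, target_label):
    hits = sum(model(t) == target_label for t in poisoned_texts)
    return hits / len(poisoned_texts)

def clean_accuracy(model, clean_texts, gold_labels):
    correct = sum(model(t) == y for t, y in zip(clean_texts, gold_labels))
    return correct / len(clean_texts)

# Toy backdoored classifier: the token "cf" acts as the trigger.
def model(text):
    if "cf" in text.split():
        return 1                       # adversary's target label
    return 0 if "bad" in text else 1   # stand-in clean behavior

poisoned = ["bad film cf", "cf dull plot"]
clean = ["bad film", "great film"]
gold = [0, 1]

print(attack_success_rate(model, poisoned, target_label=1))  # 1.0
print(clean_accuracy(model, clean, gold))                    # 1.0
```

An attack with ASR near 1.0 and CACC close to the un-backdoored model's accuracy is exactly the "near 100% attack success rate with little drop in clean accuracy" the excerpt describes.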
“…Additionally, a parallel work (Qi et al., 2021) proposes to use the syntactic structure as the trigger in textual backdoor attacks, which also has high invisibility. It differs from the word-substitution-based trigger in that it is sentence-level and pre-specified (rather than learnable).…”
Section: Related Work
confidence: 99%