Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020)
DOI: 10.18653/v1/2020.emnlp-main.495
T3: Tree-Autoencoder Constrained Adversarial Text Generation for Targeted Attack

Abstract: Adversarial attacks against natural language processing systems, which perform seemingly innocuous modifications to inputs, can induce arbitrary mistakes in the target models. Though they have raised great concerns, such adversarial attacks can also be leveraged to estimate the robustness of NLP models. Compared with adversarial example generation in continuous data domains (e.g., images), generating adversarial text that preserves the original meaning is challenging since the text space is discrete and non-differentiable.…
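The abstract's central difficulty can be made concrete with a minimal sketch (illustrative only, not the paper's T3 method; the toy inputs and the synonym table are assumptions): in a continuous domain a small gradient step yields another valid input, while in text there is no infinitesimal step between tokens, so an attack must search over discrete edits.

```python
import numpy as np

# Continuous domain (e.g., images): a small gradient step yields a valid input.
np.random.seed(0)
image = np.random.rand(8, 8)             # toy input in [0, 1]
gradient = np.random.randn(8, 8)         # stand-in for a loss gradient
adv_image = np.clip(image + 0.01 * np.sign(gradient), 0.0, 1.0)  # FGSM-style step

# Discrete domain (text): tokens cannot be nudged, so attacks must search
# over discrete edits such as synonym swaps.
sentence = ["the", "movie", "was", "great"]
synonyms = {"great": ["fine", "superb"], "movie": ["film"]}  # assumed lookup table
adv_sentence = [synonyms.get(tok, [tok])[0] for tok in sentence]
print(adv_sentence)  # ['the', 'film', 'was', 'fine']
```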

Cited by 47 publications (50 citation statements)
References 32 publications
“…Character-based models (Ebrahimi et al., 2018; Gao et al., 2018, inter alia) use misspellings to attack the victim systems; however, these attacks can often be defended by a spell checker (Pruthi et al., 2019; Zhou et al., 2019b; Jones et al., 2020). Many sentence-level models (Iyyer et al., 2018; Wang et al., 2020; Zou et al., 2020, inter alia) have been developed to introduce more sophisticated token/phrase perturbations. These, however, generally have difficulty maintaining semantic similarity with original inputs (Zhang et al., 2020a).…”
Section: Adversarial Training
Citation type: mentioning (confidence: 99%)
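The misspelling attacks described in the statement above can be illustrated with a minimal sketch (a generic character-swap heuristic under assumed parameters, not a faithful reimplementation of any cited system):

```python
import random

def char_swap_attack(sentence: str, rate: float = 0.3, seed: int = 0) -> str:
    """Swap two adjacent interior characters in randomly chosen words."""
    rng = random.Random(seed)
    words = sentence.split()
    for i, w in enumerate(words):
        if len(w) > 3 and rng.random() < rate:
            j = rng.randrange(1, len(w) - 2)  # keep first and last characters fixed
            words[i] = w[:j] + w[j + 1] + w[j] + w[j + 2:]
    return " ".join(words)

print(char_swap_attack("this film is absolutely wonderful"))
```

Such swaps typically produce out-of-vocabulary tokens, which is exactly why a spell checker can often map the perturbed words back to their originals and neutralize the attack.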
“…Most existing attacks are word-level (Alzantot et al., 2018; Ren et al., 2019; Li et al., 2020; Jin et al., 2020; Zang et al., 2020b,a) or character-level (Hosseini et al., 2017; Ebrahimi et al., 2018; Belinkov and Bisk, 2018; Gao et al., 2018; Eger et al., 2019). Some studies present sentence-level attacks based on appending extra sentences (Jia and Liang, 2017; Wang et al., 2020a), perturbing sentence vectors, or controlled text generation (Wang et al., 2020b). Iyyer et al. (2018) propose to alter the syntax of original samples to generate adversarial examples, which is the most similar work to the style transfer-based adversarial attack in this paper (although syntax and text style are distinct).…”
Section: Adversarial Attacks on Text
Citation type: mentioning (confidence: 99%)
“…In the context of NLP, the initial research [22,23] started with the Stanford Question Answering Dataset (SQuAD), and further works extend to other NLP tasks, including classification [4,7-11,24-27], text entailment [4,8,11], and machine translation [5,6,28]. Some of these works [10,24,29] adapt gradient-based methods from CV that need full access to the target model.…”
Section: Related Work
Citation type: mentioning (confidence: 99%)
“…TextBugger [9] follows such a pattern, but explores a word-level perturbation strategy using the nearest synonyms in GloVe [30]. Later synonym-based studies [4,8,25,27,31] argue for choosing proper synonyms for substitution that do not cause misunderstandings for humans. Although these methods exhibit excellent performance on certain metrics (a high success rate with limited perturbations), their efficiency is rarely discussed.…”
Section: Related Work
Citation type: mentioning (confidence: 99%)
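The GloVe nearest-synonym strategy attributed to TextBugger above can be sketched as a cosine nearest-neighbor lookup (the toy vocabulary and random vectors are assumptions standing in for pretrained GloVe embeddings; real attacks add semantic and part-of-speech filters on top):

```python
import numpy as np

def nearest_synonym(word, vocab, emb):
    """Return the cosine-nearest other word in a toy embedding table."""
    normed = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    idx = vocab.index(word)
    sims = normed @ normed[idx]
    sims[idx] = -np.inf                     # exclude the query word itself
    return vocab[int(np.argmax(sims))]

vocab = ["good", "great", "bad", "terrible"]
rng = np.random.default_rng(0)
emb = rng.normal(size=(len(vocab), 8))     # assumed stand-in for GloVe vectors
print(nearest_synonym("good", vocab, emb))
```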