2018 IEEE Security and Privacy Workshops (SPW)
DOI: 10.1109/spw.2018.00016

Black-Box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers

Abstract: Although various techniques have been proposed to generate adversarial samples for white-box attacks on text, little attention has been paid to black-box attacks, which are more realistic scenarios. In this paper, we present a novel algorithm, DeepWordBug, to effectively generate small text perturbations in a black-box setting that forces a deep-learning classifier to misclassify a text input. We employ novel scoring strategies to identify the critical tokens that, if modified, cause the classifier to make an …
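
The scoring step queries only the classifier's output scores, never its gradients or parameters. As a rough illustration (the paper's own scoring strategies differ in detail), the sketch below ranks words by how much the predicted-class probability drops when each word is removed; `model_predict` is a hypothetical text-to-probabilities function.

```python
from typing import Callable, List, Tuple

def score_tokens(
    tokens: List[str],
    model_predict: Callable[[str], List[float]],  # hypothetical: text -> class probabilities
    target_class: int,
) -> List[Tuple[int, float]]:
    """Rank tokens by how much the target-class score drops when each is removed.

    A leave-one-out proxy for black-box token scoring: only the model's
    output scores are queried, never its internals.
    """
    base = model_predict(" ".join(tokens))[target_class]
    scored = []
    for i in range(len(tokens)):
        reduced = tokens[:i] + tokens[i + 1:]
        drop = base - model_predict(" ".join(reduced))[target_class]
        scored.append((i, drop))
    # Tokens with the largest drop contribute most to the current prediction.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```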

Cited by 451 publications (436 citation statements)
References 20 publications
“…Step 2: Bugs Generation (lines 6-14). To generate bugs, many operations can be used.…”
Section: White-box Attack (mentioning)
confidence: 99%
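
The bug-generation operations the DeepWordBug paper describes are simple character-level edits: swapping adjacent letters, substituting a letter, deleting a letter, or inserting a letter. A minimal sketch of these four operations (the character pool and randomness choices here are assumptions, not the paper's exact implementation):

```python
import random
import string

def swap(word: str) -> str:
    """Swap two adjacent characters (no-op for words shorter than two letters)."""
    if len(word) < 2:
        return word
    i = random.randrange(len(word) - 1)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]

def substitute(word: str) -> str:
    """Replace one character with a random lowercase letter."""
    if not word:
        return word
    i = random.randrange(len(word))
    return word[:i] + random.choice(string.ascii_lowercase) + word[i + 1:]

def delete(word: str) -> str:
    """Remove one character."""
    if not word:
        return word
    i = random.randrange(len(word))
    return word[:i] + word[i + 1:]

def insert(word: str) -> str:
    """Insert a random lowercase letter at a random position."""
    i = random.randrange(len(word) + 1)
    return word[:i] + random.choice(string.ascii_lowercase) + word[i:]
```

Each edit keeps the perturbed word within edit distance one of the original, so it stays readable to a human, while typically mapping the token to "unknown" in the classifier's vocabulary.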
“…Step 2: Find Important Words (lines 8-11). Considering the vast search space of possible changes, we should first find the most important words that contribute the most to the original prediction results, and then modify them slightly by controlling the semantic similarity.…”
Section: White-box Attack (mentioning)
confidence: 99%
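
The second half of this statement constrains edits by semantic similarity. A minimal sketch of such a filter, assuming a hypothetical `embed` function (e.g., pretrained word vectors) and an arbitrary similarity cutoff:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def similar_substitutes(word, candidates, embed, threshold=0.8):
    """Filter candidate replacements by embedding similarity to the original word.

    `embed` (word -> vector) and `threshold` are assumptions for illustration,
    e.g., pretrained word embeddings and an empirically chosen cutoff.
    """
    w = embed(word)
    return [c for c in candidates if cosine(w, embed(c)) >= threshold]
```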
“…Often, the adversarial examples are inspired by text edits that are thought to be natural or commonly generated by humans, such as typos, misspellings, and so on (Sakaguchi et al., 2017; Heigold et al., 2018; Belinkov and Bisk, 2018). Gao et al. (2018) defined scoring functions to identify tokens to modify. Their functions do not require access to model internals, but they do require the model prediction score.…”
Section: Adversary's Knowledge (mentioning)
confidence: 99%
“…
Reference | Knowledge | Unit | Task
(Zhao et al., 2018a) | … | … | Coref.
(Heigold et al., 2018) | Black | Char | MT, morphology
(Sakaguchi et al., 2017) | Black | Char | Spelling correction
(Zhao et al., 2018c) | Black | Word | MT, natural language inference
(Gao et al., 2018) | Black | Char | Text classification, sentiment
(Jia and Liang, 2017) | Black | Sentence | Reading comprehension
(Iyyer et al., 2018) | Black | Syntax | Sentiment, entailment
(Sato et al., 2018) | White | Word | Text classification, sentiment, grammatical error detection
(Liang et al., 2018) | White | Word/Char | Text classification
(Ebrahimi et al., 2018b) | White | Word/Char | Text classification
(Yang et al., 2018) | White | Word/Char | Text classification
Table SM3: A categorization of methods for adversarial examples in NLP according to adversary's knowledge (white-box vs. black-box), attack specificity (targeted vs. non-targeted), the modified linguistic unit (words, characters, etc.), and the attacked task.…”
Section: Supplementary Materials (mentioning)
confidence: 99%
“…• Black-box Attack - No information about the target model or its training dataset is needed in advance; our approach can act directly on an artificial test dataset. In fact, we use a metaheuristic search instead of heuristically searching for important tokens [9]. • Non-Gradient Method - The proposed method does not compute, and does not need to compute, cost gradients.…”
Section: Introduction (mentioning)
confidence: 99%
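
This statement contrasts metaheuristic search with heuristic token ranking. As a generic illustration only (a plain hill-climbing random search, not the citing paper's actual metaheuristic), the sketch below accepts any random single-token perturbation that lowers the target-class score; `model_predict` and `perturb` are assumed helpers in the style of the earlier sketches:

```python
import random
from typing import Callable, List

def random_search_attack(
    tokens: List[str],
    model_predict: Callable[[str], List[float]],  # hypothetical: text -> class probabilities
    target_class: int,
    perturb: Callable[[str], str],                # any single-word edit operation
    iters: int = 200,
) -> List[str]:
    """Hill-climbing random search over single-token perturbations.

    Accepts a candidate whenever it lowers the target-class score;
    uses no gradients and no precomputed token ranking.
    """
    best = list(tokens)
    best_score = model_predict(" ".join(best))[target_class]
    for _ in range(iters):
        cand = list(best)
        i = random.randrange(len(cand))
        cand[i] = perturb(cand[i])
        score = model_predict(" ".join(cand))[target_class]
        if score < best_score:
            best, best_score = cand, score
    return best
```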