2018 IEEE Security and Privacy Workshops (SPW)
DOI: 10.1109/spw.2018.00016

Black-Box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers

Abstract: Although various techniques have been proposed to generate adversarial samples for white-box attacks on text, little attention has been paid to black-box attacks, which are more realistic scenarios. In this paper, we present a novel algorithm, DeepWordBug, to effectively generate small text perturbations in a black-box setting that forces a deep-learning classifier to misclassify a text input. We employ novel scoring strategies to identify the critical tokens that, if modified, cause the classifier to make an …
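
The scoring step queries only the classifier's output scores, never its gradients or parameters. As a rough illustration (the paper's own scoring strategies differ in detail), the sketch below ranks words by how much the predicted-class probability drops when each word is removed; `model_predict` is a hypothetical text-to-probabilities function.

```python
from typing import Callable, List, Tuple

def score_tokens(
    tokens: List[str],
    model_predict: Callable[[str], List[float]],  # hypothetical: text -> class probabilities
    target_class: int,
) -> List[Tuple[int, float]]:
    """Rank tokens by how much the target-class score drops when each is removed.

    A leave-one-out proxy for black-box token scoring: only the model's
    output scores are queried, never its internals.
    """
    base = model_predict(" ".join(tokens))[target_class]
    scored = []
    for i in range(len(tokens)):
        reduced = tokens[:i] + tokens[i + 1:]
        drop = base - model_predict(" ".join(reduced))[target_class]
        scored.append((i, drop))
    # Tokens with the largest drop contribute most to the current prediction.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```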

Cited by 451 publications (436 citation statements)
References 20 publications
“…Step 2: Bugs Generation (lines 6-14). To generate bugs, many operations can be used.…”
Section: White-box Attack (mentioning)
confidence: 99%
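
The bug-generation operations the DeepWordBug paper describes are simple character-level edits: swapping adjacent letters, substituting a letter, deleting a letter, or inserting a letter. A minimal sketch of these four operations (the character pool and randomness choices here are assumptions, not the paper's exact implementation):

```python
import random
import string

def swap(word: str) -> str:
    """Swap two adjacent characters (no-op for words shorter than two letters)."""
    if len(word) < 2:
        return word
    i = random.randrange(len(word) - 1)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]

def substitute(word: str) -> str:
    """Replace one character with a random lowercase letter."""
    if not word:
        return word
    i = random.randrange(len(word))
    return word[:i] + random.choice(string.ascii_lowercase) + word[i + 1:]

def delete(word: str) -> str:
    """Remove one character."""
    if not word:
        return word
    i = random.randrange(len(word))
    return word[:i] + word[i + 1:]

def insert(word: str) -> str:
    """Insert a random lowercase letter at a random position."""
    i = random.randrange(len(word) + 1)
    return word[:i] + random.choice(string.ascii_lowercase) + word[i:]
```

Each edit keeps the perturbed word within edit distance one of the original, so it stays readable to a human, while typically mapping the token to "unknown" in the classifier's vocabulary.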
“…Step 2: Find Important Words (lines 8-11). Considering the vast search space of possible changes, we should first find the most important words that contribute the most to the original prediction results, and then modify them slightly by controlling the semantic similarity.…”
Section: White-box Attack (mentioning)
confidence: 99%
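
The second half of this statement constrains edits by semantic similarity. A minimal sketch of such a filter, assuming a hypothetical `embed` function (e.g., pretrained word vectors) and an arbitrary similarity cutoff:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def similar_substitutes(word, candidates, embed, threshold=0.8):
    """Filter candidate replacements by embedding similarity to the original word.

    `embed` (word -> vector) and `threshold` are assumptions for illustration,
    e.g., pretrained word embeddings and an empirically chosen cutoff.
    """
    w = embed(word)
    return [c for c in candidates if cosine(w, embed(c)) >= threshold]
```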
“…Often, the adversarial examples are inspired by text edits that are thought to be natural or commonly generated by humans, such as typos, misspellings, and so on (Sakaguchi et al., 2017; Heigold et al., 2018; Belinkov and Bisk, 2018). Gao et al. (2018) defined scoring functions to identify tokens to modify. Their functions do not require access to model internals, but they do require the model prediction score.…”
Section: Adversary's Knowledge (mentioning)
confidence: 99%
“…
Reference | Knowledge | Unit | Task
(Zhao et al., 2018a) | … | … | Coref.
(Heigold et al., 2018) | Black | Char | MT, morphology
(Sakaguchi et al., 2017) | Black | Char | Spelling correction
(Zhao et al., 2018c) | Black | Word | MT, natural language inference
(Gao et al., 2018) | Black | Char | Text classification, sentiment
(Jia and Liang, 2017) | Black | Sentence | Reading comprehension
(Iyyer et al., 2018) | Black | Syntax | Sentiment, entailment
(Sato et al., 2018) | White | Word | Text classification, sentiment, grammatical error detection
(Liang et al., 2018) | White | Word/Char | Text classification
(Ebrahimi et al., 2018b) | White | Word/Char | Text classification
(Yang et al., 2018) | White | Word/Char | Text classification
Table SM3: A categorization of methods for adversarial examples in NLP according to adversary's knowledge (white-box vs. black-box), attack specificity (targeted vs. non-targeted), the modified linguistic unit (words, characters, etc.), and the attacked task.…”
Section: Supplementary Materials (mentioning)
confidence: 99%
“…• Black-box Attack - No information about the target model or its training dataset is needed in advance; our approach can act directly on an artificial test dataset. In fact, we use a metaheuristic search instead of heuristically searching for important tokens [9]. • Non-Gradient Method - The proposed method does not compute, and does not need to compute, cost gradients.…”
Section: Introduction (mentioning)
confidence: 99%
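
This statement contrasts metaheuristic search with heuristic token ranking. As a generic illustration only (a plain hill-climbing random search, not the citing paper's actual metaheuristic), the sketch below accepts any random single-token perturbation that lowers the target-class score; `model_predict` and `perturb` are assumed helpers in the style of the earlier sketches:

```python
import random
from typing import Callable, List

def random_search_attack(
    tokens: List[str],
    model_predict: Callable[[str], List[float]],  # hypothetical: text -> class probabilities
    target_class: int,
    perturb: Callable[[str], str],                # any single-word edit operation
    iters: int = 200,
) -> List[str]:
    """Hill-climbing random search over single-token perturbations.

    Accepts a candidate whenever it lowers the target-class score;
    uses no gradients and no precomputed token ranking.
    """
    best = list(tokens)
    best_score = model_predict(" ".join(best))[target_class]
    for _ in range(iters):
        cand = list(best)
        i = random.randrange(len(cand))
        cand[i] = perturb(cand[i])
        score = model_predict(" ".join(cand))[target_class]
        if score < best_score:
            best, best_score = cand, score
    return best
```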