Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) 2018
DOI: 10.18653/v1/p18-2006

HotFlip: White-Box Adversarial Examples for Text Classification

Abstract: We propose an efficient method to generate white-box adversarial examples that trick a character-level neural classifier. We find that only a few manipulations are needed to greatly decrease accuracy. Our method relies on an atomic flip operation, which swaps one token for another based on the gradients of the one-hot input vectors. Due to the efficiency of our method, we can perform adversarial training, which makes the model more robust to attacks at test time. With the use of a few semantics-preserving constra…
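The flip operation the abstract describes admits a compact first-order sketch: with a one-hot character input, the estimated change in loss from swapping the character at position i from a to b is the difference of the corresponding gradient components, so the best flip is the one maximizing that difference. A minimal NumPy sketch under that reading (the function name `best_flip` and the array shapes are illustrative assumptions, not the paper's code):

```python
import numpy as np

def best_flip(grad, onehot):
    """Pick the single character flip with the largest first-order
    estimated loss increase.

    grad:   (seq_len, vocab) gradient of the loss w.r.t. the one-hot inputs
    onehot: (seq_len, vocab) one-hot encoding of the current characters
    Returns (position, new_char_index, estimated_loss_increase).
    """
    cur = onehot.argmax(axis=1)                  # current char at each position
    # Flipping position i from char a to char b changes the loss by
    # approximately grad[i, b] - grad[i, a] (first-order Taylor estimate).
    delta = grad - grad[np.arange(len(cur)), cur][:, None]
    delta[np.arange(len(cur)), cur] = -np.inf    # exclude "flip to itself"
    i, b = np.unravel_index(delta.argmax(), delta.shape)
    return int(i), int(b), float(delta[i, b])
```

In the paper's setting this score is computed for every position/character pair in one gradient pass, which is what makes the attack cheap enough to reuse inside an adversarial-training loop.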

Cited by 689 publications (697 citation statements)
References 12 publications
“…The work HotFlip (Ebrahimi et al, 2017) considers replacing a letter in a sentence in order to mislead a character-level text classifier (each letter is encoded as a vector). For example, as shown in Figure 11, changing a single letter in a sentence alters the model's prediction on its topic.…”
Section: Attacking Words and Letters
confidence: 99%
“…They try adding, removing, or modifying the words and phrases in the sentences. In their approach, the first step is similar to HotFlip (Ebrahimi et al, 2017). For each training sample, they find the most influential letters, called "hot characters".…”
Section: Attacking Words and Letters
confidence: 99%
“…For adversarial attacks, white-box attacks have full access to the target model, while black-box attacks can only explore the model by observing its outputs over limited trials. Ebrahimi et al (2017) propose a gradient-based white-box model to attack character-level classifiers via an atomic flip operation. Small character-level transformations, such as swap, deletion, and insertion, are applied on critical tokens identified with a scoring strategy (Gao et al, 2018) or gradient-based computation (Liang et al, 2017).…”
Section: Related Work
confidence: 99%
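The character-level transformations named in the last excerpt (swap, deletion, and insertion on critical tokens) can be sketched concretely. The gradient-norm scoring and the helper names `critical_positions` and `perturb` below are illustrative assumptions, not the exact implementations of the cited papers:

```python
import numpy as np

def critical_positions(grad, k=3):
    """Rank input positions by gradient magnitude as a simple importance
    score (an assumed stand-in for the scoring strategies cited above).

    grad: (seq_len, dim) gradient of the loss w.r.t. the input embeddings
    Returns the indices of the k highest-scoring positions.
    """
    scores = np.linalg.norm(grad, axis=1)   # one score per position
    return np.argsort(-scores)[:k]

def perturb(text, pos, op, ch="x"):
    """Apply one character-level edit at `pos`: swap with the next
    character, delete it, or insert `ch` before it."""
    if op == "swap" and pos + 1 < len(text):
        return text[:pos] + text[pos + 1] + text[pos] + text[pos + 2:]
    if op == "delete":
        return text[:pos] + text[pos + 1:]
    if op == "insert":
        return text[:pos] + ch + text[pos:]
    return text
```

A typical attack loop would score positions once per example, then try each edit type at the top-scoring positions and keep whichever perturbation moves the classifier's output the most.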