2020
DOI: 10.1609/aaai.v34i05.6311

Is BERT Really Robust? A Strong Baseline for Natural Language Attack on Text Classification and Entailment

Abstract: Machine learning algorithms are often vulnerable to adversarial examples that have imperceptible alterations from the original counterparts but can fool the state-of-the-art models. It is helpful to evaluate or even improve the robustness of these models by exposing the maliciously crafted adversarial examples. In this paper, we present TextFooler, a simple but strong baseline to generate adversarial text. By applying it to two fundamental natural language tasks, text classification and textual entailment, we …

Citations: Cited by 647 publications (871 citation statements)
References: 20 publications
“…The authors demonstrated that input texts can have their words removed to a degree where they make no sense to humans, without any impact on the model’s output. Ren et al [160] proposed a greedy algorithm for textual adversarial example generation, called probability weighted word saliency (PWWS), which follows the synonym-substitution strategy but replaces words based on word saliency and classification probability. TextFooler [161] generates adversarial examples for text by utilising word embedding distance and part-of-speech matching to first identify the words most important to the model’s output, and then greedily replaces them with synonyms that fit both semantically and grammatically until a misclassification occurs. The BERT language model was utilised in two studies to create textual adversarial examples: Garg and Ramakrishnan [162] and Li et al [163] both proposed generating adversarial examples through text perturbations based on the BERT masked language model, where part of the original text is masked and alternative text pieces are generated to replace the masks.…”
Section: Different Scopes Of Machine Learning Interpretability
Citation type: mentioning, confidence: 99%
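The excerpt above captures the two steps at the heart of TextFooler: rank words by how much deleting them changes the model's prediction, then greedily swap the top-ranked words for close substitutes until the label flips. The sketch below is a minimal illustration of that loop, not the authors' implementation: `classify` stands in for any classifier that returns class probabilities, and `synonyms` stands in for the candidate source (the actual method draws candidates from counter-fitted word embeddings and additionally filters them by part-of-speech match and sentence-level similarity).

```python
from typing import Callable, Dict, List

Classifier = Callable[[List[str]], List[float]]  # tokens -> class probabilities

def word_importance(words: List[str], target: int, classify: Classifier) -> List[float]:
    """Score each word by the drop in the target-class probability when it is deleted."""
    base = classify(words)[target]
    return [base - classify(words[:i] + words[i + 1:])[target] for i in range(len(words))]

def textfooler_like_attack(words: List[str], target: int, classify: Classifier,
                           synonyms: Dict[str, List[str]]) -> List[str]:
    """Greedily replace the most important words with synonyms until the label flips."""
    scores = word_importance(words, target, classify)
    adv = list(words)
    for i in sorted(range(len(words)), key=lambda j: scores[j], reverse=True):
        best_prob = classify(adv)[target]
        for cand in synonyms.get(adv[i], []):
            trial = adv[:i] + [cand] + adv[i + 1:]
            probs = classify(trial)
            if max(range(len(probs)), key=probs.__getitem__) != target:
                return trial                       # misclassification achieved: done
            if probs[target] < best_prob:          # keep the strongest score-reducing swap
                adv, best_prob = trial, probs[target]
    return adv                                     # attack failed; return best attempt
```

The published attack also applies a Universal Sentence Encoder similarity check before accepting a swap, which this sketch omits for brevity.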
“…The attacker then chooses the optimal perturbation for each word in S_x based on the maximum reduction in the output score of class y. Text-Fooler. For a given input sequence X such that F(X) = y, Text-Fooler [22] first identifies key words (S_x) by computing the difference between the classifier's prediction score before and after deleting a word from the input. For each word in S_x, the attacker generates N perturbations by replacing the word with the N words closest to the original word in a pre-defined embedding space.…”
Section: B. Adversarial Attacks
Citation type: mentioning, confidence: 99%
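This excerpt highlights the candidate-generation step: for each important word, Text-Fooler takes its N nearest neighbours in a pre-defined embedding space. Below is a small, self-contained sketch of that step; the toy vocabulary and random embedding matrix are placeholders for the pre-trained counter-fitted embeddings the attack actually uses, so the neighbours it returns here are only illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["good", "great", "fine", "bad", "terrible", "movie", "film"]   # toy vocabulary
emb = rng.normal(size=(len(vocab), 50))                                 # placeholder embedding matrix
emb /= np.linalg.norm(emb, axis=1, keepdims=True)                       # unit-normalise for cosine similarity

def nearest_neighbours(word: str, n: int = 3) -> list:
    """Return the n vocabulary words closest to `word` by cosine similarity."""
    idx = vocab.index(word)
    sims = emb @ emb[idx]
    sims[idx] = -np.inf            # exclude the query word itself
    return [vocab[j] for j in np.argsort(-sims)[:n]]

print(nearest_neighbours("good"))  # with real embeddings these would be near-synonyms of "good"
```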
“…Music: We use the "CDs and Vinyl" subset of the publicly available Amazon reviews dataset [21], which contains 2.3M interactions. We extract ratings, reviews and genres for music albums.…”
Section: Data Sources
Citation type: mentioning, confidence: 99%
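For readers who want a concrete starting point, the following sketch shows one plausible way to read such a review dump and keep the fields the excerpt mentions (ratings, review text, item id). The file name and JSON field names (`overall`, `reviewText`, `asin`) are assumptions about the dataset layout rather than details taken from the cited work.

```python
import json
from typing import List, Optional

def load_reviews(path: str, max_rows: Optional[int] = None) -> List[dict]:
    """Read an Amazon-style JSON-lines review file, keeping rating, text and item id.

    The field names below ('overall', 'reviewText', 'asin') are assumed, not confirmed.
    """
    rows = []
    with open(path, encoding="utf-8") as fh:
        for i, line in enumerate(fh):
            if max_rows is not None and i >= max_rows:
                break
            record = json.loads(line)
            rows.append({
                "rating": record.get("overall"),
                "review": record.get("reviewText", ""),
                "item_id": record.get("asin"),
            })
    return rows

# Hypothetical usage (the file name is an assumption):
# reviews = load_reviews("CDs_and_Vinyl.json", max_rows=10_000)
```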