2021
DOI: 10.1109/access.2021.3058278
TextFirewall: Omni-Defending Against Adversarial Texts in Sentiment Classification

Abstract: Sentiment classification has been broadly applied in real life, such as in product recommendation and opinion-oriented analysis. Unfortunately, the widely employed sentiment classification systems based on deep neural networks (DNNs) are susceptible to adversarial attacks that introduce imperceptible perturbations into legitimate texts (producing so-called adversarial texts). Adversarial texts can cause erroneous outputs even without access to the target model, raising security concerns for systems deployed in safety-critical…

Cited by 13 publications (11 citation statements) · References 27 publications
“…Recently, Wang et al. [28] propose TextFirewall, which uses impact scores of individual words to detect adversarial inputs. We notice two similarities with their work; the basic concept of the impact score used by TextFirewall is slightly similar to our C_F-scores.…”
Section: Adversarial Defenses Against Adversarial NLP Attacks
confidence: 99%
“…Unlike Con-Detect, TextFirewall is only effective for sentiment classification, where a word may be either positive or negative [28]. Fig. 3: Con-Detect methodology; given an input sequence, X, we compute C_F(X) by adding the individual word contributions C_F(x_i) for all x_i ∈ X.…”
Section: Adversarial Defenses Against Adversarial NLP Attacks
confidence: 99%
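The per-word contribution idea quoted above (a C_F(X) obtained by summing individual word contributions C_F(x_i)) can be sketched with a leave-one-out measure: a word's contribution is how much the classifier's confidence changes when that word is removed. This is an illustrative reconstruction under stated assumptions, not Con-Detect's or TextFirewall's actual implementation; `toy_model` and its tiny lexicon are invented stand-ins for a real sentiment classifier.

```python
import math

def toy_model(words):
    """Stand-in sentiment classifier: lexicon score squashed to a
    positive-class probability in [0, 1]."""
    lexicon = {"great": 1.0, "good": 0.5, "bad": -0.5, "awful": -1.0}
    score = sum(lexicon.get(w, 0.0) for w in words)
    return 1.0 / (1.0 + math.exp(-score))

def contribution(words, i, model=toy_model):
    """Leave-one-out contribution of word i: confidence with the word
    minus confidence without it."""
    without = words[:i] + words[i + 1:]
    return model(words) - model(without)

def total_contribution(words, model=toy_model):
    """Aggregate score, analogous to C_F(X) = sum_i C_F(x_i)."""
    return sum(contribution(words, i, model) for i in range(len(words)))
```

A strongly positive word yields a positive contribution and a strongly negative word a negative one, so the sign of the aggregate roughly tracks the sentiment the input should carry.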
“…Ref. [169] — Dataset: IMDB [171], Yelp [64]; Method: DNNs; Attacks: Deepwordbug [57], GA [81], PWWS [70]; Defense: considers the inconsistency between the model's output and the impact value.…”
Section: Ref.
confidence: 99%
“…To this aim, two different BERT models, pre-trained on general-purpose and domain-specific data, are fine-tuned in a novel adversarial-training framework. Wang et al. [169], on the other hand, propose an adversarial defense tool, namely TextFirewall, for sentiment analysis algorithms. TextFirewall mainly relies on the inconsistency between the sentiment analysis model's prediction and an impact value, which is calculated by quantifying the positive and negative impact of each word on the sentiment polarity.…”
Section: Ref.
confidence: 99%
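The inconsistency check described in the statement above can be sketched as follows: aggregate each word's polarity impact into an impact value and flag the input when that value's sign contradicts the model's prediction. This is a hedged illustration of the general idea only, not the paper's method; the `POLARITY` lexicon, the stand-in `model_prediction` (whose behavior on the token "xq" merely simulates a successful attack), and the `threshold` are all invented assumptions.

```python
# Minimal polarity lexicon; a real system would use a much larger one.
POLARITY = {"great": 1.0, "love": 0.8, "bad": -0.8, "terrible": -1.0}

def impact_value(words):
    """Net impact value: positive minus negative word impacts."""
    return sum(POLARITY.get(w, 0.0) for w in words)

def model_prediction(words):
    """Stand-in classifier. The token "xq" simulates an adversarial
    perturbation that flips the model's output to "negative"."""
    if "xq" in words:
        return "negative"
    return "positive" if impact_value(words) >= 0 else "negative"

def is_suspicious(words, threshold=0.5):
    """Flag inputs whose impact value contradicts the prediction by
    more than the threshold -- the inconsistency signal."""
    iv = impact_value(words)
    pred = model_prediction(words)
    if pred == "positive" and iv < -threshold:
        return True
    if pred == "negative" and iv > threshold:
        return True
    return False
```

On clean inputs the impact value and the prediction agree, so nothing is flagged; an attack that flips the prediction while leaving the visible word polarities largely intact produces the mismatch the detector looks for.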
“…Pruthi et al. [160] proposed RNN-based word recognizers that detect adversarial examples by spotting misspellings in sentences, but this approach struggles to defend against word-level attacks. By calculating the influence of each word in a text, Wang et al. [170] proposed a general adversarial-text detection algorithm named TextFirewall. They used it to defend against attacks from Deepwordbug [133], the Genetic attack [140], and PWWS (Probability Weighted Word Saliency) [164]; the average decreases in attack success rate on Yelp and IMDB are 0.73 and 0.63%, respectively.…”
Section: Adversarial Example Processing
confidence: 99%