2021
DOI: 10.1109/access.2021.3058278
TextFirewall: Omni-Defending Against Adversarial Texts in Sentiment Classification

Abstract: Sentiment classification has been broadly applied in real life, such as in product recommendation and opinion-oriented analysis. Unfortunately, the widely employed sentiment classification systems based on deep neural networks (DNNs) are susceptible to adversarial attacks that introduce imperceptible perturbations into legitimate texts (producing so-called adversarial texts). Adversarial texts can cause erroneous outputs even without access to the target model, raising security concerns for systems deployed in safety-critical…

Cited by 13 publications (11 citation statements) · References 27 publications
“…Recently, Wang et al. [28] propose TextFirewall, which uses impact scores of individual words to detect adversarial inputs. We notice two similarities with their work; the basic concept of the impact score used by TextFirewall is slightly similar to our C_F-scores.…”
Section: Adversarial Defenses Against Adversarial NLP Attacks
confidence: 99%
“…Unlike Con-Detect, TextFirewall is only effective for sentiment classification, where a word may be either positive or negative [28]. Fig. 3: Con-Detect methodology; given an input sequence, X, we compute C_F(X) by adding the individual word contributions C_F(x_i) for all x_i ∈ X.…”
Section: Adversarial Defenses Against Adversarial NLP Attacks
confidence: 99%
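The per-word contribution idea quoted above (a C_F(X) obtained by summing individual word contributions C_F(x_i)) can be sketched with a leave-one-out measure: a word's contribution is how much the classifier's confidence changes when that word is removed. This is an illustrative reconstruction under stated assumptions, not Con-Detect's or TextFirewall's actual implementation; `toy_model` and its tiny lexicon are invented stand-ins for a real sentiment classifier.

```python
import math

def toy_model(words):
    """Stand-in sentiment classifier: lexicon score squashed to a
    positive-class probability in [0, 1]."""
    lexicon = {"great": 1.0, "good": 0.5, "bad": -0.5, "awful": -1.0}
    score = sum(lexicon.get(w, 0.0) for w in words)
    return 1.0 / (1.0 + math.exp(-score))

def contribution(words, i, model=toy_model):
    """Leave-one-out contribution of word i: confidence with the word
    minus confidence without it."""
    without = words[:i] + words[i + 1:]
    return model(words) - model(without)

def total_contribution(words, model=toy_model):
    """Aggregate score, analogous to C_F(X) = sum_i C_F(x_i)."""
    return sum(contribution(words, i, model) for i in range(len(words)))
```

A strongly positive word yields a positive contribution and a strongly negative word a negative one, so the sign of the aggregate roughly tracks the sentiment the input should carry.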
“…Ref. [169] — Dataset: IMDB [171], Yelp [64]; Method: DNNs; Attacks: Deepwordbug [57], GA [81], PWWS [70]; Defense: considers the inconsistency between the model's output and the impact value.…”
Section: Ref.
confidence: 99%
“…To this aim, two different BERT models, pre-trained on general-purpose and domain-specific data, are fine-tuned in a novel adversarial-training framework. Wang et al. [169], on the other hand, propose an adversarial defense tool, namely TextFirewall, for sentiment analysis algorithms. TextFirewall mainly relies on the inconsistency between the sentiment analysis model's prediction and an impact value, which is calculated by quantifying the positive and negative impact of each word on the sentiment polarity.…”
Section: Ref.
confidence: 99%
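The inconsistency check described in the statement above can be sketched as follows: aggregate each word's polarity impact into an impact value and flag the input when that value's sign contradicts the model's prediction. This is a hedged illustration of the general idea only, not the paper's method; the `POLARITY` lexicon, the stand-in `model_prediction` (whose behavior on the token "xq" merely simulates a successful attack), and the `threshold` are all invented assumptions.

```python
# Minimal polarity lexicon; a real system would use a much larger one.
POLARITY = {"great": 1.0, "love": 0.8, "bad": -0.8, "terrible": -1.0}

def impact_value(words):
    """Net impact value: positive minus negative word impacts."""
    return sum(POLARITY.get(w, 0.0) for w in words)

def model_prediction(words):
    """Stand-in classifier. The token "xq" simulates an adversarial
    perturbation that flips the model's output to "negative"."""
    if "xq" in words:
        return "negative"
    return "positive" if impact_value(words) >= 0 else "negative"

def is_suspicious(words, threshold=0.5):
    """Flag inputs whose impact value contradicts the prediction by
    more than the threshold -- the inconsistency signal."""
    iv = impact_value(words)
    pred = model_prediction(words)
    if pred == "positive" and iv < -threshold:
        return True
    if pred == "negative" and iv > threshold:
        return True
    return False
```

On clean inputs the impact value and the prediction agree, so nothing is flagged; an attack that flips the prediction while leaving the visible word polarities largely intact produces the mismatch the detector looks for.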
“…Pruthi et al. [160] proposed RNN-based word recognizers that detect adversarial examples by spotting misspellings in sentences, but this approach struggles to defend against word-level attacks. By calculating the influence of each word in a text, Wang et al. [170] proposed a general adversarial-text detection algorithm named TextFirewall. They used it to defend against attacks from Deepwordbug [133], the Genetic attack [140], and PWWS (Probability Weighted Word Saliency) [164]; the average decreases in attack success rate on Yelp and IMDB are 0.73 and 0.63%, respectively.…”
Section: Adversarial Example Processing
confidence: 99%