Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
DOI: 10.18653/v1/2020.acl-main.540
Word-level Textual Adversarial Attacking as Combinatorial Optimization

Abstract: Adversarial attacks are carried out to reveal the vulnerability of deep neural networks. Textual adversarial attacking is challenging because text is discrete and a small perturbation can bring significant change to the original input. Word-level attacking, which can be regarded as a combinatorial optimization problem, is a well-studied class of textual attack methods. However, existing word-level attack models are far from perfect, largely because unsuitable search space reduction methods and inefficient opti…

Cited by 260 publications (293 citation statements)
References 45 publications
“…Unlike most work on textual adversarial examples, Morpheus produces its adversaries by exploiting the morphology of the text. Zang et al. [165] suggested applying word substitutions using minimum semantic units, called sememes. The assumption was that the sememes of a word are indicative of the word’s meaning and, therefore, words with the same sememes should be good substitutes for each other.…”
Section: Different Scopes of Machine Learning Interpretability
Mentioning confidence: 99%
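The sememe-based substitution idea in the excerpt above can be sketched as follows. This is a minimal illustration with a hypothetical toy sememe dictionary; the cited work draws sememe annotations from a real knowledge base (HowNet), and the matching criterion here (exact sememe-set equality) is an assumption for clarity.

```python
# Toy sememe dictionary (hypothetical; real systems use HowNet annotations).
SEMEMES = {
    "happy": {"emotion", "positive"},
    "glad": {"emotion", "positive"},
    "sad": {"emotion", "negative"},
    "car": {"vehicle"},
}

def sememe_substitutes(word):
    """Return candidate substitutes whose sememe set matches the word's."""
    target = SEMEMES.get(word)
    if target is None:
        return []
    return [w for w, s in SEMEMES.items() if w != word and s == target]

print(sememe_substitutes("happy"))  # ['glad']
print(sememe_substitutes("car"))    # []
```

Because substitutes are constrained to share all sememes with the original word, the candidate set stays small and meaning-preserving, which is exactly what makes it a useful search-space reduction for word-level attacks.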
“…
                         Queries   Cache hits
Alzantot et al. (2018)      1029          736
Zang et al. (2020)          3745         3080

Table 1: "Queries" is the average number of queries to the victim model needed to attack one sample, while "Cache hits" is the average number of times a query resulted in a hit to the model output cache. Each cache hit saves a query to the model, so more cache hits indicate a higher performance boost due to caching.…”
Section: Attack
Mentioning confidence: 99%
“…In some cases, this high-level caching can yield a significant performance increase. We experimented with attacking 100 samples of a BERT-base model (Devlin et al., 2018) trained on the SST-2 dataset (Socher et al., 2013) using the methods proposed by Alzantot et al. (2018) and Zang et al. (2020). Table 1 shows that in both cases a significant number of queries to the victim model result in hits to the model output cache, saving time by avoiding unnecessary computation.…”
Section: Attack
Mentioning confidence: 99%
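The caching scheme described above can be sketched as a thin wrapper around the victim model. This is a minimal illustration, not the cited implementation; `CachedVictim` and the stand-in classifier are hypothetical names.

```python
class CachedVictim:
    """Wraps a victim model and memoizes its outputs per input text."""

    def __init__(self, model_fn):
        self.model_fn = model_fn
        self.cache = {}
        self.queries = 0     # total queries issued by the attack
        self.cache_hits = 0  # queries answered from the cache

    def __call__(self, text):
        self.queries += 1
        if text in self.cache:
            self.cache_hits += 1
            return self.cache[text]
        out = self.model_fn(text)  # only cache misses reach the real model
        self.cache[text] = out
        return out

# Stand-in for a real classifier (e.g. a fine-tuned BERT).
victim = CachedVictim(lambda t: len(t) % 2)
victim("good movie"); victim("bad movie"); victim("good movie")
print(victim.queries, victim.cache_hits)  # 3 1
```

Word-level attacks repeatedly evaluate perturbed texts that often coincide across search steps, so even this simple exact-match cache can absorb a large fraction of queries, as the counts in Table 1 illustrate.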
“…Researchers have proposed numerous methods to generate adversarial texts, which can be divided into char-level [12], word-level [13], sentence-level [14], and multi-level (i.e., a mixture of the previous three) [15,16] attacks; these modify characters, words, and sentences in the input, respectively.…”
Section: Introduction
Mentioning confidence: 99%