2017
DOI: 10.48550/arxiv.1712.06751
Preprint

HotFlip: White-Box Adversarial Examples for Text Classification

Cited by 121 publications (165 citation statements)
References 11 publications
“…Numerous research studies have extensively studied the role of adversarial attacks in developing robust NLP models [35], [39], [54], [58]. For example, Cheng et al. [54] study crafting AEs for seq2seq models whose inputs are discrete text strings.…”
Section: Breaching Security by Improving Attacks (mentioning)
confidence: 99%
“…The classification accuracy has been utilized by numerous research works [34], [35], [40], [41], [45], [59], [103], [105], [106]. For example, in [59], Zhang et al. used the classification accuracy metric to evaluate their proposed Metropolis-Hastings Sampling Algorithm (MHA) and demonstrated that MHA under classification accuracy outperforms the baseline model in attacking capability.…”
Section: Classification Accuracy (mentioning)
confidence: 99%
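For context, classification accuracy as an attack metric is simply the victim model's accuracy on adversarially perturbed inputs; the lower it drops, the stronger the attack. Below is a minimal sketch of that evaluation loop, where the `model` and `attack` interfaces are illustrative assumptions, not APIs from any of the cited works.

```python
# Minimal sketch of accuracy-under-attack evaluation. The `model` and
# `attack` interfaces are hypothetical, not any paper's actual API.
def accuracy_under_attack(model, attack, examples):
    correct = 0
    for text, label in examples:
        adv_text = attack(model, text, label)  # hypothetical perturbation step
        if model.predict(adv_text) == label:   # hypothetical predict() method
            correct += 1
    return correct / len(examples)
```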
See 1 more Smart Citation
“…In Table 3, we list sample works on character-level attacks with their model accessibility, attack type, targeted model, application, or task. As a pioneering work on the character-level attack, [56] investigates white-box attacks with character-level adversarial examples that maximize the model's loss with a limited number of modifications. This is referred to as the HotFlip algorithm.…”
Section: Character-level Attacks (mentioning)
confidence: 99%
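HotFlip's core operation is a first-order estimate of how much the loss would change if a single character were swapped: with a one-hot input representation, the gain from replacing the current character a at position i with character b is approximated by the gradient difference grad[i, b] - grad[i, a]. The sketch below illustrates that scoring step; tensor names and shapes are assumptions for illustration, not the authors' released code.

```python
# A minimal sketch of HotFlip's first-order flip scoring (Ebrahimi et al.,
# 2017). Shapes and names are illustrative assumptions, not the reference
# implementation.
import torch

def best_flip(one_hot, grad):
    """Choose the single character flip estimated to increase the loss most.

    one_hot: (seq_len, vocab_size) float one-hot encoding of the input.
    grad:    (seq_len, vocab_size) gradient of the loss w.r.t. `one_hot`.
    """
    # Taylor estimate of the loss change for swapping the character at
    # position i to vocabulary item b: grad[i, b] - grad[i, a], where a is
    # the character currently at position i.
    current = (one_hot * grad).sum(dim=1, keepdim=True)     # grad[i, a]
    gain = grad - current                                   # grad[i, b] - grad[i, a]
    gain = gain.masked_fill(one_hot.bool(), float("-inf"))  # forbid no-op flips
    pos = int(gain.max(dim=1).values.argmax())              # best position
    new_char = int(gain[pos].argmax())                      # best replacement
    return pos, new_char
```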
“…Brittleness of neural network models is a serious concern, both theoretically (Biggio et al. 2013; Szegedy et al. 2014) and practically, including Natural Language Processing (NLP) (Belinkov and Bisk 2018; Ettinger et al. 2017; Gao et al. 2018; Jia and Liang 2017; Liang et al. 2017; Zhang et al. 2020) and more recently complex Masked Language Models (MLM) (Li et al. 2020b; Sun et al. 2020). In NLP, attacks are usually conducted either at the character or word level (Ebrahimi et al. 2017; Cheng et al. 2018), or at the embedding level, exploiting (partially or fully) vulnerabilities in the symbols' representation (Alzantot et al. 2018; La Malfa et al. 2021). Brittleness of NLP models does not pertain only to text manipulation, but also includes attacks and complementary robustness for ranking systems (Goren et al. 2018).…”
Section: Related Work (mentioning)
confidence: 99%