2018
DOI: 10.48550/arxiv.1803.01128
Preprint

Seq2Sick: Evaluating the Robustness of Sequence-to-Sequence Models with Adversarial Examples

Abstract: Crafting adversarial examples has become an important technique to evaluate the robustness of deep neural networks (DNNs). However, most existing works focus on attacking the image classification problem since its input space is continuous and output space is finite. In this paper, we study the much more challenging problem of crafting adversarial examples for sequence-to-sequence (seq2seq) models, whose inputs are discrete text strings and outputs have an almost infinite number of possibilities. To address th…

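Because a seq2seq model's output is an open-ended sequence rather than one of a few class labels, even deciding when an attack has "succeeded" takes some care. The snippet below is a purely illustrative, hedged sketch (the function names and exact criteria are placeholders, not definitions taken from the paper) of two checks an evaluator might run: whether the perturbed input drives the model to an output sharing no words with the original, and whether the output contains chosen target keywords.

```python
# Hypothetical success checks for attacks on a seq2seq model. The criteria and
# names here are illustrative placeholders, not the paper's definitions.

def non_overlapping(original_output: str, adversarial_output: str) -> bool:
    """True if the output for the perturbed input shares no word with the original output."""
    original_words = set(original_output.lower().split())
    adversarial_words = set(adversarial_output.lower().split())
    return original_words.isdisjoint(adversarial_words)

def contains_keywords(adversarial_output: str, keywords) -> bool:
    """True if every chosen target keyword appears in the output for the perturbed input."""
    words = set(adversarial_output.lower().split())
    return all(keyword.lower() in words for keyword in keywords)

# Toy usage with made-up strings.
print(non_overlapping("the cat sat on the mat", "a dog ran into a wall"))        # True
print(contains_keywords("please transfer the funds today", ["funds", "today"]))  # True
```
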
Cited by 40 publications (44 citation statements) · References 16 publications

“…However, recent studies show that deep neural networks are vulnerable to adversarial examples (Szegedy et al., 2013; Goodfellow et al., 2015): a tiny perturbation of an image, almost invisible to human eyes, can mislead a well-trained image classifier into misclassification. It soon became clear that this is not a coincidence peculiar to image classification: similar phenomena have been observed in other problems such as speech recognition (Carlini et al., 2016), visual QA (Xu et al., 2017), image captioning (Chen et al., 2017a), machine translation (Cheng et al., 2018), reinforcement learning (Pattanaik et al., 2018), and even in systems that operate in the physical world (Kurakin et al., 2016).…”
Section: Introduction (mentioning)
confidence: 66%
“…Compared with testing methods in the image domain, related work on text is very rare. Beyond the difference in quantity, the testing in [91] and [165] also differs from its image-domain counterparts. Work on text judges the robustness of models by observing their performance on the test cases.…”
Section: Testing and Verification in Texts (mentioning)
confidence: 98%
“…Besides, experimental results show that some other attributes (e.g., answering by elimination via ranking plausibility [164]) should be added to improve performance. Cheng et al. [165] proposed a projected gradient method to test the robustness of sequence-to-sequence (seq2seq) models. They found that seq2seq models were more robust to adversarial attacks than CNN-based classifiers.…”
Section: Testing and Verification in Texts (mentioning)
confidence: 99%
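The projected gradient method mentioned in the quotation above can be pictured as a three-step loop: take the gradient of the decoding loss with respect to the continuous word embeddings of the input, step in that direction, and then project every perturbed embedding back onto the nearest real vocabulary token so that the adversarial input remains a valid discrete sentence. The sketch below illustrates one such step on a toy, randomly initialized model; the architecture, step size, and untargeted objective are assumptions made for this example, not the cited paper's exact formulation.

```python
# Minimal, hypothetical sketch: one "perturb in embedding space, then project
# back to real tokens" step against a toy seq2seq model. Architecture and
# hyper-parameters are illustrative placeholders.
import torch
import torch.nn as nn

torch.manual_seed(0)
VOCAB, DIM, HID = 50, 16, 32

class ToySeq2Seq(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, DIM)
        self.enc = nn.GRU(DIM, HID, batch_first=True)
        self.dec = nn.GRU(DIM, HID, batch_first=True)
        self.out = nn.Linear(HID, VOCAB)

    def forward(self, src_emb, tgt):
        # Encode the (possibly perturbed) source embeddings, then decode the
        # target prefix with teacher forcing.
        _, hidden = self.enc(src_emb)
        dec_out, _ = self.dec(self.emb(tgt[:, :-1]), hidden)
        return self.out(dec_out)  # (batch, tgt_len - 1, VOCAB)

model = ToySeq2Seq()
src = torch.randint(0, VOCAB, (1, 6))   # source token ids
tgt = torch.randint(0, VOCAB, (1, 5))   # output the model currently produces

# 1) Gradient of the decoding loss w.r.t. the *continuous* source embeddings.
src_emb = model.emb(src).detach().requires_grad_(True)
logits = model(src_emb, tgt)
loss = nn.functional.cross_entropy(logits.reshape(-1, VOCAB), tgt[:, 1:].reshape(-1))
loss.backward()

# 2) Ascent step in embedding space (untargeted: push the model away from its
#    current output).
perturbed = src_emb + 0.5 * src_emb.grad.sign()

# 3) Project each perturbed embedding back onto the nearest vocabulary entry,
#    because the final adversarial input must be a discrete text string.
with torch.no_grad():
    dists = torch.cdist(perturbed.squeeze(0), model.emb.weight)  # (src_len, VOCAB)
    adv_src = dists.argmin(dim=-1).unsqueeze(0)

print("original source tokens:   ", src.tolist())
print("adversarial source tokens:", adv_src.tolist())
```

In practice the loop of steps 1–3 would be iterated and constrained so that only a few source positions change; the discrete projection in step 3 is what distinguishes such text attacks from their continuous image counterparts.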
“…Brittleness of neural network models is a serious concern, both theoretically (Biggio et al. 2013; Szegedy et al. 2014) and practically, including Natural Language Processing (NLP) (Belinkov and Bisk 2018; Ettinger et al. 2017; Gao et al. 2018; Jia and Liang 2017; Liang et al. 2017; Zhang et al. 2020) and, more recently, complex Masked Language Models (MLM) (Li et al. 2020b; Sun et al. 2020). In NLP, attacks are usually conducted either at the character or word level (Ebrahimi et al. 2017; Cheng et al. 2018), or at the embedding level, exploiting (partially or fully) vulnerabilities in the symbols' representation (Alzantot et al. 2018; La Malfa et al. 2021). Brittleness of NLP does not pertain only to text manipulation, but also includes attacks and complementary robustness for ranking systems (Goren et al. 2018).…”
Section: Related Work (mentioning)
confidence: 99%