2018
DOI: 10.48550/arxiv.1803.01128
Preprint

Seq2Sick: Evaluating the Robustness of Sequence-to-Sequence Models with Adversarial Examples

Abstract: Crafting adversarial examples has become an important technique to evaluate the robustness of deep neural networks (DNNs). However, most existing works focus on attacking the image classification problem since its input space is continuous and output space is finite. In this paper, we study the much more challenging problem of crafting adversarial examples for sequence-to-sequence (seq2seq) models, whose inputs are discrete text strings and outputs have an almost infinite number of possibilities. To address th…

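Because a seq2seq model's output is an open-ended sequence rather than one of a few class labels, even deciding when an attack has "succeeded" takes some care. The snippet below is a purely illustrative, hedged sketch (the function names and exact criteria are placeholders, not definitions taken from the paper) of two checks an evaluator might run: whether the perturbed input drives the model to an output sharing no words with the original, and whether the output contains chosen target keywords.

```python
# Hypothetical success checks for attacks on a seq2seq model. The criteria and
# names here are illustrative placeholders, not the paper's definitions.

def non_overlapping(original_output: str, adversarial_output: str) -> bool:
    """True if the output for the perturbed input shares no word with the original output."""
    original_words = set(original_output.lower().split())
    adversarial_words = set(adversarial_output.lower().split())
    return original_words.isdisjoint(adversarial_words)

def contains_keywords(adversarial_output: str, keywords) -> bool:
    """True if every chosen target keyword appears in the output for the perturbed input."""
    words = set(adversarial_output.lower().split())
    return all(keyword.lower() in words for keyword in keywords)

# Toy usage with made-up strings.
print(non_overlapping("the cat sat on the mat", "a dog ran into a wall"))        # True
print(contains_keywords("please transfer the funds today", ["funds", "today"]))  # True
```
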
Cited by 40 publications (44 citation statements) · References 16 publications

“…However, recent studies show that deep neural networks are vulnerable to adversarial examples (Szegedy et al., 2013; Goodfellow et al., 2015): a tiny perturbation of an image, almost invisible to human eyes, can mislead a well-trained image classifier into misclassification. It soon became clear that this is not a coincidence peculiar to image classification: similar phenomena have been observed in other problems such as speech recognition (Carlini et al., 2016), visual QA (Xu et al., 2017), image captioning (Chen et al., 2017a), machine translation (Cheng et al., 2018), reinforcement learning (Pattanaik et al., 2018), and even in systems that operate in the physical world (Kurakin et al., 2016).…”
Section: Introduction (mentioning)
confidence: 66%
“…Compared with testing methods in the image domain, related work on text is very rare. Beyond the difference in quantity, the testing in [91] and [165] also differs from its image-domain counterparts. Work on text judges the robustness of models by observing their performance on the test cases.…”
Section: Testing and Verification in Texts (mentioning)
confidence: 98%
“…Besides, experimental results show that some other attributes (e.g., answering by elimination via ranking plausibility [164]) should be added to improve performance. Cheng et al. [165] proposed a projected gradient method to test the robustness of sequence-to-sequence (seq2seq) models. They found that seq2seq models were more robust to adversarial attacks than CNN-based classifiers.…”
Section: Testing and Verification in Texts (mentioning)
confidence: 99%
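The projected gradient method mentioned in the quotation above can be pictured as a three-step loop: take the gradient of the decoding loss with respect to the continuous word embeddings of the input, step in that direction, and then project every perturbed embedding back onto the nearest real vocabulary token so that the adversarial input remains a valid discrete sentence. The sketch below illustrates one such step on a toy, randomly initialized model; the architecture, step size, and untargeted objective are assumptions made for this example, not the cited paper's exact formulation.

```python
# Minimal, hypothetical sketch: one "perturb in embedding space, then project
# back to real tokens" step against a toy seq2seq model. Architecture and
# hyper-parameters are illustrative placeholders.
import torch
import torch.nn as nn

torch.manual_seed(0)
VOCAB, DIM, HID = 50, 16, 32

class ToySeq2Seq(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, DIM)
        self.enc = nn.GRU(DIM, HID, batch_first=True)
        self.dec = nn.GRU(DIM, HID, batch_first=True)
        self.out = nn.Linear(HID, VOCAB)

    def forward(self, src_emb, tgt):
        # Encode the (possibly perturbed) source embeddings, then decode the
        # target prefix with teacher forcing.
        _, hidden = self.enc(src_emb)
        dec_out, _ = self.dec(self.emb(tgt[:, :-1]), hidden)
        return self.out(dec_out)  # (batch, tgt_len - 1, VOCAB)

model = ToySeq2Seq()
src = torch.randint(0, VOCAB, (1, 6))   # source token ids
tgt = torch.randint(0, VOCAB, (1, 5))   # output the model currently produces

# 1) Gradient of the decoding loss w.r.t. the *continuous* source embeddings.
src_emb = model.emb(src).detach().requires_grad_(True)
logits = model(src_emb, tgt)
loss = nn.functional.cross_entropy(logits.reshape(-1, VOCAB), tgt[:, 1:].reshape(-1))
loss.backward()

# 2) Ascent step in embedding space (untargeted: push the model away from its
#    current output).
perturbed = src_emb + 0.5 * src_emb.grad.sign()

# 3) Project each perturbed embedding back onto the nearest vocabulary entry,
#    because the final adversarial input must be a discrete text string.
with torch.no_grad():
    dists = torch.cdist(perturbed.squeeze(0), model.emb.weight)  # (src_len, VOCAB)
    adv_src = dists.argmin(dim=-1).unsqueeze(0)

print("original source tokens:   ", src.tolist())
print("adversarial source tokens:", adv_src.tolist())
```

In practice the loop of steps 1–3 would be iterated and constrained so that only a few source positions change; the discrete projection in step 3 is what distinguishes such text attacks from their continuous image counterparts.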
“…Brittleness of neural network models is a serious concern, both theoretically (Biggio et al. 2013; Szegedy et al. 2014) and practically, including Natural Language Processing (NLP) (Belinkov and Bisk 2018; Ettinger et al. 2017; Gao et al. 2018; Jia and Liang 2017; Liang et al. 2017; Zhang et al. 2020) and, more recently, complex Masked Language Models (MLM) (Li et al. 2020b; Sun et al. 2020). In NLP, attacks are usually conducted either at the character or word level (Ebrahimi et al. 2017; Cheng et al. 2018), or at the embedding level, exploiting (partially or fully) vulnerabilities in the symbols' representation (Alzantot et al. 2018; La Malfa et al. 2021). Brittleness of NLP does not pertain only to text manipulation, but also includes attacks and complementary robustness for ranking systems (Goren et al. 2018).…”
Section: Related Work (mentioning)
confidence: 99%