2020
DOI: 10.1609/aaai.v34i04.5767
Seq2Sick: Evaluating the Robustness of Sequence-to-Sequence Models with Adversarial Examples

Abstract: Crafting adversarial examples has become an important technique to evaluate the robustness of deep neural networks (DNNs). However, most existing works focus on attacking the image classification problem since its input space is continuous and output space is finite. In this paper, we study the much more challenging problem of crafting adversarial examples for sequence-to-sequence (seq2seq) models, whose inputs are discrete text strings and outputs have an almost infinite number of possibilities. To address th…
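The central difficulty the abstract points to is that gradients live in a continuous embedding space while valid inputs are discrete tokens. Below is a minimal sketch of the generic work-around used by gradient-based text attacks (take a continuous gradient step on the embeddings, then project each perturbed embedding back to the nearest vocabulary word); it illustrates the idea only and is not claimed to be the paper's exact algorithm, and the toy vocabulary, embedding matrix, and gradient are all made up for the example.

```python
# Sketch (assumption, not the paper's exact procedure) of the
# "continuous gradient step + projection back to discrete tokens" idea.
import numpy as np

rng = np.random.default_rng(0)

vocab = ["the", "cat", "sat", "on", "mat", "dog", "ran"]   # toy vocabulary
emb = rng.normal(size=(len(vocab), 8))                     # toy embedding matrix

def project_to_vocab(vec):
    """Return the index of the vocabulary embedding nearest to `vec`."""
    dists = np.linalg.norm(emb - vec, axis=1)
    return int(np.argmin(dists))

def attack_step(token_ids, grad_wrt_emb, step_size=0.5):
    """One hypothetical attack iteration: move each input embedding along
    the (given) loss gradient, then snap back to the nearest real token."""
    new_ids = []
    for tid, g in zip(token_ids, grad_wrt_emb):
        perturbed = emb[tid] + step_size * g        # continuous step
        new_ids.append(project_to_vocab(perturbed)) # projection to discrete space
    return new_ids

# Toy usage: a fake gradient stands in for backprop through a real seq2seq model.
tokens = [0, 1, 2]  # "the cat sat"
fake_grad = rng.normal(size=(3, 8))
print([vocab[i] for i in attack_step(tokens, fake_grad)])
```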

Cited by 160 publications (123 citation statements); references 0 publications.
“…Seq2Sick: Cheng et al. [158] considered adversarial attacks against seq2seq models, which are widely adopted in text summarisation and neural machine translation tasks. The two main challenges in producing successful seq2seq attacks include the discrete input domain and the almost infinite output domain.…”
Section: Different Scopes Of Machine Learning Interpretability
Confidence: 99%
“…Attacks on other types of models may have more sophisticated goals. For example, attacks on translation may attempt to change every word of a translation, or to introduce targeted keywords into the translation (Cheng et al., 2018).…”
Section: Constraints On Adversarial Examples In Natural Language
Confidence: 99%
“…We chose to evaluate robustness under two types of attacks. In the first, the "targeted keyword attack" discussed in (Cheng et al., 2018), we attempt to generate an adversarial input sequence such that a specific keyword appears in the output sequence while staying within a threshold ∆ on the number of word changes allowed. Empirically, we set ∆ = 3 in these experiments and adapt the most successful attack, GS-EC, to this case.…”
Section: Experiments III: Machine Translation
Confidence: 99%
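Reading the excerpt above literally, a targeted keyword attack counts as successful only when the chosen keyword appears in the model's output and the adversarial input differs from the original in at most ∆ words. The following small, self-contained sketch encodes that success criterion; the `translate` argument is a hypothetical stand-in for the attacked seq2seq model, and the substitution-only edit assumption is mine, not stated in the excerpt.

```python
# Sketch of the success criterion for the "targeted keyword attack" described
# above: at most DELTA word substitutions in the input, and the target keyword
# must appear in the model's output. `translate` is a hypothetical stand-in.
DELTA = 3  # word-change budget used in the excerpt's experiments

def num_word_changes(original, adversarial):
    """Count positions where the adversarial sentence differs from the original
    (assumes equal length, i.e. substitution-only edits)."""
    return sum(o != a for o, a in zip(original.split(), adversarial.split()))

def attack_succeeded(original, adversarial, target_keyword, translate):
    within_budget = num_word_changes(original, adversarial) <= DELTA
    keyword_hit = target_keyword in translate(adversarial).split()
    return within_budget and keyword_hit

# Toy usage with a dummy "model" that simply echoes its input.
print(attack_succeeded("the cat sat on the mat",
                       "the dog sat on the mat",
                       "dog",
                       translate=lambda s: s))
```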