2018
DOI: 10.48550/arxiv.1806.09030
Preprint

On Adversarial Examples for Character-Level Neural Machine Translation

Abstract: Evaluating on adversarial examples has become a standard procedure to measure robustness of deep learning models. Due to the difficulty of creating white-box adversarial examples for discrete text input, most analyses of the robustness of NLP models have been done through black-box adversarial examples. We investigate adversarial examples for character-level neural machine translation (NMT), and contrast black-box adversaries with a novel white-box adversary, which employs differentiable string-edit operations …
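
A minimal sketch of the white-box idea the abstract alludes to: treat the one-hot character inputs as continuous, take the gradient of the loss, and score every possible single-character substitution by a first-order estimate of the loss change (the HotFlip-style derivative trick). `model.loss` and the tensor shapes here are illustrative assumptions, not the paper's actual API.

```python
# Hedged sketch: first-order scoring of character flips, in the spirit of
# differentiable string-edit operations. `model` is assumed to expose a
# differentiable loss over one-hot character inputs (an assumption for
# illustration, not the paper's interface).
import torch

def best_char_flip(model, one_hot_input, target):
    """one_hot_input: (seq_len, vocab) float tensor of one-hot characters."""
    one_hot = one_hot_input.clone().requires_grad_(True)
    loss = model.loss(one_hot, target)   # assumed differentiable loss
    loss.backward()
    grad = one_hot.grad                  # (seq_len, vocab)
    # First-order estimate of the loss change for flipping position i from
    # its current character a to any character b: grad[i, b] - grad[i, a].
    current = (grad * one_hot_input).sum(dim=1, keepdim=True)
    gain = grad - current                # (seq_len, vocab)
    gain[one_hot_input.bool()] = float("-inf")  # disallow no-op "flips"
    flat = gain.argmax()
    pos, new_char = divmod(flat.item(), gain.size(1))
    return pos, new_char, gain[pos, new_char].item()
```

Insertions and deletions can be scored the same way once they are expressed as operations on the one-hot matrix, which is presumably how the abstract's "differentiable string-edit operations" generalize the substitution case.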

Cited by 71 publications (29 citation statements)
References 15 publications
“…This work is related to work on adversarial attacks in the text domain, which can be roughly divided into the following categories. One category is adversarial misspelling, which tries to evade the classifier by some "human-imperceptible" misspelling on certain selected characters [14], [17], [67]. The core idea is to design a strategy to identify the important positions and afterwards some standard character-level operations like insertion, deletion, substitution and swap can be applied.…”
Section: Related Work
confidence: 99%
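
As an illustration of the "standard character-level operations" the excerpt lists, a self-contained sketch follows. The important-position selection (the part each attack designs differently) is stubbed out by a random choice, purely for illustration.

```python
# Hedged sketch of the four standard character-level edit operations
# (insert, delete, substitute, swap). Real attacks replace the random
# position choice with an importance heuristic.
import random
import string

def perturb(text: str, op: str, rng: random.Random) -> str:
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    if op == "insert":
        return text[:i] + rng.choice(string.ascii_lowercase) + text[i:]
    if op == "delete":
        return text[:i] + text[i + 1:]
    if op == "substitute":
        return text[:i] + rng.choice(string.ascii_lowercase) + text[i + 1:]
    if op == "swap":  # swap adjacent characters, a common typo model
        return text[:i] + text[i + 1] + text[i] + text[i + 2:]
    raise ValueError(op)

rng = random.Random(0)
print(perturb("translation", "swap", rng))  # one random adjacent swap
```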
“…Typically, a target word is replaced by an equivalent one chosen from a space of possible replacements. Such a space was identified by Ebrahimi et al. [83] and referred to as the embedding space. In [83], the authors consider neural machine translation (NMT) and propose elementary modifications to achieve word-level adversarial attacks. In this regard, the authors extend the HotFlip algorithm proposed in [56] by adding, removing, or replacing a word in the translation input.…”
Section: Optimizing Adversarial Attacks
confidence: 99%
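
A hedged sketch of the word-level flip the excerpt describes: given the gradient of the loss with respect to one word's embedding, a first-order estimate ranks every candidate replacement word. The inputs `embeddings`, `current_embedding`, and `grad_wrt_embedding` are illustrative assumptions, not an interface from [83] or [56].

```python
# Hedged sketch of a word-level flip in embedding space: pick the vocabulary
# word whose embedding maximizes the first-order loss increase
# grad . (e_w - e_cur). The current word itself scores exactly 0.
import numpy as np

def best_word_swap(grad_wrt_embedding: np.ndarray,   # (dim,)
                   current_embedding: np.ndarray,    # (dim,)
                   embeddings: np.ndarray):          # (vocab, dim)
    scores = embeddings @ grad_wrt_embedding \
             - current_embedding @ grad_wrt_embedding
    return int(np.argmax(scores)), float(np.max(scores))
```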
“…Instead of designing error features, recent researchers adopt ideas from adversarial learning (Goodfellow, Shlens, and Szegedy, 2014) to generate adversarial samples that expose NLP system pitfalls (Cheng et al., 2018a; Ebrahimi, Lowd, and Dou, 2018; Zhao, Dua, and Singh, 2017). Adversarial samples are minimally perturbed inputs that preserve the semantic meaning of the input, yet yield degraded outputs.…”
Section: Introduction
confidence: 99%