Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/2021.emnlp-main.371

Multi-granularity Textual Adversarial Attack with Behavior Cloning

Abstract: Recently, textual adversarial attack models have become increasingly popular due to their success in estimating the robustness of NLP models. However, existing works have obvious deficiencies. (1) They usually consider only a single granularity of modification strategy (e.g. word-level or sentence-level), which is insufficient to explore the holistic textual space for generation; (2) They need to query victim models hundreds of times to make a successful attack, which is highly inefficient in practice. To …
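
The report does not reproduce the paper's method, but the two deficiencies named in the abstract are easier to see with a concrete example. Below is a minimal, hypothetical sketch of a conventional black-box, word-level synonym-substitution attack: it edits at a single granularity and must query the victim model once per candidate, which is exactly the query cost the abstract criticizes. The victim interface, synonym table, and scoring are assumptions for illustration only, not the attack proposed in this paper.

```python
# Hypothetical sketch of a single-granularity, query-based attack loop.
# The victim model, synonym table, and success threshold are stand-ins.

from typing import Callable, Dict, List

def greedy_word_attack(
    text: str,
    victim_prob: Callable[[str], float],   # assumed: P(correct label | text)
    synonyms: Dict[str, List[str]],        # assumed word -> candidate substitutes
    max_queries: int = 500,
) -> tuple[str, int]:
    """Greedily replace one word at a time to lower the victim's confidence."""
    words = text.split()
    queries = 0
    for i, w in enumerate(words):
        best_words, best_prob = None, victim_prob(" ".join(words))
        queries += 1
        for cand in synonyms.get(w.lower(), []):
            trial = words[:i] + [cand] + words[i + 1:]
            p = victim_prob(" ".join(trial))
            queries += 1
            if p < best_prob:
                best_words, best_prob = trial, p
            if queries >= max_queries:
                break
        if best_words is not None:
            words = best_words
        if best_prob < 0.5 or queries >= max_queries:  # success or budget exhausted
            break
    return " ".join(words), queries

# Toy usage with a dummy victim that is confident only when "good" is present.
if __name__ == "__main__":
    victim = lambda t: 0.9 if "good" in t else 0.3
    adv, n = greedy_word_attack("the movie was good", victim, {"good": ["fine", "decent"]})
    print(adv, n)
```

Even this toy search spends one victim query per candidate substitution, which is why realistic attacks of this form can require hundreds of queries per example.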

Cited by 12 publications (3 citation statements) | References 41 publications

“…How readily our results translate across to other language pairs, translation systems, metrics, or domains requires further investigation. We experiment with only word- and character-level attacks, but other methods exist that generate sentence-level (Ross et al., 2022) or multilevel (Chen et al., 2021) attacks. We leave a more comprehensive study of attack methods to future work.…”
Section: Limitations
confidence: 99%
“…The adversarial vulnerability of deep learning models is a long-standing problem (Goodfellow et al., 2015). Various attack methods have demonstrated that even LLMs can be deceived with small, intentionally crafted perturbations (e.g., typos and synonym substitution) (Jin et al., 2020; Li et al., 2020; Chen et al., 2021; Liu et al., 2022a; Wang et al., 2023b). In response to adversarial attacks, many adversarial defense methods have been proposed to enhance model robustness.…”
Section: Introduction
confidence: 99%
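
For readers unfamiliar with the perturbations named in this statement, the snippet below is a generic, hypothetical illustration of a character-level typo edit; it is not code from any of the cited attack methods.

```python
# Hypothetical character-level perturbation: swap two adjacent characters.
import random

def typo_swap(word: str, rng: random.Random) -> str:
    """Swap two adjacent inner characters, e.g. 'terrible' -> 'terirble'."""
    if len(word) < 4:
        return word
    i = rng.randrange(1, len(word) - 2)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]

rng = random.Random(0)
print(typo_swap("terrible", rng))  # a small edit a human still reads correctly
```
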
“…This method greatly improves the accuracy of the model in placing chess pieces and speeds up training. Some other works related to behavior cloning can be found in [24,25].…”
Section: Introduction
confidence: 99%
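
Since behavior cloning is only mentioned in passing here, the following minimal sketch shows the general idea: a policy is trained by plain supervised learning to imitate recorded expert state-action pairs. The network size and demonstration data are invented for illustration and do not reflect the cited chess-placement system or this paper's agent.

```python
# Hypothetical behavior-cloning sketch: supervised imitation of expert actions.
import torch
import torch.nn as nn

state_dim, n_actions = 8, 4
policy = nn.Sequential(nn.Linear(state_dim, 32), nn.ReLU(), nn.Linear(32, n_actions))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Fake "expert demonstrations": random states paired with expert action labels.
states = torch.randn(256, state_dim)
expert_actions = torch.randint(0, n_actions, (256,))

for epoch in range(20):
    logits = policy(states)                   # predict an action for each state
    loss = loss_fn(logits, expert_actions)    # penalize disagreement with the expert
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# At test time the cloned policy is used greedily.
action = policy(torch.randn(1, state_dim)).argmax(dim=-1)
```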