Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019
DOI: 10.18653/v1/d19-1221

Universal Adversarial Triggers for Attacking and Analyzing NLP

Abstract: Adversarial examples highlight model vulnerabilities and are useful for evaluation and interpretation. We define universal adversarial triggers: input-agnostic sequences of tokens that trigger a model to produce a specific prediction when concatenated to any input from a dataset. We propose a gradient-guided search over tokens which finds short trigger sequences (e.g., one word for classification and four words for language modeling) that successfully trigger the target prediction. For example, triggers cause S…
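The gradient-guided search described in the abstract can be illustrated with a short, self-contained sketch. This is a minimal PyTorch sketch, assuming a toy mean-pooling classifier, a small vocabulary, and a single stand-in input; none of these are the paper's actual models or hyperparameters. The core step is the HotFlip-style first-order approximation, which scores every candidate token swap by (e_new − e_old) · ∇loss and keeps the swap that most reduces the loss of the target prediction.

```python
# Hedged sketch of a gradient-guided trigger search (HotFlip-style token
# swaps). The model, vocabulary size, and trigger length are illustrative
# assumptions, not the paper's exact experimental setup.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
VOCAB, DIM, TRIG_LEN = 1000, 32, 3

embedding = torch.nn.Embedding(VOCAB, DIM)      # toy token embeddings
classifier = torch.nn.Linear(DIM, 2)            # toy mean-pooling classifier head
target = torch.tensor([0])                      # label the attacker wants to force
input_ids = torch.randint(0, VOCAB, (8,))       # stand-in for one dataset input

trigger = torch.randint(0, VOCAB, (TRIG_LEN,))  # random initial trigger tokens
for _ in range(10):
    trig_emb = embedding(trigger).detach().requires_grad_(True)
    sent = torch.cat([trig_emb, embedding(input_ids)]).unsqueeze(0)
    loss = F.cross_entropy(classifier(sent.mean(dim=1)), target)
    loss.backward()
    # First-order estimate of the loss change from swapping each trigger
    # slot to every vocabulary token: (e_new - e_old) . grad
    g = trig_emb.grad                                         # (TRIG_LEN, DIM)
    swap_scores = g @ embedding.weight.T - (g * trig_emb).sum(-1, keepdim=True)
    trigger = swap_scores.argmin(dim=1).detach()              # greediest swap per slot

print("trigger token ids:", trigger.tolist())
```

In the paper, the search averages gradients over batches of dataset examples and refines candidates with beam search; this sketch greedily updates every trigger slot at once on one example to stay short.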

Cited by 474 publications (431 citation statements). References 27 publications.
“…Finally, we suggest that care must be taken in the training of models for RE, as it appears likely that classifiers are susceptible to overfitting on non-syntactic features. This may be alleviated by the creation of training data that depend heavily on syntactic features, and advancing other methodologies such as data augmentation and Universal Adversarial Triggers [23].…”
Section: Discussion
confidence: 99%
“…Attackers can also use optimisation-based attack algorithms, which find an optimised perturbation by maximising or minimising an objective instead of just finding any perturbation that works [3,4,19]. More intriguingly, there are L1-norm-bounded attack algorithms that limit the number of perturbed pixels [3,25], universal adversarial perturbations that work for all examples in the test dataset [21,33], etc.…”
Section: Related Work 5.1 Adversarial Machine Learning
confidence: 99%
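To make the quoted distinction concrete, below is a minimal sketch of an optimisation-based attack in the projected-gradient style: instead of accepting any perturbation that changes the prediction, it iteratively ascends the loss gradient and projects back into a norm ball. The toy linear model, the L∞ bound, and the step size are illustrative assumptions, not taken from the cited works.

```python
# Hedged sketch of an optimisation-based (PGD-style) adversarial attack.
# The model, epsilon, and step size are illustrative assumptions.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
model = torch.nn.Linear(10, 2)      # toy classifier standing in for a real model
x = torch.randn(1, 10)              # clean example
y = torch.tensor([1])               # its true label
eps, step = 0.3, 0.05               # L-infinity bound and step size

delta = torch.zeros_like(x, requires_grad=True)
for _ in range(20):
    loss = F.cross_entropy(model(x + delta), y)
    loss.backward()
    with torch.no_grad():
        delta += step * delta.grad.sign()   # gradient ascent on the loss
        delta.clamp_(-eps, eps)             # project back into the norm ball
    delta.grad.zero_()

print("adversarial logits:", model(x + delta).detach())
```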
“…Adversarial triggers on natural language generation: In 2019, Wallace et al [423] introduced a type of adversarial example denoted universal adversarial triggers (abbreviated with UATs in the following) which were identified via a gradient-based search. UATs are defined as "input-agnostic sequences of tokens that trigger a model to produce a specific prediction when concatenated to any input from a dataset".…”
Section: Risk Ib
confidence: 99%
“…UATs are defined as "input-agnostic sequences of tokens that trigger a model to produce a specific prediction when concatenated to any input from a dataset". These UATs were able to fool a question-answering model to answer with "to kill american people" to most "why" questions formulated in a dataset [423]. Moreover, they analyzed UATs placed within user inputs to the GPT-2 language model of OpenAI [347] known for high-quality outputs [239].…”
Section: Risk Ib
confidence: 99%
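The "input-agnostic" property in the quoted definition has a simple operational reading: one fixed trigger is concatenated to every input in a dataset, and the attack succeeds whenever the model produces the attacker's target. A hedged sketch, with a hypothetical predict stub standing in for a real victim model:

```python
# Hedged sketch: measure how often a single fixed trigger forces the
# target output across a whole dataset. `predict`, the inputs, and the
# trigger string are hypothetical placeholders, not the paper's setup.
def attack_success_rate(predict, trigger, inputs, target):
    hits = sum(predict(trigger + " " + text) == target for text in inputs)
    return hits / len(inputs)

# Stub victim model that a trigger containing "zoning tapping" happens to fool.
predict = lambda text: "negative" if "zoning tapping" in text else "positive"
inputs = ["great movie", "loved it", "fantastic plot"]
print(attack_success_rate(predict, "zoning tapping fiennes", inputs, "negative"))  # 1.0
```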