Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021
DOI: 10.18653/v1/2021.findings-acl.141

BERT-Defense: A Probabilistic Model Based on BERT to Combat Cognitively Inspired Orthographic Adversarial Attacks

Abstract: Adversarial attacks expose important blind spots of deep learning systems. While word- and sentence-level attack scenarios mostly deal with finding semantic paraphrases of the input that fool NLP models, character-level attacks typically insert typos into the input stream. It is commonly thought that these are easier to defend via spelling correction modules. In this work, we show that both a standard spellchecker and the approach of Pruthi et al. (2019), which trains to defend against insertions, deletions and …
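To make the character-level attack scenario the abstract describes concrete, below is a minimal sketch that applies random insertions, deletions, and swaps to an input string. The function name `typo_attack` and the perturbation rate are hypothetical illustration choices, not the paper's actual attack generator.

```python
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def typo_attack(text: str, rate: float = 0.1, seed: int = 0) -> str:
    """Randomly insert, delete, or swap characters to mimic orthographic
    adversarial noise. Illustrative sketch only."""
    rng = random.Random(seed)
    chars = list(text)
    out = []
    i = 0
    while i < len(chars):
        c = chars[i]
        if c.isalpha() and rng.random() < rate:
            op = rng.choice(["insert", "delete", "swap"])
            if op == "insert":
                out += [c, rng.choice(ALPHABET)]   # keep char, add a stray one
            elif op == "swap" and i + 1 < len(chars) and chars[i + 1].isalpha():
                out += [chars[i + 1], c]           # transpose adjacent letters
                i += 1
            elif op == "delete":
                pass                               # drop the character
            else:
                out.append(c)                      # swap not possible; keep as-is
        else:
            out.append(c)
        i += 1
    return "".join(out)

print(typo_attack("adversarial attacks expose blind spots of deep learning systems"))
```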

Cited by 17 publications (10 citation statements). References 23 publications.

Citation statements (ordered by relevance):

“…We find that all metrics capture these linguistic aspects to certain (but differing) degrees, and they are particularly sensitive to lexical overlap, which makes them prone to similar adversarial fooling (cf. Li et al., 2020; Keller et al., 2021) as BLEU-based lexical overlap metrics. Overall, our contributions are:…”
Section: Introduction
Mentioning confidence: 99%
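The sensitivity to lexical overlap that this statement describes is easy to demonstrate with a toy example (ours, not from the cited papers): a candidate that copies most reference tokens but flips the meaning outscores a genuine paraphrase under a BLEU-style overlap metric. The sentences are invented; `sentence_bleu` and `SmoothingFunction` are NLTK's standard BLEU utilities.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

smooth = SmoothingFunction().method1  # avoid zero scores on short sentences

reference = "the cat sat on the mat".split()
negated = "the cat never sat on the mat".split()          # opposite meaning, high overlap
paraphrase = "a feline was sitting upon the rug".split()  # similar meaning, low overlap

# Bigram BLEU rewards the meaning-flipping candidate over the paraphrase.
print(sentence_bleu([reference], negated, weights=(0.5, 0.5), smoothing_function=smooth))
print(sentence_bleu([reference], paraphrase, weights=(0.5, 0.5), smoothing_function=smooth))
```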
“…The NLP community has differentiated between sentence-, word- and character-level attacks (Zeng et al. 2021). Sentence- and word-level attacks often have the goal to produce examples that are similar in terms of meaning (Alzantot et al. 2018; Li et al. 2020), while character-level attacks mimic various forms of typographical errors (including visual and phonetic modifications) (Ebrahimi et al. 2018; Pruthi, Dhingra, and Lipton 2019; Eger et al. 2019; Eger and Benz 2020; Keller, Mackensen, and Eger 2021).…”
Section: New Explainability Approaches for MT Evaluation
Mentioning confidence: 99%
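To contrast the two attack families this statement distinguishes, here is a minimal sketch of a meaning-preserving word-level substitution (the character-level counterpart is sketched after the abstract above). The `SYNONYMS` table and `word_level_attack` function are invented for illustration and are not taken from Alzantot et al. (2018).

```python
# Hypothetical synonym table for illustration only.
SYNONYMS = {"movie": "film", "terrible": "awful", "boring": "dull"}

def word_level_attack(sentence: str) -> str:
    """Replace words with near-synonyms: the surface form changes,
    but the meaning is (roughly) preserved."""
    return " ".join(SYNONYMS.get(word, word) for word in sentence.split())

print(word_level_attack("this movie was terrible and boring"))
# -> "this film was awful and dull"
```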
“…Other defenses. Several other shielding methods exist (Keller et al., 2021; Eger et al., 2019; Zhu et al., 2021). For example, Rodriguez and Galeano (2018) defend Perspective (Google's toxicity classification model) by neutralizing adversarial inputs via a negated predicates list.…”
Section: Related Work
Mentioning confidence: 99%