Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019
DOI: 10.18653/v1/d19-1423
Certified Robustness to Adversarial Word Substitutions

Abstract: State-of-the-art NLP models can often be fooled by adversaries that apply seemingly innocuous label-preserving transformations (e.g., paraphrasing) to input text. The number of possible transformations scales exponentially with text length, so data augmentation cannot cover all transformations of an input. This paper considers one exponentially large family of label-preserving transformations, in which every word in the input can be replaced with a similar word. We train the first models that are provably robu…
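The exponential growth the abstract refers to is easy to make concrete. The sketch below is a toy illustration with invented numbers, not figures from the paper; the paper's actual substitution sets come from word-similarity neighborhoods.

```python
# Toy illustration of the combinatorial growth described in the abstract.
# Substitution counts below are invented for illustration only.

def num_variants(subs_per_word):
    """Count all sentences reachable by independent word substitutions."""
    total = 1
    for k in subs_per_word:
        total *= k + 1  # each position: keep the word, or pick one of k substitutes
    return total

# A 20-word sentence with 5 candidate substitutes per word already yields
# 6^20 = 3656158440062976 variants, far more than augmentation can cover:
print(num_variants([5] * 20))
```

Because the variants multiply across positions, even a modest per-word substitution budget defeats enumeration-based defenses, which motivates the certified bounds the paper computes instead.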

Cited by 194 publications (254 citation statements)
References 18 publications
“…There were two things left unspecified in the definitions above: the distance function d to use in discrete input spaces, and the method for sampling from a local decision boundary. While there has been some work trying to formally characterize distances for adversarial robustness in NLP (Michel et al., 2019; Jia et al., 2019), we find it more useful in our setting to simply rely on expert judgments to generate a similar but meaningfully different x′ given x, addressing both the distance function and the sampling method.…”
Section: Contrast Sets In Practice (mentioning)
confidence: 99%
“…Note that leaderboards do not necessarily incentivize the creation of brittle and biased models; rather, because leaderboard utility is so parochial, these unintended consequences are relatively common. Some recent work has addressed the problem of brittleness by offering certificates of performance against adversarial examples (Raghunathan et al., 2018a,b; Jia et al., 2019). To tackle gender bias, the SuperGLUE leaderboard considers accuracy on the WinoBias task (Wang et al., 2019; Zhao et al., 2018).…”
Section: Robustness (mentioning)
confidence: 99%
“…(2) Interval Bound Propagation (IBP) (Dvijotham et al., 2018) is proposed as a new technique to theoretically consider the worst-case perturbation. Recent works (Jia et al., 2019; Huang et al., 2019) have applied IBP in the NLP domain to certify the robustness of models. (3) Language models including GPT-2 (Radford et al., 2019) may also function as anomaly detectors to probe inconsistent and unnatural adversarial sentences.…”
Section: Discussion and Future Work (mentioning)
confidence: 99%
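The IBP technique mentioned in the excerpt above pushes an axis-aligned box of inputs through the network layer by layer, so the output bounds cover every input in the box, including the worst-case word substitution. Below is a minimal self-contained sketch of that propagation for one affine layer plus ReLU; the weights and input bounds are toy values chosen for illustration, not parameters from any cited model.

```python
# Minimal sketch of Interval Bound Propagation (IBP) through an affine
# layer y = W x + b followed by ReLU, using plain Python lists.
# Weights and bounds are toy values for illustration.

def affine_bounds(W, b, lo, hi):
    """Propagate the box [lo, hi] through y = W x + b (center/radius form)."""
    out_lo, out_hi = [], []
    for row, bias in zip(W, b):
        center = sum(w * (l + h) / 2 for w, l, h in zip(row, lo, hi)) + bias
        radius = sum(abs(w) * (h - l) / 2 for w, l, h in zip(row, lo, hi))
        out_lo.append(center - radius)
        out_hi.append(center + radius)
    return out_lo, out_hi

def relu_bounds(lo, hi):
    """ReLU is monotone, so it maps interval endpoints to endpoints."""
    return [max(0.0, l) for l in lo], [max(0.0, h) for h in hi]

W = [[1.0, -2.0], [0.5, 1.0]]
b = [0.0, -1.0]
lo, hi = [0.0, 0.0], [1.0, 1.0]      # input box, e.g. a hull of word embeddings
lo, hi = affine_bounds(W, b, lo, hi)  # -> [-2.0, -1.0], [1.0, 0.5]
lo, hi = relu_bounds(lo, hi)          # -> [0.0, 0.0], [1.0, 0.5]
print(lo, hi)
```

In the certification setting, the input box would enclose all embeddings reachable by word substitutions, and training minimizes the worst-case loss implied by the final-layer bounds.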