Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/2021.emnlp-main.651

Contrasting Human- and Machine-Generated Word-Level Adversarial Examples for Text Classification

Abstract: Research shows that natural language processing models are generally considered to be vulnerable to adversarial attacks; but recent work has drawn attention to the issue of validating these adversarial inputs against certain criteria (e.g., the preservation of semantics and grammaticality). Enforcing constraints to uphold such criteria may render attacks unsuccessful, raising the question of whether valid attacks are actually feasible. In this work, we investigate this through the lens of human language ability…

Cited by 8 publications (17 citation statements). References 38 publications.
“…Alzantot et al [302] expose that sentiment analysis models can be fooled by synonym substitution attacks, as illustrated by their adversarial examples in Table 8.1. This has motivated a myriad of works making NLP models more robust against such attacks [303,304,305,306].…”
Section: Certified Robustness Against Natural Language Attacks (mentioning; confidence: 99%)
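The excerpt above refers to synonym-substitution attacks on text classifiers. As a minimal Python sketch of that idea (a greedy variant for illustration, not Alzantot et al.'s actual genetic-algorithm search), the following assumes two placeholder helpers: model_prob(tokens, label), which returns the victim classifier's probability for the true label, and get_synonyms(word), which yields candidate replacements from a thesaurus or embedding neighborhood.

def substitution_attack(words, true_label, model_prob, get_synonyms, max_swaps=5):
    """Greedily swap words for synonyms that lower the model's confidence in the true label."""
    adversarial = list(words)
    swaps = 0
    for i, word in enumerate(words):
        if swaps >= max_swaps:
            break
        # Current confidence on the (partially perturbed) input.
        best_prob = model_prob(adversarial, true_label)
        best_word = word
        for candidate in get_synonyms(word):
            trial = adversarial[:i] + [candidate] + adversarial[i + 1:]
            prob = model_prob(trial, true_label)
            if prob < best_prob:
                best_prob, best_word = prob, candidate
        if best_word != word:
            adversarial[i] = best_word
            swaps += 1
    return adversarial

Constraints on semantics and grammaticality, the validity criteria discussed in the abstract above, would be enforced by filtering the candidates returned by get_synonyms before they are tried.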
“…Not only that, but adversarial attacks can reveal important vulnerabilities in our systems (Zhang et al, 2020a). Although previous work has studied adversarial examples in NLP (Li et al, 2017; Zang et al, 2020; Morris et al, 2020; Mozes et al, 2021) most of them focused on accuracy as a metric of interest. Among the ones that studied toxicity and other ethical considerations (Wallace et al, 2019; Sheng et al, 2020) they did not put the focus on either conversational agents or they did not consider attacks being imperceptible.…”
Section: Related Work (mentioning; confidence: 99%)
“…The Adversarial NLI project asks humans to annotate mislabeled data and uses humans as adversaries to create a benchmark natural language inference (NLI) dataset for a more robust NLP model (Nie et al, 2020). The most related work compares the performance of human- and machine-generated word-level adversarial examples for NLP classification tasks (Mozes et al, 2021).…”
Section: Related Work (mentioning; confidence: 99%)
“…A saliency map shows what words the target model identifies as most important that are most likely to affect the prediction, and then marks those words with colors with different intensities. Unlike (Mozes et al, 2021), where the interface displays word saliencies calculated by replacing the word with an out-of-vocabulary token, we implement the built-in method in each automated attack to calculate the saliency score. For example, BAE and TextFooler simply delete the word and calculate the word saliencies, while PWWS replaces each word with an unknown token and calculates the weighted saliency.…”
Section: Generating Adversarial Examples (mentioning; confidence: 99%)
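The excerpt above contrasts two ways of scoring word saliency: replacing a word with an out-of-vocabulary token (as in PWWS-style scoring) versus deleting it outright (as in TextFooler and BAE). A rough Python sketch of both variants follows; it is not the exact implementation of any of those attacks, and predict_prob(tokens) is an assumed helper returning the victim model's probability for its originally predicted class.

def saliency_scores(words, predict_prob, mode="unk", unk_token="[UNK]"):
    """Score each token by how much masking it changes the model's confidence.

    words: list of tokens.
    mode="unk": replace the word with an out-of-vocabulary token (PWWS-style).
    mode="delete": drop the word entirely (TextFooler/BAE-style).
    """
    base = predict_prob(words)
    scores = []
    for i in range(len(words)):
        if mode == "unk":
            perturbed = words[:i] + [unk_token] + words[i + 1:]
        else:
            perturbed = words[:i] + words[i + 1:]
        # A large drop in confidence means the word is highly salient.
        scores.append(base - predict_prob(perturbed))
    return scores

The highest-scoring words are the ones whose masking most reduces the model's confidence, which is what an attack (or the annotation interface described in the cited work) would highlight first.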