Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
DOI: 10.18653/v1/2020.emnlp-main.417
CAT-Gen: Improving Robustness in NLP Models via Controlled Adversarial Text Generation

Abstract: NLP models are shown to suffer from robustness issues, i.e., a model's prediction can be easily changed under small perturbations to the input. In this work, we present a Controlled Adversarial Text Generation (CAT-Gen) model that, given an input text, generates adversarial texts through controllable attributes that are known to be irrelevant to task labels. For example, in order to attack a model for sentiment classification over product reviews, we can use the product categories as the controllable attribute…

Cited by 40 publications (10 citation statements). References 16 publications.
“…However, in order to achieve this goal, techniques to explain the model predictions should be put in place. These aspects are closely linked to the need to characterize the robustness of the trained models' predictions to variations in the test answers [76]. A research objective that must be taken into account is to study how sensitive ASAG models are to changes in students' answers, so that the models cannot be tricked by the use of certain keywords or by swapping the order of words.…”
Section: Outlook and Future Work
confidence: 99%
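The sensitivity study this passage calls for can be prototyped in a few lines: perturb each answer (swap adjacent words, inject a suspected trigger keyword) and measure how often the model's label flips. A minimal sketch, assuming any text classifier exposed as a `predict` callable; the keyword-matching grader at the end is a deliberately fragile toy stand-in, not a real ASAG model.

```python
# Probe a classifier's sensitivity to word-order swaps and keyword injection.
import random

def swap_adjacent_words(text: str, rng: random.Random) -> str:
    """Swap one random pair of adjacent words."""
    words = text.split()
    if len(words) < 2:
        return text
    i = rng.randrange(len(words) - 1)
    words[i], words[i + 1] = words[i + 1], words[i]
    return " ".join(words)

def inject_keyword(text: str, keyword: str) -> str:
    """Append a keyword the model may over-rely on."""
    return f"{text} {keyword}"

def flip_rate(predict, answers, keyword="photosynthesis", seed=0):
    """Fraction of answers whose label changes under either perturbation."""
    rng = random.Random(seed)
    flips = 0
    for a in answers:
        base = predict(a)
        if (predict(swap_adjacent_words(a, rng)) != base
                or predict(inject_keyword(a, keyword)) != base):
            flips += 1
    return flips / len(answers)

# Toy keyword-matching "grader" that is trivially tricked, for illustration only.
toy_predict = lambda t: "correct" if "photosynthesis" in t else "incorrect"
print(flip_rate(toy_predict, ["plants make food from light", "the sun is hot"]))
```

A flip rate near 1.0, as the toy grader yields here, is exactly the keyword-gaming failure mode the quoted authors warn about.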
“…• CAT-Gen: It uses an encoder-decoder to generate adversarial sentences. During encoding, pre-defined controllable attributes are concatenated with the representation of each benign sentence (Wang et al., 2020).…”
Section: Methods
confidence: 99%
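The concatenation step described in the quote is easy to make concrete. Below is a schematic PyTorch sketch in which an attribute embedding (e.g., a product-category id) is concatenated with the encoded sentence representation before decoding; the GRU modules, names, and dimensions are illustrative assumptions, not the authors' implementation — only the attribute-concatenation idea follows the quoted description.

```python
# Schematic encoder-decoder with a controllable attribute joined to the
# sentence representation before decoding.
import torch
import torch.nn as nn

class ControlledSeq2Seq(nn.Module):
    def __init__(self, vocab_size=10000, n_attrs=8, d_model=256, d_attr=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.attr_embed = nn.Embedding(n_attrs, d_attr)   # controllable attribute
        self.encoder = nn.GRU(d_model, d_model, batch_first=True)
        self.decoder = nn.GRU(d_model, d_model + d_attr, batch_first=True)
        self.out = nn.Linear(d_model + d_attr, vocab_size)

    def forward(self, src_ids, attr_id, tgt_ids):
        # Encode the benign sentence into a single hidden state.
        _, h = self.encoder(self.embed(src_ids))          # (1, B, d_model)
        # Concatenate the attribute embedding with the sentence representation.
        a = self.attr_embed(attr_id).unsqueeze(0)         # (1, B, d_attr)
        h0 = torch.cat([h, a], dim=-1)                    # (1, B, d_model + d_attr)
        # Decode conditioned on the combined state to produce the new text.
        dec, _ = self.decoder(self.embed(tgt_ids), h0)
        return self.out(dec)                              # per-token vocab logits

model = ControlledSeq2Seq()
logits = model(torch.randint(0, 10000, (2, 12)),          # batch of sentences
               torch.tensor([3, 5]),                      # target attribute ids
               torch.randint(0, 10000, (2, 12)))          # decoder inputs
```

Varying `attr_id` at inference is what steers generation toward a label-irrelevant attribute value, which is the control mechanism the quote attributes to CAT-Gen.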
“…In general, adversarial sentences used in current adversarial training methods are generated by existing attacks, which usually derive an adversarial sentence from a benign one. For example, Alzantot et al. (2018) used a population-based optimization algorithm to generate semantically similar adversarial examples via word replacements; Jin et al. (2020) proposed TextFooler to generate utility-preserving adversarial examples via synonym replacement; unlike attack methods based on heuristic word replacements, Wang et al. (2020) proposed CAT-Gen, which applies a language model to implicitly generate adversarial sentences and uses pre-defined controllable attributes (e.g., gender) to aid the text generation. However, as mentioned before, adversarial training based on these attacks suffers from two principal problems: a drop in the model's generalization and ineffectiveness against other text attacks.…”
Section: Related Work
confidence: 99%
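To ground the word-replacement family contrasted with CAT-Gen above, here is a minimal greedy synonym-substitution sketch: try candidate synonyms position by position until the victim model's label flips. The tiny synonym table and `predict_label` hook are stand-ins of my own; real attacks draw candidates from counter-fitted word embeddings (TextFooler) or search with a genetic algorithm (Alzantot et al.).

```python
# Greedy synonym-replacement attack against an arbitrary text classifier.
SYNONYMS = {
    "good": ["fine", "decent"],
    "terrible": ["awful", "dreadful"],
    "movie": ["film", "picture"],
}

def replace_once(words, i, candidate):
    out = list(words)
    out[i] = candidate
    return out

def synonym_attack(text, predict_label):
    """Return an adversarial sentence if any single swap flips the label."""
    words = text.split()
    original = predict_label(text)
    for i, w in enumerate(words):
        for cand in SYNONYMS.get(w, []):
            perturbed = " ".join(replace_once(words, i, cand))
            if predict_label(perturbed) != original:
                return perturbed       # meaning preserved, label flipped
    return None                        # attack failed on this sentence

# Toy sentiment model that keys on the exact word "good", for illustration.
toy = lambda t: "positive" if "good" in t.split() else "negative"
print(synonym_attack("a good movie", toy))   # -> "a fine movie"
```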
“…For example, first rank the tokens by importance and determine the replacement order; then use different strategies to find the best replacement for each token, thereby significantly reducing the time complexity of the search and forming an adversarial example. In addition, some works use paraphrasing [23], [24], text generation [25], [26], generative adversarial networks (GANs) [27], reinforcement learning [28], and other methods [29] to generate adversarial examples.…”
Section: A. Adversarial Attacks
confidence: 99%
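The importance-ranking step in the quoted pipeline is typically a leave-one-out probe: delete each token, measure the drop in the true-class probability, and attack positions in descending order of that drop. A minimal sketch, assuming a hypothetical `prob_of(text, label)` hook into the victim model:

```python
# Rank tokens by how much their deletion lowers the true-class probability.
def importance_order(text: str, label: str, prob_of) -> list[int]:
    """Token indices, most important first, by leave-one-out probability drop."""
    words = text.split()
    base = prob_of(text, label)
    drops = []
    for i in range(len(words)):
        reduced = " ".join(words[:i] + words[i + 1:])
        drops.append((base - prob_of(reduced, label), i))
    return [i for _, i in sorted(drops, reverse=True)]

# Toy model: P("positive") rises with each occurrence of "great".
toy_prob = lambda t, lbl: min(1.0, 0.2 + 0.4 * t.split().count("great"))
print(importance_order("a great and great film", "positive", toy_prob))
```

Attacking only the top-ranked positions is what gives these methods the reduced search complexity the quote describes, since most tokens never need candidate replacements at all.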