Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
DOI: 10.18653/v1/2020.emnlp-main.417
CAT-Gen: Improving Robustness in NLP Models via Controlled Adversarial Text Generation

Abstract: NLP models are shown to suffer from robustness issues, i.e., a model's prediction can be easily changed under small perturbations to the input. In this work, we present a Controlled Adversarial Text Generation (CAT-Gen) model that, given an input text, generates adversarial texts through controllable attributes that are known to be irrelevant to task labels. For example, in order to attack a model for sentiment classification over product reviews, we can use the product categories as the controllable attribute…

Cited by 40 publications (10 citation statements). References 16 publications.
“…However, in order to achieve this goal, techniques to explain the model predictions should be put in place. These aspects are closely linked to the need to characterize the robustness of the trained models' predictions to variations in the test answers [76]. A research objective that must be taken into account is to study how sensitive ASAG models are to changes in students' answers, so that the models cannot be tricked by the use of certain keywords or by swapping the order of words.…”
Section: Outlook and Future Work
confidence: 99%
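The sensitivity study this passage calls for can be prototyped in a few lines: perturb each answer (swap adjacent words, inject a suspected trigger keyword) and measure how often the model's label flips. A minimal sketch, assuming any text classifier exposed as a `predict` callable; the keyword-matching grader at the end is a deliberately fragile toy stand-in, not a real ASAG model.

```python
# Probe a classifier's sensitivity to word-order swaps and keyword injection.
import random

def swap_adjacent_words(text: str, rng: random.Random) -> str:
    """Swap one random pair of adjacent words."""
    words = text.split()
    if len(words) < 2:
        return text
    i = rng.randrange(len(words) - 1)
    words[i], words[i + 1] = words[i + 1], words[i]
    return " ".join(words)

def inject_keyword(text: str, keyword: str) -> str:
    """Append a keyword the model may over-rely on."""
    return f"{text} {keyword}"

def flip_rate(predict, answers, keyword="photosynthesis", seed=0):
    """Fraction of answers whose label changes under either perturbation."""
    rng = random.Random(seed)
    flips = 0
    for a in answers:
        base = predict(a)
        if (predict(swap_adjacent_words(a, rng)) != base
                or predict(inject_keyword(a, keyword)) != base):
            flips += 1
    return flips / len(answers)

# Toy keyword-matching "grader" that is trivially tricked, for illustration only.
toy_predict = lambda t: "correct" if "photosynthesis" in t else "incorrect"
print(flip_rate(toy_predict, ["plants make food from light", "the sun is hot"]))
```

A flip rate near 1.0, as the toy grader yields here, is exactly the keyword-gaming failure mode the quoted authors warn about.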
“…• CAT-Gen: It uses an encoder-decoder to generate adversarial sentences. During encoding, pre-defined controllable attributes are concatenated with the representation of each benign sentence (Wang et al., 2020).…”
Section: Methods
confidence: 99%
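The concatenation step described in the quote is easy to make concrete. Below is a schematic PyTorch sketch in which an attribute embedding (e.g., a product-category id) is concatenated with the encoded sentence representation before decoding; the GRU modules, names, and dimensions are illustrative assumptions, not the authors' implementation — only the attribute-concatenation idea follows the quoted description.

```python
# Schematic encoder-decoder with a controllable attribute joined to the
# sentence representation before decoding.
import torch
import torch.nn as nn

class ControlledSeq2Seq(nn.Module):
    def __init__(self, vocab_size=10000, n_attrs=8, d_model=256, d_attr=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.attr_embed = nn.Embedding(n_attrs, d_attr)   # controllable attribute
        self.encoder = nn.GRU(d_model, d_model, batch_first=True)
        self.decoder = nn.GRU(d_model, d_model + d_attr, batch_first=True)
        self.out = nn.Linear(d_model + d_attr, vocab_size)

    def forward(self, src_ids, attr_id, tgt_ids):
        # Encode the benign sentence into a single hidden state.
        _, h = self.encoder(self.embed(src_ids))          # (1, B, d_model)
        # Concatenate the attribute embedding with the sentence representation.
        a = self.attr_embed(attr_id).unsqueeze(0)         # (1, B, d_attr)
        h0 = torch.cat([h, a], dim=-1)                    # (1, B, d_model + d_attr)
        # Decode conditioned on the combined state to produce the new text.
        dec, _ = self.decoder(self.embed(tgt_ids), h0)
        return self.out(dec)                              # per-token vocab logits

model = ControlledSeq2Seq()
logits = model(torch.randint(0, 10000, (2, 12)),          # batch of sentences
               torch.tensor([3, 5]),                      # target attribute ids
               torch.randint(0, 10000, (2, 12)))          # decoder inputs
```

Varying `attr_id` at inference is what steers generation toward a label-irrelevant attribute value, which is the control mechanism the quote attributes to CAT-Gen.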
“…In general, adversarial sentences used in current adversarial training methods are generated by existing attacks, which usually derive an adversarial sentence from a benign one. For example, Alzantot et al. (2018) used a population-based optimization algorithm to generate semantically similar adversarial examples via word replacements; Jin et al. (2020) proposed TextFooler to generate utility-preserving adversarial examples via synonym replacement; unlike attack methods based on heuristic word replacements, Wang et al. (2020) proposed CAT-Gen, which applies a language model to implicitly generate adversarial sentences and uses pre-defined controllable attributes (e.g., gender) to aid the text generation. However, as mentioned before, adversarial training based on these attacks suffers from two principal problems: a drop in the model's generalization and ineffectiveness against other text attacks.…”
Section: Related Work
confidence: 99%
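To ground the word-replacement family contrasted with CAT-Gen above, here is a minimal greedy synonym-substitution sketch: try candidate synonyms position by position until the victim model's label flips. The tiny synonym table and `predict_label` hook are stand-ins of my own; real attacks draw candidates from counter-fitted word embeddings (TextFooler) or search with a genetic algorithm (Alzantot et al.).

```python
# Greedy synonym-replacement attack against an arbitrary text classifier.
SYNONYMS = {
    "good": ["fine", "decent"],
    "terrible": ["awful", "dreadful"],
    "movie": ["film", "picture"],
}

def replace_once(words, i, candidate):
    out = list(words)
    out[i] = candidate
    return out

def synonym_attack(text, predict_label):
    """Return an adversarial sentence if any single swap flips the label."""
    words = text.split()
    original = predict_label(text)
    for i, w in enumerate(words):
        for cand in SYNONYMS.get(w, []):
            perturbed = " ".join(replace_once(words, i, cand))
            if predict_label(perturbed) != original:
                return perturbed       # meaning preserved, label flipped
    return None                        # attack failed on this sentence

# Toy sentiment model that keys on the exact word "good", for illustration.
toy = lambda t: "positive" if "good" in t.split() else "negative"
print(synonym_attack("a good movie", toy))   # -> "a fine movie"
```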
“…For example, first rank the tokens by importance and determine the replacement order; then use different strategies to find the best replacement for each token, thereby significantly reducing the time complexity of the search and forming an adversarial example. In addition, some works use paraphrasing [23], [24], text generation [25], [26], generative adversarial networks (GANs) [27], reinforcement learning [28], and other methods [29] to generate adversarial examples.…”
Section: A. Adversarial Attacks
confidence: 99%
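The importance-ranking step in the quoted pipeline is typically a leave-one-out probe: delete each token, measure the drop in the true-class probability, and attack positions in descending order of that drop. A minimal sketch, assuming a hypothetical `prob_of(text, label)` hook into the victim model:

```python
# Rank tokens by how much their deletion lowers the true-class probability.
def importance_order(text: str, label: str, prob_of) -> list[int]:
    """Token indices, most important first, by leave-one-out probability drop."""
    words = text.split()
    base = prob_of(text, label)
    drops = []
    for i in range(len(words)):
        reduced = " ".join(words[:i] + words[i + 1:])
        drops.append((base - prob_of(reduced, label), i))
    return [i for _, i in sorted(drops, reverse=True)]

# Toy model: P("positive") rises with each occurrence of "great".
toy_prob = lambda t, lbl: min(1.0, 0.2 + 0.4 * t.split().count("great"))
print(importance_order("a great and great film", "positive", toy_prob))
```

Attacking only the top-ranked positions is what gives these methods the reduced search complexity the quote describes, since most tokens never need candidate replacements at all.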