Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence 2018
DOI: 10.24963/ijcai.2018/601

Interpretable Adversarial Perturbation in Input Embedding Space for Text

Abstract: Following great success in the image processing field, the idea of adversarial training has been applied to tasks in the natural language processing (NLP) field. One promising approach directly applies adversarial training developed in image processing to the input word embedding space instead of the discrete input space of texts. However, this approach abandons interpretability, namely the ability to generate adversarial texts, in exchange for significantly improved performance on NLP tasks. This paper restores interpretability…
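To make the abstract's idea concrete, the sketch below shows one way to restrict an adversarial perturbation to directions pointing toward existing words in the embedding space. This is an illustrative PyTorch sketch under assumed tensor shapes, not the authors' exact formulation; the function name and the softmax weighting scheme are hypothetical.

```python
import torch
import torch.nn.functional as F

def interpretable_perturbation(emb, grad, vocab_emb, eps=1.0):
    """Illustrative sketch (not the paper's exact method): build the
    perturbation as a weighted combination of directions toward existing
    vocabulary words, so each perturbed embedding can be read as
    "moving toward word k".

    emb:       (seq_len, dim)  input word embeddings
    grad:      (seq_len, dim)  gradient of the loss w.r.t. emb
    vocab_emb: (vocab, dim)    full embedding matrix
    """
    # Unit direction from each input position t toward each vocab word k.
    # The full (T, V, D) tensor is kept for clarity; a real implementation
    # would chunk over the vocabulary to save memory.
    dirs = F.normalize(vocab_emb.unsqueeze(0) - emb.unsqueeze(1), dim=-1)

    # Weight each direction by its alignment with the loss gradient; the
    # softmax keeps the result a convex combination of word directions.
    align = torch.einsum('tkd,td->tk', dirs, grad)   # (T, V)
    alpha = F.softmax(align, dim=-1)

    r = torch.einsum('tk,tkd->td', alpha, dirs)      # (T, D)
    return eps * F.normalize(r, dim=-1)              # norm-bounded perturbation
```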

Cited by 127 publications (161 citation statements)
References 3 publications (6 reference statements)
“…Table SM3: A categorization of methods for adversarial examples in NLP according to the adversary's knowledge (white-box vs. black-box), attack specificity (targeted vs. non-targeted), the modified linguistic unit (words, characters, etc.), and the attacked task:

Reference                 Knowledge  Unit       Task
(Zhao et al., 2018a)      —          —          Coref.
(Heigold et al., 2018)    Black      Char       MT, morphology
(Sakaguchi et al., 2017)  Black      Char       Spelling correction
(Zhao et al., 2018c)      Black      Word       MT, natural language inference
(Gao et al., 2018)        Black      Char       Text classification, sentiment
(Jia and Liang, 2017)     Black      Sentence   Reading comprehension
(Iyyer et al., 2018)      Black      Syntax     Sentiment, entailment
(Sato et al., 2018)       White      Word       Text classification, sentiment, grammatical error detection
(Liang et al., 2018)      White      Word/Char  Text classification
(Ebrahimi et al., 2018b)  White      Word/Char  Text classification
(Yang et al., 2018)       White      Word/Char  Text classification…”
Section: Supplementary Materials
confidence: 99%
“…Moreover, such models often lack robustness and can easily be fooled by "adversarial examples" [14]: drastic changes in a prediction can be provoked by minor, essentially unimportant changes to an object. This problem has been observed not only for images but also for other types of data, such as natural language text [17].…”
Section: Introduction
confidence: 97%
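For intuition about the "minor change" the quoted passage describes, here is a minimal gradient-sign (FGSM-style) sketch in PyTorch; `model`, `loss_fn`, and `eps` are hypothetical placeholders, not objects from the cited works.

```python
import torch

def fgsm_example(model, x, y, loss_fn, eps=0.01):
    """Minimal FGSM sketch: a small, gradient-aligned change to the
    input can be enough to flip the model's prediction."""
    x = x.clone().requires_grad_(True)
    loss = loss_fn(model(x), y)
    loss.backward()
    # Step in the direction that increases the loss the most.
    x_adv = x + eps * x.grad.sign()
    return x_adv.detach()
```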
“…For example, in virtual adversarial training (VAT) (Miyato, Dai, and Goodfellow 2017), the perturbation is computed from unlabeled data to make the baseline DNN more robust against noise. Sato et al. (2018) proposed an extension of VAT that generates a more interpretable perturbation. In addition, cross-view training (CVT) (Clark, Luong, and Le 2018) adds an auxiliary loss that encourages predictions made from restricted views of an unlabeled input to match the full-view prediction.…”
Section: Related Work
confidence: 99%
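A minimal sketch of the VAT objective mentioned in this quote, assuming a PyTorch classifier over continuous inputs (e.g., embeddings); the hyper-parameter names `xi`, `eps`, and `n_power` are illustrative, not taken from the cited papers.

```python
import torch
import torch.nn.functional as F

def vat_loss(model, x_unlabeled, xi=1e-6, eps=1.0, n_power=1):
    """Sketch of virtual adversarial training: find the perturbation that
    most changes the model's output distribution on unlabeled inputs,
    then penalize that change. Assumes x_unlabeled is (batch, ...)."""
    with torch.no_grad():
        p = F.softmax(model(x_unlabeled), dim=-1)   # clean predictions

    # Power iteration to approximate the most sensitive direction.
    d = torch.randn_like(x_unlabeled)
    for _ in range(n_power):
        d = (xi * F.normalize(d.flatten(1), dim=1)).view_as(d)
        d.requires_grad_(True)
        kl = F.kl_div(F.log_softmax(model(x_unlabeled + d), dim=-1),
                      p, reduction='batchmean')
        d = torch.autograd.grad(kl, d)[0]

    # Penalize the output change under the worst-case perturbation.
    r_adv = (eps * F.normalize(d.flatten(1), dim=1)).view_as(d)
    return F.kl_div(F.log_softmax(model(x_unlabeled + r_adv), dim=-1),
                    p, reduction='batchmean')
```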
“…We trained the language model using the labeled training data and unlabeled data of each dataset. Several previous studies have adopted this network as a baseline (Miyato, Dai, and Goodfellow 2017; Sato et al. 2018).…”
Section: Baseline DNNs
confidence: 99%