Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
DOI: 10.18653/v1/2020.emnlp-main.347

Learning Variational Word Masks to Improve the Interpretability of Neural Text Classifiers

Abstract: To build an interpretable neural text classifier, most of the prior work has focused on designing inherently interpretable models or finding faithful explanations. A new line of work on improving model interpretability has just started, and many existing methods require either prior information or human annotations as additional inputs in training. To address this limitation, we propose the variational word mask (VMASK) method to automatically learn task-specific important words and reduce irrelevant information …

Cited by 31 publications (20 citation statements)
References 31 publications (24 reference statements)
“…For each example, LIME approximates the local decision boundary by fitting a linear model over the samples obtained by perturbing the example. To measure the faithfulness of the local explanations obtained using LIME, we measure the area over perturbation curve (AOPC) (Samek et al., 2017; Nguyen, 2018; Chen and Ji, 2020), which is defined as:…”
Section: Results on Interpretability
confidence: 99%
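The excerpt cuts off before the definition. For context, the AOPC referenced here is conventionally defined (following Samek et al., 2017 and Nguyen, 2018; the exact notation in the citing paper may differ) as

\mathrm{AOPC} = \frac{1}{K+1} \left\langle \sum_{k=1}^{K} \big( p(\hat{y} \mid x) - p(\hat{y} \mid x_{\setminus 1..k}) \big) \right\rangle_{x}

where x_{\setminus 1..k} denotes the input with its k top-ranked words removed and \langle \cdot \rangle_{x} averages over the evaluation examples; a higher AOPC indicates a more faithful word-importance ranking.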
“…Recently, there are applications and advances of local explanation methods [23,30,32]. For instance in NLP, some analyze the contributions of segments in documents to positive and negative sentiments [4,8,9,25]. Some move forwards to finding segments towards text similarity [10], retrieving a text span towards question-answering [27], and making local explanation as alignment model in machine translation [1].…”
Section: Related Work and Discussion
confidence: 99%
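As an illustration of the segment-contribution analyses mentioned above, here is a minimal occlusion-style sketch of word-level attribution for sentiment. It is not the method of any cited paper; `predict_positive` is a hypothetical black-box scorer returning P(positive | text).

```python
# Minimal sketch: score each word by the drop in the positive-class probability
# when that word is removed (a simple occlusion/erasure attribution).
from typing import Callable, List, Tuple

def word_attributions(text: str,
                      predict_positive: Callable[[str], float]) -> List[Tuple[str, float]]:
    """Return (word, contribution) pairs; positive values push toward 'positive'."""
    words = text.split()
    base = predict_positive(text)          # score of the full sentence
    scores = []
    for i, w in enumerate(words):
        reduced = " ".join(words[:i] + words[i + 1:])   # occlude one word
        scores.append((w, base - predict_positive(reduced)))
    return scores

# Usage with any probabilistic classifier wrapped as P(positive | text), e.g.:
# print(word_attributions("great acting but boring plot", clf_positive_prob))
```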
“…Traditionally, off-the-shelf local explanation frameworks, such as the Shapley value in game theory [32] and the learning-based Local Interpretable Model-agnostic Explanation (LIME) [30] have been shown to work well on classification tasks with a small number of classes. In particular, there has been work on image classification [30], sentiment analysis [8], and evidence selection for question answering [27]. However, to the best of our knowledge, there has been less work studying explanations over models with sequential output and large class sizes at each time step.…”
Section: Introduction
confidence: 99%
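For concreteness, the following is a minimal sketch of applying LIME to a text classifier with a small number of classes, using the open-source `lime` package. The toy pipeline is purely illustrative, not a model from the cited work.

```python
# Minimal sketch: explain one prediction of a black-box text classifier with LIME.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from lime.lime_text import LimeTextExplainer

# Toy sentiment classifier; LIME only needs its predict-probability interface.
train_texts = ["great movie, loved it", "terrible plot, boring",
               "wonderful acting", "awful and dull"]
train_labels = [1, 0, 1, 0]
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(train_texts, train_labels)

explainer = LimeTextExplainer(class_names=["negative", "positive"])
exp = explainer.explain_instance(
    "the acting was wonderful but the plot was boring",
    clf.predict_proba,   # black-box probability function over raw strings
    num_features=6,      # number of words to attribute
    num_samples=500,     # perturbed samples used to fit the local linear model
)
print(exp.as_list())     # [(word, weight), ...] local attribution scores
```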
“…This paper takes a closer look into the gap between user need and current XAI. Specifically, we survey the common forms of explanations, such as feature attribution [6,26], decision rule [43,22], or probe [30,10], used in 218 recent NLP papers, and compare them to the 43 questions collected in the XAI Question Bank [28]. We use the forms of the explanations to gauge the misalignment between user questions and current NLP explanations.…”
Section: Introduction
confidence: 99%
“…Explainable AI Formats (I): 1. Feature Attribution (FAT) [43.99%]: highlight the subsequences in input texts [6,26]. Typical question [34]:…”
Section: Introduction
confidence: 99%