Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
DOI: 10.18653/v1/2020.emnlp-main.262

How do Decisions Emerge across Layers in Neural Models? Interpretation with Differentiable Masking

Abstract: Attribution methods assess the contribution of inputs to the model prediction. One way to do so is erasure: a subset of inputs is considered irrelevant if it can be removed without affecting the prediction. Though conceptually simple, erasure's objective is intractable and approximate search remains expensive with modern deep NLP models. Erasure is also susceptible to the hindsight bias: the fact that an input can be dropped does not mean that the model 'knows' it can be dropped. The resulting pruning is over-…
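The erasure objective and its differentiable relaxation summarized in the abstract can be illustrated with a short, self-contained sketch. Everything below (the toy mean-pooling classifier, the plain sigmoid gates, the loss weights) is an illustrative assumption, not the authors' DiffMask implementation; it only shows the general recipe of learning a sparse mask whose application leaves the model's output distribution roughly unchanged.

```python
# Minimal sketch of erasure via a learned differentiable mask (in the spirit of
# the abstract above). The toy model, sigmoid gates, and weights are assumptions,
# not the authors' DiffMask implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
vocab_size, emb_dim, n_classes, seq_len = 100, 16, 3, 8

# Stand-in model: mean-pooled embeddings followed by a linear classifier.
embed = nn.Embedding(vocab_size, emb_dim)
clf = nn.Linear(emb_dim, n_classes)

def predict(token_ids, mask=None):
    """Return logits; `mask` (seq_len,) scales each token's embedding in [0, 1]."""
    h = embed(token_ids)                       # (seq_len, emb_dim)
    if mask is not None:
        h = h * mask.unsqueeze(-1)             # masked-out tokens become zero vectors
    return clf(h.mean(dim=0))                  # (n_classes,)

tokens = torch.randint(0, vocab_size, (seq_len,))
with torch.no_grad():
    ref_logits = predict(tokens)               # prediction on the unmasked input

# One real-valued gate per token; sigmoid keeps gates in (0, 1).
gate_logits = torch.zeros(seq_len, requires_grad=True)
opt = torch.optim.Adam([gate_logits], lr=0.1)
sparsity_weight = 0.1                          # stands in for an L0 / prior penalty

for _ in range(200):
    gates = torch.sigmoid(gate_logits)
    masked_logits = predict(tokens, gates)
    # Keep the masked output distribution close to the original (KL term)
    # while pushing as many gates as possible toward zero (sparsity term).
    kl = F.kl_div(F.log_softmax(masked_logits, dim=-1),
                  F.softmax(ref_logits, dim=-1), reduction="sum")
    loss = kl + sparsity_weight * gates.sum()
    opt.zero_grad()
    loss.backward()
    opt.step()

print("learned gates (higher = harder to erase):",
      [round(g, 2) for g in torch.sigmoid(gate_logits).tolist()])
```

Gates that stay near one mark tokens the objective could not drop without changing the output distribution; the actual DiffMask method goes further, amortizing this optimization with trained masking networks and applying it to hidden states as well, which is what lets it ask where across layers the decision emerges.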

Cited by 38 publications (39 citation statements)
References 35 publications
“…Previous work on learning individual word masks only focuses on the first two properties (De Cao et al., 2020). To satisfy the third property, we propose GMASK to implicitly detect word correlations and distribute the correlated words into a group (e.g.…”
Section: Explaining Models With Word Masks (mentioning)
confidence: 99%
“…where $M_l(P, H)$ and $M_l^Z(P, H) \in \mathbb{R}^3$ are the original output logits and the output logits when applying the mask $Z$, respectively. Compared to the commonly used KL divergence (De Cao et al., 2020) or label equality (Feng et al., 2018), the Euclidean distance between logits is a stricter constraint that narrows down the solution space and would lead to more faithful explanations.…”
Section: Problem Formation (mentioning)
confidence: 99%
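The contrast drawn in the statement above can be checked numerically: KL divergence compares the output distributions, while the Euclidean distance acts on the raw logits, so logits shifted by a constant look identical to KL but far apart in L2. The vectors below are invented purely for illustration.

```python
# Numeric illustration of KL-on-distributions vs. Euclidean-on-logits as
# faithfulness constraints; the logit vectors here are invented examples.
import torch
import torch.nn.functional as F

orig_logits = torch.tensor([2.0, 0.5, -1.0])     # stands in for M_l(P, H)
masked_logits = torch.tensor([1.2, 0.9, -0.4])   # stands in for M_l^Z(P, H)

kl = F.kl_div(F.log_softmax(masked_logits, dim=-1),
              F.softmax(orig_logits, dim=-1), reduction="sum")
l2 = torch.dist(orig_logits, masked_logits)
print(f"KL(original || masked) = {kl.item():.3f}, ||logit diff||_2 = {l2.item():.3f}")

# Shifting all logits by a constant leaves the softmax (and hence the KL term)
# unchanged but is heavily penalized by the Euclidean constraint:
shifted = orig_logits + 5.0
print(torch.allclose(F.softmax(shifted, dim=-1), F.softmax(orig_logits, dim=-1)))  # True
print(torch.dist(orig_logits, shifted).item())  # ~8.66
```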
“…We select feature attribution baselines including co-attention itself, the perturbation-based approaches LEAVEONEOUT (Li et al., 2016), LIME (Ribeiro et al., 2016), and BACKSELECT (Carter et al., 2019), the gradient-based approaches GRADIENT (Simonyan et al., 2014) and INTEGRATEDGRAD (Sundararajan et al., 2017), and a feature selection method, DIFF-MASK (De Cao et al., 2020). The original DIFF-MASK is applied at the text level; we derive an alignment variant for comparison in Appendix C.…”
Section: Baselines (mentioning)
confidence: 99%
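Of the baselines listed above, LEAVEONEOUT is the simplest to sketch: attribute each input by the drop in the predicted-class probability when that input is erased. The toy model and feature vector below are assumptions for illustration, not any cited paper's setup.

```python
# Sketch of the LEAVEONEOUT baseline: attribute each input feature by the drop
# in the predicted-class probability when it is erased. Toy model and input
# are assumptions, not the cited papers' setups.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(6, 8), nn.Tanh(), nn.Linear(8, 3))
x = torch.randn(6)                                # six "token" features

with torch.no_grad():
    probs = model(x).softmax(dim=-1)
    target = probs.argmax()                       # explain the predicted class
    attributions = []
    for i in range(x.numel()):
        x_erased = x.clone()
        x_erased[i] = 0.0                         # erase feature i
        drop = probs[target] - model(x_erased).softmax(dim=-1)[target]
        attributions.append(round(drop.item(), 3))

print("leave-one-out attributions:", attributions)
```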
“…Another implementation for making word masks sparse is adding $L_0$ regularization (Lei et al., 2016; Bastings et al., 2019; De Cao et al., 2020), while in the objective in Equation 8 we regularize masks with a predefined prior distribution $p_0(R)$, as described in subsection 3.4.…”
Section: Connections (mentioning)
confidence: 99%
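As a rough illustration of the $L_0$ route mentioned above: the penalty is usually made differentiable with a hard-concrete relaxation, where the expected number of non-zero gates has a closed form. The constants below are the commonly used defaults from Louizos et al. (2018) and the gate parameters are random; treat the snippet as an assumption-laden sketch rather than any of the cited papers' exact objectives.

```python
# Sketch of a differentiable L0-style sparsity penalty via the hard-concrete
# relaxation; constants are the usual defaults from Louizos et al. (2018) and
# the gate parameters are random, so treat this purely as an illustration.
import math
import torch

beta, gamma, zeta = 2.0 / 3.0, -0.1, 1.1           # temperature and stretch limits
log_alpha = torch.randn(8, requires_grad=True)     # one gate parameter per token

# Probability that each hard-concrete gate is non-zero; the sum is a
# differentiable surrogate for the L0 "norm" of the mask.
p_open = torch.sigmoid(log_alpha - beta * math.log(-gamma / zeta))
expected_l0 = p_open.sum()
expected_l0.backward()                             # gradients reach log_alpha

print("expected number of kept tokens:", round(expected_l0.item(), 2))
```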