Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019
DOI: 10.18653/v1/d19-1523
Human-grounded Evaluations of Explanation Methods for Text Classification

Abstract: Due to the black-box nature of deep learning models, methods for explaining the models' results are crucial to gain trust from humans and support collaboration between AIs and humans. In this paper, we consider several model-agnostic and model-specific explanation methods for CNNs for text classification and conduct three human-grounded evaluations, focusing on different purposes of explanations: (1) revealing model behavior, (2) justifying model predictions, and (3) helping humans investigate uncertain predictions.
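
To make the kind of method the paper evaluates concrete, below is a minimal sketch of one simple model-agnostic explanation technique, leave-one-out occlusion, which scores each word by how much the predicted class probability drops when that word is removed. This is an illustration only, not the authors' code: the scikit-learn bag-of-words classifier and the toy corpus are assumptions standing in for the CNN text classifiers studied in the paper.

```python
# Minimal sketch of a model-agnostic word-importance explanation
# (leave-one-out occlusion). Hypothetical setup: a scikit-learn
# pipeline stands in for the CNN text classifier used in the paper.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data (assumption: any labelled text corpus works here).
texts = ["great acting and a moving story",
         "terrible plot and wooden acting",
         "a moving, beautiful film",
         "boring, terrible waste of time"]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)

def occlusion_explanation(model, text, target_class):
    """Score each word by the probability drop when it is removed."""
    words = text.split()
    base = model.predict_proba([text])[0][target_class]
    scores = []
    for i in range(len(words)):
        occluded = " ".join(words[:i] + words[i + 1:])
        prob = model.predict_proba([occluded])[0][target_class]
        scores.append((words[i], base - prob))
    # Largest absolute drops first: the words most responsible
    # for the prediction.
    return sorted(scores, key=lambda x: -abs(x[1]))

# Local explanation: which words justify this single prediction?
example = "a moving story with terrible acting"
pred = int(clf.predict([example])[0])
for word, importance in occlusion_explanation(clf, example, pred):
    print(f"{word:>10s}  {importance:+.3f}")
```

Read as a local explanation, these per-word scores justify one prediction; aggregating such scores over many inputs would move toward the global view of model behavior that the citing works below also discuss.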

Cited by 49 publications (33 citation statements) | References 30 publications
“…Another group of studies performs human evaluation of the outputs of explainability methods (Lertvittayakumjorn and Toni, 2019; Narayanan et al., 2018). Such studies exhibit low inter-annotator agreement and reflect mostly what appears to be reasonable and appealing to the annotators, not the actual properties of the method.…”
Section: Related Work (mentioning; confidence: 99%)
“…Existing studies for evaluating explainability heavily differ in their scope. Some concentrate on a single model architecture, e.g. BERT-LSTM (DeYoung et al., 2020), RNN (Arras et al., 2019), or CNN (Lertvittayakumjorn and Toni, 2019), whereas a few consider more than one model (Guan et al., 2019; Poerner et al., 2018). Some studies concentrate on one particular dataset (Guan et al., 2019; Arras et al., 2019), while only a few generalize their findings over downstream tasks (DeYoung et al., 2020; Vashishth et al., 2019).…”
Section: Related Work (mentioning; confidence: 99%)
“…Lipton (2018); Doshi-Velez and Kim (2017) and Rudin (2019) provide overviews on definitions and characterizations of interpretability. Lertvittayakumjorn and Toni (2019) classify three possible uses of text explanations: (i) revealing model behavior, (ii) justifying model predictions, and (iii) helping humans investigate uncertain predictions. Attempting to guarantee the faithfulness of a feature selection or explanation generation method is a more challenging question than finding explanations which humans find acceptable (Rudin, 2019).…”
Section: Related Work (mentioning; confidence: 99%)
“…One of the advantages of GrASP lite is that it is an explainable model, making predictions based on rich and interpretable rules. These can be used to justify predictions, sometimes termed a local explanation (Lertvittayakumjorn and Toni, 2019), and also to understand the way the model works as a whole (termed a global explanation), potentially enabling experts to build better classifiers.…”
Section: User Study (mentioning; confidence: 99%)