Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
DOI: 10.18653/v1/2021.emnlp-demo.11

Thermostat: A Large Collection of NLP Model Explanations and Analysis Tools

Abstract: In the language domain, as in other domains, neural explainability plays an increasingly important role, with feature attribution methods at the forefront. Many such methods require considerable computational resources and expert knowledge about implementation details and parameter choices. To facilitate research, we present THERMOSTAT, which consists of a large collection of model explanations and accompanying analysis tools. THERMOSTAT allows easy access to over 200k explanations for the decisions of prominent …
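For orientation, here is a minimal usage sketch of the Thermostat package itself, drawn from the project's documentation rather than from this index page; the configuration string and the attribute name in the comments are assumptions that should be verified against the DFKI-NLP/thermostat repository:

```python
# pip install thermostat-datasets
import thermostat

# Load pre-computed feature attributions. The configuration string follows
# Thermostat's "<dataset>-<model>-<explainer>" scheme; here: IMDb, BERT,
# Layer Integrated Gradients.
data = thermostat.load("imdb-bert-lig")

instance = data[0]  # one explained instance from the dataset
# The attribute below is taken from the project README; treat it as an
# assumption and check the repository for the exact schema.
print(instance.explanation)  # (token, attribution-score) pairs
```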

Cited by 7 publications (11 citation statements)
References 33 publications
“…Through our experiments, we discover (i) a general lack of correlation between explanation methods, especially for more complex settings (i.e., transformer-based model and pair-sequence tasks) which is corroborated by additional recent research on text, tabular, and image data [36,37], (ii) that similar explanations do not always result in correlated rankings, and (iii) the existence of a single "ideal" explanation is questionable, which is a fundamental assumption of the agreement as evaluation paradigm. Without an external ground-truth explanation, all that rank correlation tells us is whether or not two rankings are similar.…”
Section: Discussion (supporting)
confidence: 69%
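To make the quoted point concrete, here is a small illustrative sketch; the attribution scores are invented and the snippet is not from the cited work:

```python
from scipy.stats import kendalltau

# Hypothetical token-level attribution scores from two explainers
# for the same four-token input.
saliency_a = [0.91, 0.05, 0.32, 0.77]
saliency_b = [0.10, 0.80, 0.45, 0.20]

tau, p_value = kendalltau(saliency_a, saliency_b)
print(f"Kendall's tau = {tau:.2f} (p = {p_value:.2f})")
# A low tau says the two rankings disagree; without a ground-truth
# explanation it cannot say which method, if either, is faithful.
```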
“…In Section 5, we have shown that there is a low degree of correlation between explanation methods, especially for the transformer-based model. Similar conclusions are observed in the work of [36] and [37]. This makes it challenging to justify the expectation that in order for attention-based explanations to be valid, they should correlate with existing feature attribution methods.…”
Section: Lack of Correlation Between Explanation Methods (supporting)
confidence: 54%
“…Currently, ferret includes three classification-oriented datasets annotated with human rationales, i.e., annotations highlighting the most relevant words, phrases, or sentences a human annotator attributed to a given class label (DeYoung et al., 2020; Wiegreffe and Marasovic, 2021). Moreover, ferret API gives access to the Thermostat collection (Feldhus et al., 2021), a wide set of pre-computed feature attribution scores.…”
Section: Dataset Apimentioning
confidence: 99%
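The quick-start below is a sketch based on ferret's README, not on the cited statement; the checkpoint name is arbitrary and the exact call signatures should be verified against the ferret documentation:

```python
# pip install ferret-xai   (PyPI package name; treat as an assumption)
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from ferret import Benchmark

# Any sequence-classification checkpoint works; this one is illustrative.
name = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModelForSequenceClassification.from_pretrained(name)
tokenizer = AutoTokenizer.from_pretrained(name)

bench = Benchmark(model, tokenizer)
# Compute attributions from ferret's built-in explainers for the
# positive class (target index 1).
explanations = bench.explain("You look stunning!", target=1)
bench.show_table(explanations)
```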