Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021
DOI: 10.18653/v1/2021.findings-acl.361

Effective Attention Sheds Light On Interpretability

Abstract: An attention matrix of a transformer self-attention sublayer can provably be decomposed into two components, and only one of them (effective attention) contributes to the model output. This leads us to ask whether visualizing effective attention gives different conclusions than interpretation of standard attention. Using a subset of the GLUE tasks and BERT, we carry out an analysis to compare the two attention matrices, and show that their interpretations differ. Effective attention is less associated with the f…
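
The decomposition the abstract refers to (introduced by Brunner et al., 2020) splits each attention row into a component lying in the left null space of the value matrix V, which is annihilated in the product A·V, and a remainder, the effective attention, which is all that can reach the output. A minimal NumPy sketch of that projection follows, assuming A is the (seq_len × seq_len) attention matrix of one head and V its (seq_len × d_v) value matrix; the function name and rank tolerance are illustrative choices, not from the paper.

```python
import numpy as np

def effective_attention(A, V, tol=1e-8):
    """Split attention A into the part that can reach the output.

    Any attention row component u with u @ V == 0 (i.e., u in the left
    null space of V) is cancelled in the product A @ V and so cannot
    influence the sublayer output. Projecting that component away
    yields the effective attention, satisfying A_eff @ V == A @ V.
    """
    # The left null space of V is spanned by the left singular vectors
    # whose singular values are (numerically) zero.
    U, s, _ = np.linalg.svd(V)   # U: (seq_len, seq_len)
    rank = int(np.sum(s > tol))
    B = U[:, rank:]              # orthonormal basis of the left null space
    A_null = A @ B @ B.T         # row-wise projection onto that subspace
    return A - A_null
```

For typical BERT inputs the sequence length exceeds the per-head value dimension (d_v = 64), so the left null space is non-trivial and standard and effective attention can differ substantially.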

Cited by 7 publications (7 citation statements)
References 18 publications
“…Attention matrix. The attention matrix as an explanation of individual predictions has been extensively studied in [34,18,37,32]. Although these works have shown through a set of experiments that the correlation between learned attention weights and feature importance is weak, we visualize the attention weights of each head in the Transformer to compare the results with LRP.…”
Section: Methods LRP
confidence: 99%
“…Our benchmarking study provides a perfect test-bed to understand if attention aligns with attribution methods. We compare standard self-attention with effective attention (Brunner et al., 2020; Sun and Marasović, 2021). Further, we measure attribution between input tokens and hidden representations using Hidden Token Attribution (HTA) (Brunner et al., 2020).…”
Section: See Appendix A2 For All Implementation Details
confidence: 99%
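
Hidden Token Attribution, as defined by Brunner et al. (2020), scores the contribution of input token j to hidden representation i by the norm of the Jacobian block ∂h_i/∂x_j. A minimal PyTorch sketch, assuming a differentiable map f from input embeddings to hidden states; the function name and the final normalization are illustrative, not the paper's exact implementation.

```python
import torch

def hidden_token_attribution(f, x, i):
    """Attribution of each input token embedding x[j] to hidden state h[i],
    measured as the Frobenius norm of the Jacobian block d h[i] / d x[j],
    normalized over tokens (Hidden Token Attribution, sketched)."""
    x = x.clone().requires_grad_(True)
    h = f(x)                                  # (seq_len, d_hidden)
    rows = []
    for k in range(h.shape[-1]):              # build the Jacobian of h[i]
        g, = torch.autograd.grad(h[i, k], x, retain_graph=True)
        rows.append(g)                        # each g: (seq_len, d_in)
    J = torch.stack(rows)                     # (d_hidden, seq_len, d_in)
    scores = J.pow(2).sum(dim=(0, 2)).sqrt()  # one norm per input token
    return scores / scores.sum()
```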
“…However, in the Transformer architecture (Vaswani et al., 2017), it has become a means to account for lexical influence and long-range dependencies. It also provides useful information about the importance of a term for the output (Wiegreffe and Pinter, 2019; Brunner et al., 2020; Sun and Marasović, 2021). Here, we use the notion of attention entropy, and EAR's use of it in BERT.…”
Section: Entropy-based Attention Regularization
confidence: 99%
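
The attention entropy that EAR builds on is simply the Shannon entropy of each attention distribution, i.e., of each softmax-normalized row of the attention matrix: low entropy means a head concentrates on a few tokens, high entropy means it attends broadly. A minimal NumPy sketch, with an illustrative function name and clipping constant:

```python
import numpy as np

def attention_entropy(A, eps=1e-12):
    """Shannon entropy of each attention distribution (row of A).

    Rows of A are softmax outputs, so each sums to 1. Low entropy means
    the head concentrates its attention mass on a few tokens; high
    entropy means it spreads attention broadly across the sequence.
    """
    A = np.clip(A, eps, 1.0)                 # guard against log(0)
    return -np.sum(A * np.log(A), axis=-1)   # one value per query position
```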