Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP
DOI: 10.18653/v1/w19-4813

Evaluating Recurrent Neural Network Explanations

Abstract: Recently, several methods have been proposed to explain the predictions of recurrent neural networks (RNNs), in particular of LSTMs. The goal of these methods is to understand the network's decisions by assigning to each input variable, e.g., a word, a relevance indicating to which extent it contributed to a particular prediction. In previous works, some of these methods were not yet compared to one another, or were evaluated only qualitatively. We close this gap by systematically and quantitatively comparing …


Cited by 57 publications (50 citation statements) · References 43 publications
“…A recent study [8] proposes to objectively evaluate explanations for sequential data using ground-truth information in a toy task. The idea of this evaluation metric is to add or subtract two numbers within an input sequence and measure the correlation between the relevances assigned to the elements of the sequence and the two input numbers.…”
Section: Evaluating Quality Of Explanations
confidence: 99%
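A minimal sketch of that correlation metric, assuming relevances and ground-truth operands have been collected per test instance (the function name, array shapes, and synthetic data below are illustrative, not from the cited study's code):

```python
# Sketch of the toy-task evaluation quoted above: correlate the relevance
# assigned to the two operand positions with the operands' actual values.
import numpy as np
from scipy.stats import pearsonr

def toy_task_correlation(relevances, numbers):
    """relevances: (n_samples, 2) relevance at the two operand positions.
    numbers: (n_samples, 2) the ground-truth input numbers.
    Returns the Pearson correlation between relevance and operand value."""
    rho, _ = pearsonr(np.ravel(relevances), np.ravel(numbers))
    return rho

# Synthetic usage: on the addition task, a faithful explainer's relevances
# should track the operand values, giving a correlation near 1.
rng = np.random.default_rng(0)
numbers = rng.uniform(0.0, 1.0, size=(100, 2))
relevances = numbers + 0.05 * rng.normal(size=(100, 2))  # near-perfect heatmaps
print(toy_task_correlation(relevances, numbers))
```

On the subtraction task the same statistic is informative with its sign: relevance for the subtracted operand should correlate negatively with its value.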
“…In this section we discuss various input saliency methods for NLP as alternatives to attention: gradient-based (§3.1), propagation-based (§3.2), and occlusion-based methods (§3.3), following Arras et al (2019). We do not endorse any specific method, but rather try to give an overview of methods and how they differ.…”
Section: Saliency Methods
confidence: 99%
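As a concrete instance of the gradient-based family mentioned in the quote, here is a gradient × input relevance sketch; `model` and `embed` are hypothetical stand-ins for a classifier over embedded tokens, not the cited papers' code:

```python
# Sketch of gradient x input saliency for a text classifier. `model` maps a
# (seq_len, dim) embedding matrix to class logits; `embed` is the token
# embedding layer. Both are assumed to exist.
import torch

def gradient_x_input(model, embed, token_ids, target_class):
    """Return one relevance score per token: the elementwise product of the
    embedding with the gradient of the target logit, summed over dimensions."""
    emb = embed(token_ids).detach().requires_grad_(True)  # (seq_len, dim)
    logit = model(emb)[target_class]                      # scalar logit
    logit.backward()                                      # fills emb.grad
    return (emb * emb.grad).sum(dim=-1)                   # (seq_len,)
```

Occlusion-based methods, by contrast, re-run the model with each word removed or masked and take the resulting drop in the target score as that word's relevance.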
“…Relevance is redistributed until we arrive at the input layers. While LRP requires implementing a custom backward pass, it does allow precise control to preserve relevance, and it has been shown to work better than using gradient-based methods on text classification (Arras et al., 2019).…”
Section: Propagation-based Methods
confidence: 99%
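A minimal sketch of one LRP backward-pass step, using the generic epsilon rule for a linear layer (the LSTM-specific propagation rules evaluated in the paper handle gates and multiplicative interactions in addition; names and shapes here are illustrative):

```python
# Sketch of the LRP epsilon rule through one linear layer: each input receives
# the share of output relevance proportional to its contribution x_i * w_ij.
import numpy as np

def lrp_linear(x, w, b, relevance_out, eps=1e-6):
    """x: (d_in,), w: (d_in, d_out), b: (d_out,), relevance_out: (d_out,).
    Returns relevance redistributed onto the inputs, shape (d_in,)."""
    z = x @ w + b                               # pre-activations
    z = z + eps * np.where(z >= 0, 1.0, -1.0)   # stabilizer, avoids division by 0
    contrib = (x[:, None] * w) / z[None, :]     # each input's share of each output
    return contrib @ relevance_out              # (d_in,)
```

Because each output's relevance is split among its inputs (up to the share absorbed by the bias and stabilizer), total relevance is approximately conserved layer by layer, which is the "precise control to preserve relevance" the quote refers to.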
“…For example, Zhang et al (2018) propose the pointing game task, in which the highest-relevance pixel for an image classifier input must belong to the object described by the target output class. Within this framework, Poerner et al (2018), Arras et al (2019), and Yang and Kim (2019) construct datasets in which input features exhibit experimentally controlled notions of importance, yielding "ground truth" attributions against which heatmaps can be evaluated.…”
Section: Related Work
confidence: 99%
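A minimal sketch of the pointing game metric as described in the quote, assuming binary object masks are available per image (array conventions are assumptions, not taken from Zhang et al. (2018)):

```python
# Sketch of pointing-game accuracy: a heatmap scores a hit when its
# highest-relevance pixel lies inside the target object's mask.
import numpy as np

def pointing_game_accuracy(heatmaps, masks):
    """heatmaps: (n, H, W) relevance maps; masks: (n, H, W) boolean object
    masks. Returns the fraction of hits over all instances."""
    hits = 0
    for heatmap, mask in zip(heatmaps, masks):
        y, x = np.unravel_index(np.argmax(heatmap), heatmap.shape)
        hits += bool(mask[y, x])
    return hits / len(heatmaps)
```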