The (Un)reliability of saliency methods
2017 · Preprint
DOI: 10.48550/arxiv.1711.00867

Cited by 69 publications (64 citation statements); references 0 publications.
“…Robustness to input perturbations: Within this category the majority of works focused on analysing the robustness of gradient-based saliency maps that are specific to analysing neural network models (differentiable models). For example, Kindermans et al [18] demonstrated that perturbing inputs by simply adding a constant shift causes several gradient-based saliency methods to attribute incorrectly. Others designed novel objective functions to demonstrate that most of the popular saliency methods can be forced to generate arbitrary explanations and attributed this to certain geometrical properties of neural networks (e.g., shape of decision boundary) [10,12].…”
Section: Feature Importance Methods (mentioning)
confidence: 99%
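A minimal numpy sketch of the constant-shift failure described in the statement above (the model, weights, and shift value are all illustrative, not taken from the cited papers): two linear models that make identical predictions on correspondingly shifted inputs have identical gradients, yet their gradient × input attributions differ.

```python
import numpy as np

# Toy linear "network": f(x) = w @ x + b. Gradients are written out
# analytically, so no autodiff framework is needed.
rng = np.random.default_rng(0)
w = rng.normal(size=5)          # model weights (illustrative)
b = 0.3                         # model bias
x = rng.normal(size=5)          # an input to explain
shift = 2.0 * np.ones(5)        # constant shift added to every input

def f(x_):
    return w @ x_ + b

# Second model whose bias compensates the shift: f2(x + shift) == f(x).
def f2(x_):
    return w @ (x_ - shift) + b

x_shifted = x + shift
assert np.isclose(f(x), f2(x_shifted))          # identical predictions

# The plain gradient is the same for both models ...
grad_f, grad_f2 = w, w
print("gradients equal:", np.allclose(grad_f, grad_f2))        # True

# ... but gradient * input attributions differ, even though the two
# models behave identically on corresponding inputs.
attr_f = grad_f * x
attr_f2 = grad_f2 * x_shifted
print("gradient*input equal:", np.allclose(attr_f, attr_f2))   # False
```

This mirrors the input-invariance check discussed in the cited work, reduced to analytic gradients on a linear model so the effect is visible in a few lines.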
“…• Robustness to input perturbations: This scenario involves keeping the machine learning model unchanged and analysing the behaviour of explainability methods under slight perturbations of model inputs [2,10,12,18,30]. Such input perturbations could be introduced deliberately by an adversary or could result from changes in data distribution.…”
Section: Taxonomy Of Robustness Analysis (mentioning)
confidence: 99%
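A self-contained sketch of the kind of check this category describes (the toy ReLU network and its sizes are illustrative, not taken from the cited works): keep the model fixed, perturb the input slightly, and compare the resulting saliency maps, here via cosine similarity.

```python
import numpy as np

rng = np.random.default_rng(1)

# Tiny ReLU network y = w2 @ relu(W1 @ x); the gradient is written out
# by hand so the sketch stays dependency-free.
W1 = rng.normal(size=(16, 8))
w2 = rng.normal(size=16)

def saliency(x):
    h = W1 @ x
    mask = (h > 0).astype(float)      # ReLU derivative
    return (w2 * mask) @ W1           # dy/dx, a gradient saliency map

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

x = rng.normal(size=8)
eps = 1e-2 * rng.normal(size=8)       # slight input perturbation

s0 = saliency(x)
s1 = saliency(x + eps)
print("saliency cosine similarity under perturbation:", cosine(s0, s1))
```

A robust explanation method would keep this similarity close to 1 for small perturbations that leave the prediction essentially unchanged.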
“…Other papers raise criticisms of some of the methods we just described. In [14], it is argued that saliency methods lack reliability when the explanation is sensitive to factors that do not contribute to the model prediction, and in [15] it is shown that DeConvNets and Guided Backpropagation do not produce the theoretically correct explanations even for a linear model, let alone for a multi-layer network with millions of parameters. Finally, in [9] and [18], the authors propose that neurons do not encode single concepts and that they are in fact multifaceted, with some concepts being encoded by a group of neurons rather than by a sole neuron by itself.…”
Section: Related Work On Interpretability and Explainability Of Neura… (mentioning)
confidence: 99%
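The linear-model criticism attributed to [15] can be illustrated with a simplified construction (not the one used in that paper; all directions and sample sizes below are made up for the sketch): for a single linear layer, gradient-style explanations reduce to the weight vector, which must cancel correlated distractors in the data and therefore need not point along the informative signal direction, whereas a covariance-based pattern does.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000

# Data model: x = y * signal + noise * distractor. The distractor is
# correlated structure in x that carries no information about y.
signal = np.array([1.0, 0.0])
distractor = np.array([1.0, 1.0])
y = rng.normal(size=n)
eps = rng.normal(size=n)
X = np.outer(y, signal) + np.outer(eps, distractor)

# Least-squares linear model y_hat = X @ w. To predict y it must cancel
# the distractor, so w is rotated away from the signal direction.
w, *_ = np.linalg.lstsq(X, y, rcond=None)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# For a single linear layer, a gradient-style saliency map is just w,
# which does not point along the informative direction ...
print("cos(weights, signal) =", cosine(w, signal))        # about 0.71
# ... while the covariance-based "pattern" cov(x, y) recovers the signal.
pattern = (X * y[:, None]).mean(axis=0)
print("cos(pattern, signal) =", cosine(pattern, signal))  # about 1.0
```

The point of the sketch is only that visualizing the filter (the weights) is not the same as visualizing the signal the model responds to, which is the gap the quoted criticism is about.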
“…As discussed earlier, gradient-based reconstruction methods might not be ideal for explaining a CNN's reasoning process [17]. Here however, we only use it to focus the reconstruction on salient regions of the agent and do not use it to explain the agent's behavior for which these methods are ideally suited.…”
Section: State Model (mentioning)
confidence: 99%