2022
DOI: 10.1016/j.patcog.2021.108194
Towards robust explanations for deep neural networks

Cited by 42 publications (63 citation statements)
References 16 publications
“…Instead of the current approach to debias models with data balancing, our debias approach can retrain models to de-emphasize focusing on sensitive concepts (e.g., faces). However, we caution about the dark pattern of debiasing explanations to make an unfair model appear fair by retraining its explanation to appear fair (e.g., [24,25]).…”
Section: Debiasing Explanations Against Social Bias
confidence: 99%
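The retraining idea quoted above, de-emphasizing attributions on sensitive concepts, can be sketched as an attribution penalty added to the training loss. A minimal NumPy illustration under strong assumptions (hypothetical linear model `f(x) = w @ x` and a feature-level sensitive mask; not the quoted paper's actual method):

```python
import numpy as np

# Hypothetical linear model f(x) = w @ x; not the quoted paper's method.
w = np.array([0.5, -1.2, 2.0, 0.3])
# Mask marking "sensitive" input features (e.g. face pixels in an image).
sensitive = np.array([0.0, 0.0, 1.0, 1.0])

def attribution(x):
    # Input-gradient attribution; for a linear model it is simply w.
    return w

def debias_penalty(x, lam=0.1):
    # Penalise the squared attribution mass that falls on sensitive
    # features; adding this term to the training loss pushes retraining
    # to de-emphasise those features.
    return lam * np.sum((attribution(x) * sensitive) ** 2)

x = np.ones(4)
print(debias_penalty(x))  # lam * (2.0**2 + 0.3**2)
```

The cautioned dark pattern corresponds to minimizing this penalty on the explanation alone while leaving the unfair decision function unchanged.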
“…In the context of feature importance methods, [3,11] have proposed approaches to make gradient-based methods for DNNs significantly more robust. Anders et al [3] took inspiration from the field of manifold learning and proposed to project explanations along tangential directions of the data manifold.…”
Section: Robust Explanations
confidence: 99%
“…Anders et al [3] took inspiration from the field of manifold learning and proposed to project explanations along tangential directions of the data manifold. Dombrowski et al [11] proposed three ways to improve the robustness of DNN explanations: (1) training DNNs with weight decay; (2) training with smoothed activation functions; and (3) adding a regulariser for the model's curvature to the training process. Similarly, Lakkaraju …”
[Table 1 caption from the citing paper: A summary of robustness analysis scenarios for two types of post-hoc local explainability methods, feature importance and counterfactuals.]
Section: Robust Explanations
confidence: 99%
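The second of Dombrowski et al.'s measures, smoothed activation functions, acts on the input gradients that feature-importance explanations are built from. A NumPy sketch (toy one-hidden-layer network with hypothetical weights; not the authors' code) of an input-gradient explanation through a softplus activation, whose `beta` parameter controls how closely it approximates ReLU; the weight-decay measure (1) would simply add a λ‖W‖² term to the training loss:

```python
import numpy as np

def softplus(x, beta=5.0):
    # Smoothed ReLU; larger beta approaches the ReLU kink.
    return np.log1p(np.exp(beta * x)) / beta

def softplus_grad(x, beta=5.0):
    # Derivative of softplus is sigmoid(beta * x).
    return 1.0 / (1.0 + np.exp(-beta * x))

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.5, size=(4, 3))  # hypothetical weights
W2 = rng.normal(scale=0.5, size=(3,))

def forward(x):
    return softplus(x @ W1) @ W2

def saliency(x):
    # Input-gradient explanation df/dx via manual backprop.
    grad_hidden = W2 * softplus_grad(x @ W1)
    return grad_hidden @ W1.T

print(saliency(np.full(4, 0.1)))
```

Because softplus has a continuous derivative everywhere, the saliency map varies smoothly with the input, which is what makes the resulting explanations harder to manipulate than their ReLU counterparts.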
“…We consider such dependencies in stage 1, where the constraints in Eq. (5) encourage the connectivities in Eq. (10), where M_j ∈ [0, 1] is the mask for the j-th edge. The constraints indicate that the selection of the j-th edge can lead to the selection of the k-th edge if they share a node [29], and control the co-occurrence of the two edges.…”
Section: Optimization Problems For Graphs
confidence: 99%
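The quoted constraints couple the masks of edges that share a node so that selected edges form connected explanations. Since Eq. (10) itself is not reproduced in the excerpt, the following is only a generic illustration of such a coupling (a soft penalty on selecting an edge while leaving an adjacent edge unselected), not the citing paper's exact formulation:

```python
import numpy as np

# Toy path graph: consecutive edges share a node.
edges = [(0, 1), (1, 2), (2, 3)]
M = np.array([0.9, 0.2, 0.8])  # candidate edge masks, each in [0, 1]

def connectivity_penalty(M, edges):
    # Penalise selecting edge j while leaving an adjacent edge k
    # unselected, encouraging connected edge selections.
    pen = 0.0
    for j in range(len(edges)):
        for k in range(j + 1, len(edges)):
            if set(edges[j]) & set(edges[k]):  # edges share a node
                pen += M[j] * (1.0 - M[k])
    return pen

print(connectivity_penalty(M, edges))
```

Here the middle edge's low mask (0.2) is penalised from both sides, so minimizing the penalty pulls adjacent masks toward agreement.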
“…Robustness in explanations is gaining attention [10], [22], [46]. In [10], the goal is to train neural networks for image classification whose explanations remain robust under malicious data manipulations.…”
Section: H Reproducibility Checklist
confidence: 99%