2019
DOI: 10.1007/978-3-030-29908-8_4

Explaining Deep Learning Models with Constrained Adversarial Examples

Abstract: Machine learning algorithms generally suffer from a problem of explainability: given a classification result from a model, it is typically hard to determine what caused the decision and to give an informative explanation. We explore a new method of generating counterfactual explanations which, instead of explaining why a particular classification was made, explain how a different outcome can be achieved. This gives the recipients of the explanation a better way to understand the outcome, and provide…

Cited by 29 publications (33 citation statements)
References 8 publications (8 reference statements)
“…The researchers suggest using the Manhattan distance, weighted by the inverse median absolute deviation, to calculate the proximity of a counterfactual to the input data example. Another case of counterfactual explanation generation treated as an optimization problem is the ''Constrained Adversarial Examples'' framework [148]: it searches for adversarial examples that can serve as the basis for a counterfactual explanation of a deep learning model's output, minimizing a loss over the attributes (features) between the original and the counterfactual data example.…”
Section: Explainability Methods
confidence: 99%
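The weighted distance in the excerpt above can be made concrete with a short sketch. The snippet below is a minimal illustration, not the cited framework's actual algorithm: the toy logistic model, the synthetic data used to estimate the per-feature median absolute deviation (MAD), and the hyperparameters `lam`, `lr`, and `steps` are all assumptions introduced for the example.

```python
import numpy as np

def mad_weighted_l1(x, x_cf, mad):
    """Manhattan distance with each feature j weighted by 1 / MAD_j."""
    return np.sum(np.abs(x - x_cf) / mad)

# Toy differentiable classifier: p(y=1 | x) = sigmoid(w.x + b).
w, b = np.array([1.5, -2.0, 0.7]), -0.3
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Synthetic training data, used here only to estimate the per-feature MAD.
X_train = np.random.default_rng(0).normal(size=(200, 3))
mad = np.median(np.abs(X_train - np.median(X_train, axis=0)), axis=0)

def find_counterfactual(x, target=1.0, lam=5.0, lr=0.05, steps=500):
    """Gradient descent on lam * (f(x') - target)^2 + d_MAD(x, x')."""
    x_cf = x.copy()
    for _ in range(steps):
        p = sigmoid(w @ x_cf + b)
        # Gradient of the squared prediction loss through the sigmoid...
        grad_pred = 2.0 * lam * (p - target) * p * (1.0 - p) * w
        # ...plus the subgradient of the MAD-weighted L1 proximity term.
        grad_dist = np.sign(x_cf - x) / mad
        x_cf -= lr * (grad_pred + grad_dist)
    return x_cf

x = np.array([-1.0, 1.0, 0.0])   # predicted class 0 by the toy model
x_cf = find_counterfactual(x)
print(sigmoid(w @ x + b), sigmoid(w @ x_cf + b), mad_weighted_l1(x, x_cf, mad))
```

Dividing by the MAD expresses each feature's change in units of its typical spread, so the optimizer is not biased toward editing features that merely have large raw scales.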
“…Indeed, counterfactuals are particularly suitable for informing the end-user why a given data example is assigned a particular class label. Thus, the outlined classification-oriented frameworks are evaluated on classifiers based on logistic regression [55], [136], [153], [158], decision trees [46], [80], [122], [140], [150], [155], [159], gradient-boosted decision trees [147], support vector machines [131], [138], [146], random forests [81], [86], [142]-[144], neural networks [6], [48], [49], [91], [129], [130], [133], [135], [139], [141], [145], [148], [151], or combinations of these [100], [105], [134], [152], [154], [160]. In three studies [67], [128], [137], the classifiers used in the experiments are not specified.…”
Section: AI Problem
confidence: 99%
“…The existing CE methods can be categorized into gradient-based [21,33], autoencoder [5,19], SAT [16], or mixed-integer linear optimization (MILO) [4,15,29,32]. Since our cost function is non-differentiable due to the discrete nature of a permutation σ over features, we focus on MILO-based methods, which can directly handle such functions.…”
Section: Related Work
confidence: 99%
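As a heavily simplified illustration of the MILO approach mentioned above, the sketch below finds a minimum-L1 counterfactual for a linear classifier with PuLP. The toy weights, feature bounds, and decision margin are assumptions for the example; this is not the cited papers' formulation, which handles richer model classes and cost functions.

```python
import pulp

# Toy linear model: predict positive iff w.x + b >= 0; x is the factual input.
w, b = [1.5, -2.0, 0.7], -0.3
x = [-1.0, 1.0, 0.0]            # currently predicted negative
lo, hi = -5.0, 5.0              # assumed feature bounds

prob = pulp.LpProblem("counterfactual", pulp.LpMinimize)
x_cf = [pulp.LpVariable(f"x{j}", lo, hi) for j in range(3)]
t = [pulp.LpVariable(f"t{j}", 0) for j in range(3)]  # t_j >= |x_cf_j - x_j|

for j in range(3):              # linearize the L1 distance
    prob += t[j] >= x_cf[j] - x[j]
    prob += t[j] >= x[j] - x_cf[j]

prob += pulp.lpSum(t)           # objective: minimal total feature change
# Require the counterfactual to cross the decision boundary (small margin).
prob += pulp.lpSum(w[j] * x_cf[j] for j in range(3)) + b >= 0.01

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print([pulp.value(v) for v in x_cf])
```

The integer part of MILO only becomes necessary for models such as tree ensembles or ReLU networks, where binary variables encode the discrete branching; a pure LP suffices for the linear case shown here.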
“…Constraint (13), if H is an LM; Constraints (14)–(17), if H is a TE; Constraints (18)–(21), if H is an MLP, …”
Section: Overall Formulation
confidence: 99%