Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence 2019
DOI: 10.24963/ijcai.2019/388

The Dangers of Post-hoc Interpretability: Unjustified Counterfactual Explanations

Abstract: Post-hoc interpretability approaches have been proven to be powerful tools to generate explanations for the predictions made by a trained black-box model. However, they create the risk of having explanations that are a result of some artifacts learned by the model instead of actual knowledge from the data. This paper focuses on the case of counterfactual explanations and asks whether the generated instances can be justified, i.e. continuously connected to some ground-truth data. We evaluate the risk of generati…
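To make the notion of justification concrete, here is a minimal sketch of a connectivity test in the spirit of the paper's definition. It assumes a NumPy feature matrix, a `predict` callable, and an epsilon-neighbourhood graph as a stand-in for "continuously connected"; the function name, the `epsilon` parameter, and the graph construction are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def is_justified(counterfactual, X_train, predict, epsilon=0.5):
    """Sketch of a justification test: the counterfactual is 'justified'
    if an epsilon-chain (consecutive points closer than epsilon, all
    assigned the same class by the model) connects it to at least one
    training instance of the predicted class."""
    target = predict(counterfactual.reshape(1, -1))[0]
    # Restrict to training points the model assigns to the same class.
    same_class = X_train[predict(X_train) == target]
    if len(same_class) == 0:
        return False
    # Epsilon-neighbourhood graph over the counterfactual (node 0) and
    # the same-class training points, explored with a simple BFS.
    points = np.vstack([counterfactual.reshape(1, -1), same_class])
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    adjacency = dists <= epsilon
    visited, frontier = {0}, [0]
    while frontier:
        node = frontier.pop()
        for neighbour in np.flatnonzero(adjacency[node]):
            if neighbour not in visited:
                visited.add(int(neighbour))
                frontier.append(int(neighbour))
    # Justified if any training instance is reachable from node 0.
    return len(visited) > 1
```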

Citations: Cited by 121 publications (88 citation statements)
References: 0 publications
“…A similar limitation is observed due to the possibly high variability of the input. Laugel et al. raise the issue of justification for counterfactual explanations [144]. They argue that a synthesized counterfactual data point must be connected to the training data.…”
Section: Explainability Methods (mentioning)
Confidence: 99%
“…interpretable intermediate predictors) mimic the local neighbourhood (i.e., fidelity) and the data example to be explained (i.e., hit). Laugel et al. measure how justified counterfactuals are by averaging a binary score (one if the explanation is justified following the proposed definition, zero otherwise) over all the generated explanations [100], [144]. It is worth noting that the run-time of explanation generation algorithms is reported alongside the evaluation metrics for several frameworks [132], [139], [146], [152], [156], [159].…”
Section: Evaluation Methods (mentioning)
Confidence: 99%
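As a rough illustration of the averaged binary score described in the statement above, here is a minimal sketch reusing the hypothetical `is_justified` predicate from the earlier sketch; the helper name and signature are assumptions, not the cited papers' code.

```python
def justification_score(counterfactuals, X_train, predict, epsilon=0.5):
    """Fraction of generated counterfactual explanations that pass the
    binary justification test (1 if justified, 0 otherwise)."""
    flags = [is_justified(cf, X_train, predict, epsilon) for cf in counterfactuals]
    return sum(flags) / len(flags)
```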
“…The XAI methods introduced so far produce a posteriori explanations of deep learning models. Although such post hoc interpretations have been shown to be useful, some argue that, ideally, XAI methods should automatically offer human-interpretable explanations alongside their predictions [105]. Such approaches (herein referred to as 'self-explaining') would promote verification and error analysis, and be directly linkable with domain knowledge [106].…”
Section: Box 2 | XAI Applied to Cytochrome P450-Mediated Metabolism (mentioning)
Confidence: 99%