Abstract: Counterfactual explanations (CEs) are a powerful means for understanding how decisions made by algorithms can be changed. Researchers have proposed a number of desiderata that CEs should meet to be practically useful, such as requiring minimal effort to enact, or complying with causal models. We consider a further aspect to improve the usability of CEs: robustness to adverse perturbations, which may naturally happen due to unfortunate circumstances. Since CEs typically prescribe a sparse form of intervention (i…
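To make the notions of sparse intervention and perturbation robustness concrete, the following is a minimal sketch, not the paper's method: the toy data, the greedy single-feature search, and the perturbation radius eps are all illustrative assumptions.

```python
# Minimal sketch (hypothetical, not the paper's implementation): train a toy
# classifier, find a counterfactual by greedy single-feature moves, and probe
# how robust that counterfactual is to small adverse perturbations.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
model = LogisticRegression().fit(X, y)

def counterfactual(x, target=1, step=0.1, max_iter=200):
    """Greedily adjust one feature at a time until the prediction flips."""
    cf = x.copy()
    for _ in range(max_iter):
        if model.predict(cf.reshape(1, -1))[0] == target:
            return cf
        best_p, best_cand = -np.inf, cf
        for j in range(len(cf)):          # try a small move on each feature
            for d in (-step, step):
                cand = cf.copy()
                cand[j] += d
                p = model.predict_proba(cand.reshape(1, -1))[0, target]
                if p > best_p:
                    best_p, best_cand = p, cand
        cf = best_cand
    return cf

def robustness(cf, target=1, eps=0.05, n=1000):
    """Fraction of uniform eps-perturbations under which the CF stays valid."""
    noise = rng.uniform(-eps, eps, size=(n, len(cf)))
    return (model.predict(cf + noise) == target).mean()

x0 = X[model.predict(X) == 0][0]   # an input with the unfavourable outcome
cf = counterfactual(x0)
print("features changed (sparsity):", int(np.sum(~np.isclose(x0, cf))))
print("validity under perturbation:", robustness(cf))
```

Because the search moves one coordinate at a time, the resulting counterfactual tends to be sparse; the robustness score then quantifies how often that sparse prescription survives small adverse perturbations.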
“…In summary, feasible, actionable, and sparse counterfactual explanations recommend causality-consistent scenarios that can be reasonably implemented by the individuals impacted by algorithmically-generated outcomes, once they act on the values of a limited number of features. Finally, we note that authors have recently suggested additional desiderata of counterfactual explanations, such as diversity and robustness to local perturbations [10], [12], [17]. The former refers to the possibility of generating diverse counterfactuals for a given outcome to explain [10].…”
Section: B. Selected Desiderata of Counterfactual Explanations (mentioning, confidence: 99%)
“…In fact, the goal of diversity is to provide individuals with different counterfactual scenarios to perform algorithmic recourse [10]. The latter refers to the degree to which counterfactuals are sensitive to (possibly adverse) perturbations of the data point whose machine learning outcome has to be explained, instead [12], [17]. We refer to [10], [12], [17] for all details.…”
Section: B. Selected Desiderata of Counterfactual Explanations (mentioning, confidence: 99%)
“…They are an example of "contrastive explanations in xAI" [5], [9]: they explain a given model outcome by sharing a "what-if" alternative scenario comprising feature-perturbed versions of the same individual [10]–[12]. Recent literature from the xAI domain has discussed selected desiderata that may support the applicability of counterfactual explanations in real-world machine learning model pipelines [10], [12]–[17]. In particular, the desiderata of feasibility, actionability and sparsity would allow one to generate and share cognitively accessible counterfactual explanations that respect causal models between features, and suggest actionable strategies whose alternative scenarios comprise the change of a limited number of features.…”
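The desiderata enumerated in these snippets (validity, sparsity, diversity, robustness to local perturbations) are typically scored with simple quantitative proxies. The sketch below shows one plausible set of such metrics; the function names and definitions are simplified assumptions, not the exact formulations of [10], [12], [17].

```python
# Illustrative proxies for common counterfactual desiderata. These definitions
# are simplified assumptions, not the cited papers' exact formulations.
import numpy as np

def sparsity(x, cf):
    """Number of features the counterfactual changes (lower is sparser)."""
    return int(np.sum(~np.isclose(x, cf)))

def validity(model, cf, target):
    """Does the counterfactual actually receive the desired outcome?"""
    return bool(model.predict(np.atleast_2d(cf))[0] == target)

def diversity(cfs):
    """Mean pairwise distance among counterfactuals (higher is more diverse)."""
    cfs = np.asarray(cfs)
    dists = [np.linalg.norm(a - b)
             for i, a in enumerate(cfs) for b in cfs[i + 1:]]
    return float(np.mean(dists)) if dists else 0.0

def local_robustness(model, cf, target, eps=0.05, n=500, seed=0):
    """Fraction of eps-perturbations of the counterfactual that stay valid."""
    rng = np.random.default_rng(seed)
    noise = rng.uniform(-eps, eps, size=(n, len(cf)))
    return float((model.predict(cf + noise) == target).mean())
```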
Counterfactual explanations are a prominent example of post-hoc interpretability methods in the explainable Artificial Intelligence (AI) research domain. Unlike other explanation methods, they offer the possibility of recourse against unfavourable outcomes computed by machine learning models. However, in this paper we show that retraining machine learning models over time may invalidate the counterfactual explanations of their outcomes. We provide a formal definition of this phenomenon and introduce a method, namely counterfactual data augmentation, to help improve the robustness of counterfactual explanations over time. We test our method in an empirical study where we simulate different model retraining scenarios. Our results show that counterfactual data augmentation improves the robustness of counterfactual explanations over time, thereby contributing to their use in real-world machine learning applications.
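To illustrate the idea described in this abstract, here is a hedged sketch assuming one simple augmentation scheme: previously issued counterfactuals are appended to the training set, labelled with the favourable class, before each retraining round. The paper's actual procedure may differ.

```python
# Sketch of counterfactual data augmentation under retraining (an assumed
# scheme; the paper's exact procedure may differ). Issued counterfactuals are
# appended to the training set with the favourable label before refitting,
# and we measure how many of them the retrained model still honours.
import numpy as np
from sklearn.linear_model import LogisticRegression

def retrain(X, y, new_X, new_y, cfs=None, target=1):
    """Refit on old plus newly collected data, optionally augmented with CFs."""
    X_tr, y_tr = np.vstack([X, new_X]), np.concatenate([y, new_y])
    if cfs is not None:  # counterfactual data augmentation
        X_tr = np.vstack([X_tr, cfs])
        y_tr = np.concatenate([y_tr, np.full(len(cfs), target)])
    return LogisticRegression().fit(X_tr, y_tr)

def still_valid(model, cfs, target=1):
    """Fraction of previously issued counterfactuals still granted the target."""
    return float((model.predict(np.asarray(cfs)) == target).mean())

# Usage idea: compare still_valid(retrain(X, y, X_new, y_new), issued_cfs)
# against still_valid(retrain(X, y, X_new, y_new, cfs=issued_cfs), issued_cfs)
# to see whether augmentation preserves more explanations after retraining.
```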
Machine learning plays a role in many deployed decision systems, often in ways that are difficult or impossible to understand by human stakeholders. Explaining, in a human-understandable way, the relationship between the input and output of machine learning models is essential to the development of trustworthy machine learning based systems. A burgeoning body of research seeks to define the goals and methods of explainability in machine learning. In this paper, we seek to review and categorize research on counterfactual explanations, a specific class of explanation that provides a link between what could have happened had input to a model been changed in a particular way. Modern approaches to counterfactual explainability in machine learning draw connections to the established legal doctrine in many countries, making them appealing to fielded systems in high-impact areas such as finance and healthcare. Thus, we design a rubric with desirable properties of counterfactual explanation algorithms and comprehensively evaluate all currently proposed algorithms against that rubric. Our rubric provides easy comparison and comprehension of the advantages and disadvantages of different approaches and serves as an introduction to major research themes in this field. We also identify gaps and discuss promising research directions in the space of counterfactual explainability.