Abstract: In this paper we propose a new algorithm, named NICE, to generate counterfactual explanations for tabular data that specifically takes into account algorithmic requirements that often emerge in real-life deployments: (1) the ability to provide an explanation for all predictions, (2) being able to handle any classification model (including non-differentiable ones), (3) being efficient in run time, and (4) providing multiple counterfactual explanations with different characteristics. More specifically, our approach e…
“…This privacy risk occurs when the counterfactual algorithm uses instance-based strategies to find the counterfactual explanations. These counterfactuals correspond to the nearest unlike neighbor and are also called native counterfactuals [5,25]. Other counterfactual algorithms use perturbation, where synthetic counterfactuals are generated by perturbing the factual instance and labelling it with the machine learning model, without reference to known cases in the training set [25].…”
Section: Problem Statement: Explanation Linkage Attacks (mentioning)
confidence: 99%
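The nearest-unlike-neighbor idea in this snippet can be made concrete in a few lines. The following is a minimal sketch, assuming a scikit-learn-style classifier with a `predict` method and a purely numeric feature matrix; the helper name `nearest_unlike_neighbor` is ours for illustration and is not part of NICE or any other cited package.

```python
import numpy as np

def nearest_unlike_neighbor(x, X_train, model):
    """Return the training instance closest to x (Euclidean distance)
    whose predicted class differs from the prediction for x; this is
    the 'native' counterfactual described above."""
    target = model.predict(x.reshape(1, -1))[0]
    preds = model.predict(X_train)
    unlike = X_train[preds != target]           # candidates of another class
    if len(unlike) == 0:
        return None                             # model predicts one class everywhere
    dists = np.linalg.norm(unlike - x, axis=1)  # distance of each candidate to x
    return unlike[np.argmin(dists)]
```

Because the returned explanation is an actual training record, disclosing it is precisely what makes the explanation linkage attack possible.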
“…Other counterfactual algorithms use perturbation, where synthetic counterfactuals are generated by perturbing the factual instance and labelling it with the machine learning model, without reference to known cases in the training set [25]. We focus on counterfactual algorithms that return real instances: several algorithms do this, as it substantially decreases the run time while also improving desirable properties of the explanations such as plausibility [5]. Plausibility measures how realistic the counterfactual explanation is with respect to the data manifold, which is a desirable property [22], and Brughmans et al. [5] show that the techniques returning an actual instance have the best plausibility results.…”
Section: Problem Statement: Explanation Linkage Attacks (mentioning)
confidence: 99%
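By contrast, a perturbation-based generator never returns a stored instance. Below is a minimal sketch, again assuming a scikit-learn-style `predict` and numeric features; the gradually growing noise schedule is an illustrative choice of ours, not a method from the cited papers.

```python
import numpy as np

def perturbation_counterfactual(x, model, n_tries=1000, max_scale=1.0, seed=0):
    """Sample Gaussian perturbations of x with gradually growing scale
    and return the first synthetic point the model labels differently."""
    rng = np.random.default_rng(seed)
    target = model.predict(x.reshape(1, -1))[0]
    for i in range(1, n_tries + 1):
        scale = max_scale * i / n_tries          # widen the search over time
        candidate = x + rng.normal(scale=scale, size=x.shape)
        if model.predict(candidate.reshape(1, -1))[0] != target:
            return candidate                     # label flipped: done
    return None                                  # no flip within the budget
```

Nothing ties the returned point to any training record, which is why the privacy risks for this family of methods are of a different kind, as the later snippets note.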
“…We focus on counterfactual algorithms that return real instances: several algorithms do this, as it substantially decreases the run time while also improving desirable properties of the explanations such as plausibility [5]. Plausibility measures how realistic the counterfactual explanation is with respect to the data manifold, which is a desirable property [22], and Brughmans et al. [5] show that the techniques returning an actual instance have the best plausibility results. Furthermore, it is argued that counterfactual instances that are plausible are more robust, and therefore less vulnerable to the uncertainty of the classification model or changes over time [2,5,42].…”
Section: Problem Statement: Explanation Linkage Attacks (mentioning)
confidence: 99%
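Plausibility is often operationalized as closeness to the training data. As a hedged illustration, one common proxy is the mean distance to the k nearest training instances; this specific proxy is our choice for the sketch, and [22] and [5] discuss several variants.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def implausibility(candidate, X_train, k=5):
    """Mean distance from a candidate counterfactual to its k nearest
    training instances; lower values mean the candidate lies closer
    to the data manifold and is therefore more plausible."""
    nn = NearestNeighbors(n_neighbors=k).fit(X_train)
    dists, _ = nn.kneighbors(candidate.reshape(1, -1))
    return float(dists.mean())
```

A native counterfactual drawn from the training set scores near zero on such a proxy by construction, which matches the finding in [5] that instance-returning methods have the best plausibility results.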
“…Plausibility measures how realistic the counterfactual explanation is with respect to the data manifold, which is a desirable property [22], and Brughmans et al. [5] show that the techniques returning an actual instance have the best plausibility results. Furthermore, it is argued that counterfactual instances that are plausible are more robust, and therefore less vulnerable to the uncertainty of the classification model or changes over time [2,5,42]. This shows that for some use cases it can be very useful to use real data points as counterfactuals instead of synthetic ones, as for the latter the risk of generating implausible counterfactual explanations can be quite high [27].…”
Section: Problem Statement: Explanation Linkage Attacks (mentioning)
confidence: 99%
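The robustness claim can likewise be probed empirically. A rough sketch, assuming a scikit-learn estimator: the bootstrap-retraining check below is one simple way to approximate model uncertainty, not the procedure used in [2,5,42].

```python
import numpy as np
from sklearn.base import clone

def counterfactual_stability(cf, base_model, X, y, desired_class,
                             n_models=20, seed=0):
    """Fraction of bootstrap-retrained copies of the model that still
    assign the counterfactual its desired class; a plausible, on-manifold
    counterfactual should survive most retrains."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_models):
        idx = rng.integers(0, len(X), size=len(X))   # bootstrap resample
        m = clone(base_model).fit(X[idx], y[idx])
        hits += int(m.predict(cf.reshape(1, -1))[0] == desired_class)
    return hits / n_models
```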
“…This shows that for some use cases it can be very useful to use real data points as counterfactuals instead of synthetic ones, as for the latter the risk of generating implausible counterfactual explanations can be quite high [27]. Algorithms that use these native counterfactual explanations include NICE (without optimization setting) [5], the WIT tool with NNCE [59], FACE [44] and certain settings of CBR [25]. Perturbation-based counterfactual algorithms experience different privacy risks, such as membership inference attacks: Pawelczyk et al. [43] use counterfactual distance-based attacks which leverage algorithmic recourse to determine whether an instance belongs to the training data of the underlying model.…”
Section: Problem Statement: Explanation Linkage Attacks (mentioning)
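The attack referenced in this last snippet exploits a simple signal. The sketch below captures only the core intuition of a counterfactual distance-based membership test; the threshold and decision rule are illustrative, and Pawelczyk et al. [43] should be consulted for the actual attack.

```python
import numpy as np

def membership_guess(x, counterfactual, threshold):
    """Guess that x was a training point when it sits unusually close
    to its counterfactual: models tend to fit tightly around training
    data, leaving members a shorter path across the decision boundary."""
    return np.linalg.norm(x - counterfactual) < threshold
```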
Black-box machine learning models are used in an increasing number of high-stakes domains, and this creates a growing need for Explainable AI (XAI). However, the use of XAI in machine learning introduces privacy risks, which currently remain largely unnoticed. Therefore, we explore the possibility of an explanation linkage attack, which can occur when deploying instance-based strategies to find counterfactual explanations. To counter such an attack, we propose k-anonymous counterfactual explanations and introduce pureness as a metric to evaluate the validity of these k-anonymous counterfactual explanations. Our results show that making the explanations, rather than the whole dataset, k-anonymous is beneficial for the quality of the explanations.
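Under our reading of this abstract, both properties can be checked on the group of training records that match a generalized counterfactual. The sketch below is a hedged interpretation, not the paper's formal definitions: we treat k-anonymity as a group-size requirement and pureness as the share of the matching group that the model assigns the desired class.

```python
import numpy as np

def k_anonymity_and_pureness(matching_group, model, desired_class, k):
    """matching_group: training instances covered by a generalized
    counterfactual (e.g., after coarsening quasi-identifier values).
    Returns whether at least k records match (k-anonymity) and the
    fraction of matches predicted as the desired class (pureness,
    under our interpretation of the metric)."""
    is_k_anonymous = len(matching_group) >= k
    pureness = float(np.mean(model.predict(matching_group) == desired_class))
    return is_k_anonymous, pureness
```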
Counterfactual explanations elucidate algorithmic decisions by pointing to scenarios that would have led to an alternative, desired outcome. Giving insight into the model’s behavior, they hint users towards possible actions and give grounds for contesting decisions. As a crucial factor in achieving these goals, counterfactuals must be plausible, i.e., describing realistic alternative scenarios within the data manifold. This paper leverages a recently developed generative modeling technique – adversarial random forests (ARFs) – to efficiently generate plausible counterfactuals in a model-agnostic way. ARFs can serve as a plausibility measure or directly generate counterfactual explanations. Our ARF-based approach surpasses the limitations of existing methods that aim to generate plausible counterfactual explanations: It is easy to train and computationally highly efficient, handles continuous and categorical data naturally, and allows integrating additional desiderata such as sparsity in a straightforward manner.
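The selection logic this abstract describes can be sketched generically: score candidate counterfactuals under a density model of the training data and keep the likeliest. In the sketch below, a kernel density estimate stands in for the ARF likelihood, since we do not reproduce the ARF API here; only the ranking idea carries over.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

def most_plausible(candidates, X_train, bandwidth=0.5):
    """Rank candidate counterfactuals by estimated log-density under
    the training distribution and return the most plausible one."""
    kde = KernelDensity(bandwidth=bandwidth).fit(X_train)
    log_dens = kde.score_samples(candidates)   # one log-density per candidate
    return candidates[np.argmax(log_dens)]
```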