2021
DOI: 10.48550/arxiv.2102.13076

Benchmarking and Survey of Explanation Methods for Black Box Models

Abstract: The widespread adoption of black-box models in Artificial Intelligence has enhanced the need for explanation methods to reveal how these obscure models reach specific decisions. Retrieving explanations is fundamental to unveil possible biases and to resolve practical or ethical issues. Nowadays, the literature is full of methods with different explanations. We provide a categorization of explanation methods based on the type of explanation returned. We present the most recent and widely used explainers, and we…

Cited by 27 publications (38 citation statements)
References 82 publications (148 reference statements)
“…Thus, algorithms that require gradients to discover counterfactual examples (e.g., like those in [9,56]) are not an option. We initially considered three gradient-free algorithms from the literature to search for counterfactual examples/explanations, namely Growing Spheres (GrSp) [34], LOcal Rule-based Explanations (LORE) [17] (these two have been used recently, e.g., in [3,36]), and the implementation of the Nelder-Mead method (NeMe) [13,43] by SciPy [55]. More details on these algorithms are given in Appendix B.…”
Section: Counterfactual Search Algorithm (mentioning)
confidence: 99%
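The gradient-free search described in this statement can be illustrated with SciPy's Nelder-Mead optimizer. The sketch below is a minimal, hypothetical setup: the random-forest model, the loss with an L2 distance penalty, and the penalty weight are illustrative assumptions, not the cited paper's configuration.

```python
# Minimal sketch of gradient-free counterfactual search using SciPy's
# Nelder-Mead (NeMe). The model, loss, and penalty weight are assumptions
# for illustration only.
import numpy as np
from scipy.optimize import minimize
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

x0 = X[0]                                          # instance to explain
target = 1 - model.predict(x0.reshape(1, -1))[0]   # aim to flip the predicted class

def loss(x, lam=0.1):
    # Push the black-box probability of the target class up while
    # staying close to the original instance (L2 penalty).
    p = model.predict_proba(x.reshape(1, -1))[0, target]
    return (1.0 - p) + lam * np.linalg.norm(x - x0)

# Nelder-Mead needs only function evaluations, no gradients.
res = minimize(loss, x0, method="Nelder-Mead")
counterfactual = res.x
print("original prediction:", model.predict(x0.reshape(1, -1))[0])
print("counterfactual prediction:", model.predict(counterfactual.reshape(1, -1))[0])
```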
“…Figure 1 of [38] suggests that in 2020 there were 400+ publications related to interpretability alone. The survey articles [1,8,38] provide a systematic overview of the terminologies and the available techniques for different types of AI models for text, images, and tables. Some of the prominent techniques rely on the notions of feature importance [3], Shapley values [27], and counterfactual explanations [34].…”
Section: Related Work (mentioning)
confidence: 99%
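As a brief illustration of the Shapley-value explanations cited above, the following sketch uses the shap package with a tree ensemble; the model and dataset are illustrative assumptions (and shap must be installed), not the setup of any surveyed paper.

```python
# Minimal sketch of Shapley-value attributions with the shap package.
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=300, n_features=4, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)       # Shapley values via TreeSHAP
shap_values = explainer.shap_values(X[:5])  # per-feature attributions
print(shap_values)                          # one contribution per feature per instance
```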
“…Therefore, setting a proper visit order may lead to a better and more intuitive understanding of the model prediction. There are papers in the published literature where the BD method is used for post hoc model interpretability in a non-hierarchical learning setting [5,34–36]. In this study, the BD results are discussed by referring to the BD data from our recent published work [37], where the step-down method was used.…”
Section: Introduction (mentioning)
confidence: 99%
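The Break Down (BD) idea discussed in this statement can be sketched directly: fix features one at a time in a chosen visit order and record how the mean prediction shifts. The model, data, and visit order below are illustrative assumptions, not the cited study's setup.

```python
# Minimal sketch of Break Down (BD) attributions with a fixed visit order.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=400, n_features=4, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X, y)

x_star = X[0]               # instance to explain
visit_order = [2, 0, 3, 1]  # a chosen visit order over features

X_cond = X.copy()
baseline = model.predict(X_cond).mean()  # expected prediction over the data
prev = baseline
contributions = {}
for j in visit_order:
    X_cond[:, j] = x_star[j]             # condition on feature j at its observed value
    current = model.predict(X_cond).mean()
    contributions[j] = current - prev    # BD contribution of feature j
    prev = current

print("baseline:", baseline)
print("contributions:", contributions)
print("sum check:", baseline + sum(contributions.values()),
      "vs prediction:", model.predict(x_star.reshape(1, -1))[0])
```

By construction, the baseline plus the contributions along the visit order sums to the model's prediction for the explained instance, which is the additivity property BD shares with SHAP.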
“…However, the differences were not discussed in sufficient detail. Rinzivillo et al. also reported both SHAP and BD decompositions, but made only a simple comparison of the feature importance plots from these methods without further analysis of the differences [35]. Gosiewska and Biecek examined the dataset about the sinking of the Titanic using BD and SHAP.…”
Section: Introduction (mentioning)
confidence: 99%