2022
DOI: 10.48550/arxiv.2202.01602
Preprint
The Disagreement Problem in Explainable Machine Learning: A Practitioner's Perspective

Cited by 39 publications (57 citation statements). References 0 publications.
“…However, the post hoc explainers we consider have also been shown to be inconsistent, unfaithful, and intractable [17,7,12,6,4]. Consequently, we believe that a potential source of negative societal impact in this work arises from practitioners overtrusting post hoc explainers [16,17]. While this is the case, our study demonstrates that the explainers backed with our proposed defense not only detect adversarial behavior but also faithfully identify the most important features in decisions.…”
Section: Discussion
confidence: 77%
“…We analyze and alleviate a single shortcoming of using post hoc explanations. However, the post hoc explainers we consider have also been shown to be inconsistent, unfaithful, and intractable [17,7,12,6,4]. Consequently, we believe that a potential source of negative societal impact in this work arises from practitioners overtrusting post hoc explainers [16,17].…”
Section: Discussion
confidence: 91%
“…Explanations should provide new insights: explanations should go beyond the "why, what, how" [20]. We must also be able to compare and contrast them, as explanations can disagree and directly contradict each other [21]. We extend these points and argue that developing metrics for explanations depends on both the audience as well as the capacity of the underlying explanation.…”
Section: Trusted But Not Trustworthy: a New Dark Pattern
confidence: 82%
“…Lipton [65] examines the motivations underlying interest in interpretability, finding them to be diverse and occasionally discordant. Krishna et al [53] find that state-of-the-art explanation methods may disagree in terms of the explanations they output. Chandrasekaran et al [15] further conclude that existing explanations on VQA model do not actually make its responses and failures more predictable to a human.…”
Section: Limitations and Broader Impact
confidence: 99%
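The disagreement that Krishna et al. report can be quantified directly. The sketch below is an illustrative (not authoritative) implementation of two of the metrics in that spirit — top-k feature agreement and top-k rank agreement — applied to two made-up attribution vectors standing in for the outputs of two different post hoc explainers; all values and names are invented for the example.

```python
# Illustrative disagreement metrics in the spirit of Krishna et al.:
# compare two feature-attribution vectors for the same prediction.
# Attribution values below are made up for demonstration.

def top_features(attributions, k):
    """Indices of the k features with the largest absolute attribution."""
    ranked = sorted(range(len(attributions)),
                    key=lambda i: abs(attributions[i]), reverse=True)
    return ranked[:k]

def feature_agreement(attr_a, attr_b, k):
    """Fraction of top-k features shared by both explanations (order ignored)."""
    a, b = set(top_features(attr_a, k)), set(top_features(attr_b, k))
    return len(a & b) / k

def rank_agreement(attr_a, attr_b, k):
    """Fraction of top-k positions where both explanations place the same feature."""
    a, b = top_features(attr_a, k), top_features(attr_b, k)
    return sum(x == y for x, y in zip(a, b)) / k

# Two hypothetical explainers scoring the same six features:
explainer_a = [0.9, -0.7, 0.1, 0.4, -0.05, 0.2]
explainer_b = [0.8, 0.2, -0.6, 0.5, 0.1, -0.3]

print(feature_agreement(explainer_a, explainer_b, k=3))  # shared top-3 features
print(rank_agreement(explainer_a, explainer_b, k=3))     # same feature at same rank
```

Low agreement on either metric signals exactly the situation the citing papers describe: two explanations of the same decision that contradict each other.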