2022
DOI: 10.48550/arxiv.2206.08454
Preprint

Quantifying Feature Contributions to Overall Disparity Using Information Theory

Abstract: When a machine-learning algorithm makes biased decisions, it can be helpful to understand the "sources" of disparity to explain why the bias exists. Towards this, we examine the problem of quantifying the contribution of each individual feature to the observed disparity. If we have access to the decision-making model, one potential approach (inspired by intervention-based approaches in the explainability literature) is to vary each individual feature (while keeping the others fixed) and use the resulting change…
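To make the intervention-based idea concrete, here is a minimal sketch, assuming a fitted binary classifier with a scikit-learn-style predict method, a 0/1 group-membership array, and statistical parity as the disparity metric; the function names and the permutation-based intervention are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def statistical_parity_gap(preds, group):
    # Disparity metric: absolute difference in positive-prediction rates
    # between the two demographic groups (group is a 0/1 array).
    return abs(preds[group == 1].mean() - preds[group == 0].mean())

def interventional_contribution(model, X, group, feature, seed=0):
    # Intervene on a single feature: permute its column (keeping all other
    # features fixed) and measure how much the disparity changes.
    rng = np.random.default_rng(seed)
    baseline = statistical_parity_gap(model.predict(X), group)
    X_int = X.copy()
    X_int[:, feature] = rng.permutation(X_int[:, feature])
    intervened = statistical_parity_gap(model.predict(X_int), group)
    # Positive value: the feature accounts for part of the observed disparity.
    return baseline - intervened
```

In practice one would average the intervened disparity over many random permutations; a single permutation is used here only to keep the sketch short.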

Cited by 2 publications (5 citation statements)
References 34 publications
“…However, this calculation will not be accurate and outcomes can significantly differ from true model outcomes, i.e., when feature a is not used to train the model [13,19,24,33]. In fact, (interventional) SHAP simulates the removal of features by marginalising over their marginal distributions and not by re-training a new model without such features [36].…”
Section: Refresh: Theory and Methods
Mentioning, confidence: 99%
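To illustrate the contrast drawn in this statement, the following is a minimal sketch of the two ways of "removing" features that the quote distinguishes: interventional-SHAP-style marginalisation over a background dataset (the original model is reused) versus genuinely retraining without the excluded features. The interfaces, including the train_fn helper, are hypothetical assumptions.

```python
import numpy as np

def value_by_marginalising(model, x, coalition, X_background, seed=0):
    # Interventional-SHAP-style value: features outside the coalition are
    # replaced by draws from background data (their marginal distribution);
    # the original model is reused, never retrained.
    rng = np.random.default_rng(seed)
    X_mix = X_background[rng.integers(len(X_background), size=200)]
    X_mix[:, coalition] = x[coalition]  # pin coalition features to x's values
    return model.predict(X_mix).mean()

def value_by_retraining(train_fn, X_train, y_train, x, coalition):
    # Retraining-based value: fit a fresh model that only ever sees the
    # coalition's features, so excluded features truly never enter training.
    sub_model = train_fn(X_train[:, coalition], y_train)
    return sub_model.predict(x[coalition].reshape(1, -1))[0]
```

These two value functions can disagree substantially because the reused model was trained with all features available, which is the inaccuracy the quoted statement points to.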
“…In Section 3, we first introduce the problem setup for quantifying non-exempt disparity as discussed in [8,29], and present several canonical examples and candidate measures, examining their pros and cons, until we arrive at the proposed measure in [8] that satisfies the desirable properties. In Section 4, we review how PID can help in assessing the contributions of either features or data points, with applications in feature selection (as discussed in [31]). Related works include [35–38].…”
Section: Scenario 3: Formalizing Tradeoffs in Distributed Environments
Mentioning, confidence: 99%
“…Towards answering this question, ref. [31] proposes two measures for quantifying the contribution of each feature to the overall disparity. The first measure, referred to as the interventional contribution, is defined as follows:…”
Section: Information-Theoretic Measures
Mentioning, confidence: 99%