Human Imperceptible Attacks and Applications to Improve Fairness

Hua, Xinru; Xu, Huanzhong; Blanchet, José; Nguyên, Viêt Anh

doi:10.48550/arxiv.2111.15603

Cited by 1 publication

(1 citation statement)

References 61 publications

“…Particularly, we define and compute Fairness Influence Function (FIF) that quantifies the contribution of individual and subset of features to the resulting bias. FIFs do not only allow practitioners to identify the features to act upon but also to quantify the effect of various affirmative [8,19,23,45–48] or punitive actions [21,32,42] on the resulting bias.…”

Section: Introduction

confidence: 99%

How Biased is Your Feature?: Computing Fairness Influence Functions with Global Sensitivity Analysis

Ghosh, Basu, Meel

2022

Preprint

Fairness in machine learning has attained significant focus due to the widespread application of machine learning in high-stakes decision-making tasks. Unless regulated with a fairness objective, machine learning classifiers might demonstrate unfairness/bias towards certain demographic populations in the data. Thus, the quantification and mitigation of the bias induced by classifiers have become a central concern. In this paper, we aim to quantify the influence of different features on the bias of a classifier. To this end, we propose a framework of Fairness Influence Function (FIF), and compute it as a scaled difference of conditional variances in the classifier's prediction. We also instantiate an algorithm, FairXplainer, that uses variance decomposition among the subset of features and a local regressor to compute FIFs accurately, while also capturing the intersectional effects of the features. Our experimental analysis validates that FairXplainer captures the influences of both individual features and higher-order feature interactions, estimates the bias more accurately than existing local explanation methods, and detects the increase/decrease in bias due to affirmative/punitive actions in the classifier.