2018
DOI: 10.48550/arxiv.1809.10610
Preprint

Counterfactual Fairness in Text Classification through Robustness

Cited by 6 publications (17 citation statements) · References 0 publications
“…Recent research indicates that state-of-the-art benchmarks for toxicity disproportionately misclassify utterances from marginalised social groups as toxic (Welbl et al., 2021), a concern that is particularly pronounced for African American English (Dixon et al., 2018; Ghaffary, 2019; Hanu et al., 2021; Sap et al., 2019). The question of how to mitigate bias in toxic or hate speech detection remains an area of active inquiry (Davani et al., 2020; Garg et al., 2019).…”
Section: Racist Bias in Toxicity Detection (mentioning, confidence: 99%)
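The indexed paper (Garg et al., 2019) frames this mitigation question as robustness: a classifier's prediction should not change when identity terms in the input are substituted for one another. The sketch below is a minimal, hypothetical illustration of that counterfactual check; the `toxicity_model` callable and the term list are assumptions, not the paper's exact setup.

```python
# Illustrative counterfactual-token-fairness check: swap identity terms in a
# sentence and measure how far the toxicity score moves. A fair classifier
# should produce a small gap.

IDENTITY_TERMS = ["gay", "straight", "muslim", "christian"]  # illustrative subset

def counterfactual_gap(toxicity_model, sentence: str) -> float:
    """Largest change in toxicity score across identity-term substitutions."""
    tokens = sentence.lower().split()
    scores = []
    for term in IDENTITY_TERMS:
        if term in tokens:
            for substitute in IDENTITY_TERMS:
                perturbed = " ".join(substitute if t == term else t for t in tokens)
                scores.append(toxicity_model(perturbed))
    # No identity term present: the sentence has no counterfactuals to compare.
    return max(scores) - min(scores) if len(scores) > 1 else 0.0
```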
“…To assess accuracy, following Singh & Joachims (2019), we report the average stochastic test NDCG by sampling 25 rankings for each query from the learned ranking policy. To assess individual fairness, we use ranking stability with respect to demographic perturbations, which is the natural analogue of an evaluation metric for individual fairness in classification (Yurochkin & Sun, 2021; Garg et al., 2018). In particular, for each query, we create a hypothetical query by flipping the (binary) gender of each individual in the query, and deterministically rank by sorting the items by their scores.…”
Section: German Credit Data Set (mentioning, confidence: 99%)
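A minimal sketch of the perturbation step this excerpt describes, assuming each query arrives as a NumPy feature matrix with a binary gender column and a `score_model` callable; the position-agreement measure here is an illustrative stand-in for the authors' stability metric, not their exact code.

```python
import numpy as np

def gender_flipped(X: np.ndarray, gender_col: int) -> np.ndarray:
    """Copy of a query's item-feature matrix with the binary gender bit flipped."""
    X_cf = X.copy()
    X_cf[:, gender_col] = 1.0 - X_cf[:, gender_col]
    return X_cf

def rank_agreement(score_model, X: np.ndarray, gender_col: int) -> float:
    """Fraction of ranking positions unchanged by the gender flip (1.0 = stable)."""
    order = np.argsort(-score_model(X))                  # original ranking
    order_cf = np.argsort(-score_model(gender_flipped(X, gender_col)))
    return float(np.mean(order == order_cf))
```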
“…The choice of d_Y is often determined by the form of the output. For example, if the ML model outputs a vector of logits, then we may pick the Euclidean norm as d_Y [17, 10]. The metric d_X is the crux of (2.1) because it encodes our intuition of which inputs are similar for the ML task at hand.…”
Section: A Transport-Based Definition of Individual Fairness (mentioning, confidence: 99%)
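Read as a Lipschitz condition, the definition in this excerpt asks that d_Y(h(x), h(x′)) ≤ L · d_X(x, x′) for all comparable input pairs. A minimal sketch, assuming a model `h` that returns a logit vector, a task-specific input metric `d_X`, and an illustrative Lipschitz constant `L`; all three names are assumptions for this example.

```python
import numpy as np

def fairness_violation(h, d_X, x: np.ndarray, x_prime: np.ndarray, L: float = 1.0) -> float:
    """Positive return value: the pair (x, x_prime) witnesses an unfairness violation."""
    d_Y = np.linalg.norm(h(x) - h(x_prime))  # Euclidean distance between logit vectors
    return d_Y - L * d_X(x, x_prime)
```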