2022
DOI: 10.48550/arxiv.2212.10563
Preprint

Debiasing NLP Models Without Demographic Information

Cited by 1 publication (2 citation statements)
References: 0 publications
“…Baselines. We consider popular baselines from prior work (Joshi et al., 2022; Orgad and Belinkov, 2022): weighting methods like DFL, DFL-nodemog, and Product of Experts (Mahabadi et al., 2019; Orgad and Belinkov, 2022), and latent space removal methods like INLP (Ravfogel et al., 2020). We also include worst-group accuracy methods like GroupDRO and Subsampling (Sagawa et al., 2019, 2020) from the machine learning literature, and a baseline RemoveToken that removes the treatment feature from the input (see Supp C).…”
Section: Accuracy of FEAG Classifiers (mentioning)
confidence: 99%
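As an illustrative aside, the Product of Experts baseline named in the statement above trains a main model together with a frozen bias-only model, so that the main model earns little credit for patterns the bias model already captures. The following is a minimal sketch of such a loss in PyTorch; the function name, tensor shapes, and toy usage are assumptions for illustration, not the cited papers' exact implementation.

import torch
import torch.nn.functional as F

def product_of_experts_loss(main_logits, bias_logits, labels):
    # Log-probabilities of the main model and of a frozen bias-only model.
    main_log_probs = F.log_softmax(main_logits, dim=-1)
    bias_log_probs = F.log_softmax(bias_logits, dim=-1).detach()  # no gradient to the bias model
    # Product of experts: multiply the two distributions, i.e. add their log-probabilities.
    combined = main_log_probs + bias_log_probs
    # Cross-entropy over the renormalized combination; only the main model receives gradients.
    return F.cross_entropy(combined, labels)

# Toy usage: batch of 4 examples, 3 classes.
main_logits = torch.randn(4, 3, requires_grad=True)
bias_logits = torch.randn(4, 3)
labels = torch.tensor([0, 2, 1, 0])
product_of_experts_loss(main_logits, bias_logits, labels).backward()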
“…For removing spurious correlations, a common principle underlying past work is to make a model's prediction invariant to the features that exhibit the correlation. This can be done by data augmentation (Kaushik et al., 2019), latent space removal (Ravfogel et al., 2020), subsampling (Sagawa et al., 2019, 2020), or sample reweighing (Mahabadi et al., 2019; Orgad and Belinkov, 2022). In many cases, however, the correlated features may be important for the task and their complete removal can cause a degradation in task performance.…”
Section: Introduction (mentioning)
confidence: 99%
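For a concrete sense of the subsampling option mentioned in the statement above, a small sketch is given below. It assumes each training example can be assigned to a (label, spurious-feature) group; the helper name balanced_subsample and the group_fn argument are illustrative assumptions, not taken from the cited work.

import random
from collections import defaultdict

def balanced_subsample(examples, group_fn, seed=0):
    # Bucket examples by group, e.g. group_fn(ex) = (ex["label"], ex["has_spurious_token"]).
    groups = defaultdict(list)
    for ex in examples:
        groups[group_fn(ex)].append(ex)
    # Keep only as many examples per group as the smallest group contains,
    # so the label and the spurious feature are no longer correlated in training.
    smallest = min(len(members) for members in groups.values())
    rng = random.Random(seed)
    balanced = []
    for members in groups.values():
        balanced.extend(rng.sample(members, smallest))
    rng.shuffle(balanced)
    return balanced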