Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conferen 2019
DOI: 10.18653/v1/d19-1530

It’s All in the Name: Mitigating Gender Bias with Name-Based Counterfactual Data Substitution

Abstract: This paper treats gender bias latent in word embeddings. Previous mitigation attempts rely on the operationalisation of gender bias as a projection over a linear subspace. An alternative approach is Counterfactual Data Augmentation (CDA), in which a corpus is duplicated and augmented to remove bias, e.g. by swapping all inherently-gendered words in the copy. We perform an empirical comparison of these approaches on the English Gigaword and Wikipedia, and find that whilst both successfully reduce direct bias an…
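
The CDA step described in the abstract (duplicate the corpus, then swap inherently gendered words in the copy) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the word list `GENDER_PAIRS` and the function names are hypothetical, and the list is deliberately tiny. Ambiguous forms such as "her" (which can map to "him" or "his") are left out because resolving them needs part-of-speech information.

```python
import re

# Hypothetical, deliberately small swap list (an assumption for illustration).
# Ambiguous pronouns such as "her"/"his"/"him" are omitted on purpose.
GENDER_PAIRS = {
    "he": "she", "she": "he",
    "man": "woman", "woman": "man",
    "king": "queen", "queen": "king",
    "father": "mother", "mother": "father",
}

_WORD = re.compile(r"[A-Za-z]+")


def swap_gendered_words(sentence):
    """Return the sentence with every listed gendered word swapped."""
    def replace(match):
        word = match.group(0)
        swapped = GENDER_PAIRS.get(word.lower())
        if swapped is None:
            return word  # not an inherently gendered word in our list
        # Keep the capitalisation of the first letter.
        return swapped.capitalize() if word[0].isupper() else swapped
    return _WORD.sub(replace, sentence)


def counterfactual_data_augmentation(corpus):
    """Return the original sentences followed by their gender-swapped copies."""
    return list(corpus) + [swap_gendered_words(s) for s in corpus]


augmented = counterfactual_data_augmentation(["He is a father.", "The king spoke."])
```

Here `augmented` holds the two original sentences followed by their swapped counterparts, so the corpus doubles in size. Per the paper's title, the proposed substitution variant instead replaces text rather than duplicating it and also handles names; neither is modelled in this sketch.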

Cited by 78 publications (76 citation statements)
References 12 publications
“…Our proposed method is agnostic to the details of the algorithms used to learn the input word embeddings. Moreover, unlike counterfactual data augmentation methods for debiasing (Zmigrod et al, 2019; Hall Maudslay et al, 2019), we do not require access to the original training resources used for learning the input word embeddings.…”
Section: Introduction (mentioning)
confidence: 99%
“…While this method inevitably introduces some translation errors, it at least creates a more male-female-balanced dataset. As previously discussed, this counterfactual data augmentation approach is closest in spirit to prior work on reducing gender bias in monolingual NLP tools (Maudslay et al 2019; Zhao et al 2018; Zmigrod et al 2019).…”
Section: Removing Gender Bias Before Training (mentioning)
confidence: 95%
“…Given this, the research presented in this article is more closely related to studies which have sought to balance masculine and feminine terms in the training data itself. Such approaches exist for English data (e.g., Maudslay et al 2019; Zhao et al, 2018). Zmigrod et al (2019) introduced a more complicated data-augmentation scheme for inflected languages with rich morphology like German, but their scheme revolves around swapping a single targeted word per sentence.…”
Section: Gender Bias in NMT Systems (mentioning)
confidence: 99%
“…Lu et al [33] try to reduce the detected bias by using the data augmentation technique Counterfactual Data Augmentation (CDA), which consists of adding a complementary-gender phrase to the sentences of the initial dataset. Building on CDA, Hall Maudslay et al [65] developed Counterfactual Data Substitution (CDS) in 2019. It proposes to eliminate the bias associated with proper names by adding a phrase with a complementary gender name in a balanced way.…”
Section: Coreference Resolution (mentioning)
confidence: 99%