Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/2021.emnlp-main.785
Assessing the Reliability of Word Embedding Gender Bias Measures

Abstract: Various measures have been proposed to quantify human-like social biases in word embeddings. However, bias scores based on these measures can suffer from measurement error. One indication of measurement quality is reliability, concerning the extent to which a measure produces consistent results. In this paper, we assess three types of reliability of word embedding gender bias measures, namely test-retest reliability, inter-rater consistency and internal consistency. Specifically, we investigate the consistency …
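As a rough illustration of the test-retest notion mentioned in the abstract, the sketch below correlates per-word bias scores obtained from two embedding models trained with different random seeds. The toy he-she bias score, the `emb_a`/`emb_b` dictionaries and the target word list are assumptions for illustration, not the paper's actual measures or data.

```python
# A minimal sketch of one test-retest style check, assuming two word->vector
# dicts (emb_a, emb_b) from embeddings trained with different random seeds.
# The toy bias score below is NOT the paper's measure, just an illustration.
import numpy as np
from scipy.stats import pearsonr


def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def toy_gender_bias(emb, word):
    """Cosine of a word vector with the he-she difference direction."""
    direction = emb["he"] - emb["she"]
    return cosine(emb[word], direction)


def test_retest_correlation(emb_a, emb_b, target_words):
    """Pearson correlation of bias scores across the two embedding runs."""
    scores_a = [toy_gender_bias(emb_a, w) for w in target_words]
    scores_b = [toy_gender_bias(emb_b, w) for w in target_words]
    r, _ = pearsonr(scores_a, scores_b)  # high r = scores consistent across runs
    return r
```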

Cited by 12 publications (15 citation statements)
References 32 publications (48 reference statements)
“…This task contains 276 template sentences t ∈ T, where for each occupation o that sentence either starts with that occupation, "man", or "woman", resulting in a triplet.[3] We leave out the word pair ('guy', 'gal'), as we have noticed better results without the word pair. Ethayarajh et al. (2019) and Du et al. (2021) warn that including low-frequency words can negatively impact the bias measure, which we suspect is the case here.…”
Section: From Gender Representation To Gender Biasmentioning
confidence: 67%
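For readers unfamiliar with the template setup described in the excerpt above, the sketch below shows how such (occupation, "man", "woman") sentence triplets could be built from templates. The template strings and occupation list are placeholders, not the cited authors' actual 276 templates.

```python
# Hypothetical construction of (occupation, "man", "woman") sentence triplets
# from sentence templates, in the spirit of the excerpt above. The templates
# and occupations here are placeholders, not the cited authors' data.
templates = [
    "{} worked at the hospital.",
    "{} was praised by the committee.",
]
occupations = ["doctor", "teacher", "engineer"]

triplets = []
for template in templates:
    for occupation in occupations:
        triplets.append((
            template.format("The " + occupation),  # sentence starting with the occupation
            template.format("The man"),            # same sentence starting with "man"
            template.format("The woman"),          # same sentence starting with "woman"
        ))
```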
“…Following the advice from [21], and to assess the quality of the gender direction obtained, we further perform PCA starting from an extended list of 50 pairs of gender words, taken from [7], and compare the result with g⃗. From the full list of pairs available on the author's repository,[1] we select only those consisting of words present in GloVe.…”
Section: Gender Directionmentioning
confidence: 99%
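As context for the PCA step mentioned in the excerpt above, here is a minimal sketch of estimating a gender direction from difference vectors of gender word pairs, filtered to the embedding vocabulary. The `embeddings` dict, the pair list, and the use of plain SVD over pair differences are assumptions for illustration, not the cited authors' exact procedure.

```python
# A minimal sketch, assuming `embeddings` is a word->np.ndarray dict (e.g. GloVe)
# and `gender_pairs` is a list of (female_word, male_word) tuples. This follows
# the common PCA-over-pair-differences recipe; it is not the cited authors' code.
import numpy as np


def gender_direction_pca(embeddings, gender_pairs):
    """First principal component of centred gender-pair difference vectors."""
    diffs = [
        embeddings[f] - embeddings[m]
        for f, m in gender_pairs
        if f in embeddings and m in embeddings  # keep only in-vocabulary pairs
    ]
    X = np.asarray(diffs)
    X -= X.mean(axis=0)                          # centre before PCA
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[0]                                 # top principal component


def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# One way to compare the PCA direction with a seed direction g = he - she:
# g = embeddings["he"] - embeddings["she"]
# print(cosine(gender_direction_pca(embeddings, gender_pairs), g))
```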
“…For instance, Jacobs and Wallach (2021) argue for applying psychometrics to study algorithmic fairness - a discussion we now extend to NLP bias measures. In section 6 we will consequently position our paper in the literature and compare our contributions to those of related works (Bommasani & Liang, 2022; Du et al., 2021; Jacobs & Wallach, 2021, i.a.).…”
Section: Introductionmentioning
confidence: 99%