2023
DOI: 10.1007/s00146-022-01619-4

Equal accuracy for Andrew and Abubakar—detecting and mitigating bias in name-ethnicity classification algorithms

Abstract: Uncovering the world’s ethnic inequalities is hampered by a lack of ethnicity-annotated datasets. Name-ethnicity classifiers (NECs) can help, as they are able to infer people’s ethnicities from their names. However, since the latest generation of NECs rely on machine learning and artificial intelligence (AI), they may suffer from the same racist and sexist biases found in many AIs. Therefore, this paper offers an algorithmic fairness audit of three NECs. It finds that the UK-Census-trained EthnicityEstimator d…

Cited by 4 publications (7 citation statements)
References 90 publications
“…Separating noise from signal and manually verifying the quality of each datapoint is impossible; as a result, we relied on validated computational tools to infer trends in publications and author characteristics, while remaining cognizant of the inherent biases or inaccuracies that these tools may embody. 57…”
Section: Discussion (mentioning)
confidence: 99%
“…Separating noise from signal and manually verifying the quality of each datapoint is impossible; as a result, we relied on validated computational tools to infer trends in publications and author characteristics, while remaining cognizant of the inherent biases or inaccuracies that these tools may embody. 57 In particular, the inference of race, ethnicity, and sex based on names is an approach commonly applied in the social sciences when annotated data are not available because of logistical or discriminatory concerns. 35,58,59 Although such algorithms merely provide a probability that a name is associated with a particular identity group, the general trends we identified highlight crucial deficits in representation.…”
Section: Limitations (mentioning)
confidence: 99%
“…To meet the needs of the present study, the original NEC machine learning model proposed by Hafner et al. [9] was customized. The original model analyzes the individual characters of names and classifies them into up to 49 nationalities and ethnic groups.…”
Section: Methods (mentioning)
confidence: 99%
“…The training was performed following the methodology described by Hafner et al. [9]. The remaining 10% of the dataset, i.e., 17,408 names, were then used to evaluate the model, resulting in an 88.09% accuracy score.…”
Section: Methods (mentioning)
confidence: 99%