2021
DOI: 10.48550/arxiv.2109.05704
Preprint

Mitigating Language-Dependent Ethnic Bias in BERT

Abstract: BERT and other large-scale language models (LMs) contain gender and racial bias. They also exhibit other dimensions of social bias, most of which have not been studied in depth, and some of which vary depending on the language. In this paper, we study ethnic bias and how it varies across languages by analyzing and mitigating ethnic bias in monolingual BERT for English, German, Spanish, Korean, Turkish, and Chinese. To observe and quantify ethnic bias, we develop a novel metric called Categorical Bias score. Th…

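The Categorical Bias score mentioned in the abstract measures how unevenly a masked LM spreads probability across ethnic target groups. The sketch below is a rough illustration of that idea, not the paper's exact formulation: it assumes a hypothetical `prob(template, target, attribute)` helper returning the masked-LM probability of a target-group word in a template, with `attribute=None` standing for the attribute-masked prior, and averages, over templates and attributes, the variance across groups of the normalized log-probability.

```python
import math
from statistics import pvariance

def categorical_bias_score(prob, templates, targets, attributes):
    """Sketch of a multi-category (ethnic) bias score.

    prob(template, target, attribute) -> P(target | template filled with
    attribute); attribute=None means the attribute slot is masked, giving
    the prior used for normalization.  For each (template, attribute) pair
    we take the variance, across target groups, of log(P_target / P_prior);
    a larger average variance means the model treats the groups less evenly.
    """
    per_context_variances = []
    for template in templates:
        for attribute in attributes:
            normalized_log_probs = [
                math.log(prob(template, target, attribute)
                         / prob(template, target, None))
                for target in targets
            ]
            per_context_variances.append(pvariance(normalized_log_probs))
    return sum(per_context_variances) / len(per_context_variances)
```

With targets such as country or ethnicity names and attributes such as occupation or trait words, a score near zero would indicate the model scores all groups roughly alike for every attribute.
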
Cited by 2 publications (4 citation statements) · References 14 publications

“…To quantify social bias, binary-category metrics such as LPBS [11] for gender and racial bias, and the multi-category metric CBS [12], are employed, as illustrated in Fig. 2.…”
Section: Methods of Quantifying Social Bias
confidence: 99%
“…We compare and analyze the changes in bias between the original base model and the fine-tuned model. To quantify societal bias, we use the Categorical Bias Score (CBS) [12] for multi-category ethnic bias and the Log-Probability Bias Score (LPBS) [11] for binary-category gender and racial biases. To mitigate bias, we employ methods such as adjusting the frequency of specific target words and modifying the existing training dataset by replacing harmful attribute words with non-harmful ones.…”
Section: Introduction
confidence: 99%
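For the binary case referenced in the statement above, the Log-Probability Bias Score compares how strongly a masked LM associates each of two groups with an attribute, normalizing by a prior in which the attribute is also masked. Below is a minimal sketch of that idea, assuming Hugging Face transformers and bert-base-uncased as the model; mask_prob and lpbs are illustrative helpers, not code from either cited paper.

```python
import math
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def mask_prob(sentence, word):
    """Probability of `word` at the first [MASK] position in `sentence`."""
    inputs = tok(sentence, return_tensors="pt")
    mask_pos = (inputs.input_ids == tok.mask_token_id).nonzero()[0, 1]
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = logits[0, mask_pos].softmax(dim=-1)
    return probs[tok.convert_tokens_to_ids(word)].item()

def lpbs(template, group_a, group_b, attribute):
    """Binary log-probability bias score (sketch).

    `template` holds one [MASK] for the group word and an {attr} slot,
    e.g. "[MASK] is a {attr}.".  Each group's association is the log-ratio
    of its probability with the attribute present versus with the attribute
    masked out (the prior); the score is the difference between groups.
    """
    target_sent = template.format(attr=attribute)
    prior_sent = template.format(attr=tok.mask_token)
    assoc_a = math.log(mask_prob(target_sent, group_a)
                       / mask_prob(prior_sent, group_a))
    assoc_b = math.log(mask_prob(target_sent, group_b)
                       / mask_prob(prior_sent, group_b))
    return assoc_a - assoc_b

# Example: a positive value means "he" is tied more strongly to the attribute.
print(lpbs("[MASK] is a {attr}.", "he", "she", "doctor"))
```

The multi-category CBS generalizes this pairwise difference to a variance over many target groups, as sketched after the abstract above.
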
“…Furthermore, an analysis of GPT-2 and GPT-3.5 revealed a propensity to generate masculine-associated pronouns more frequently than feminine-associated ones, and to show gender-biased associations with professions, treating occupations such as doctor or engineer as masculine more often than roles like nurse and teacher, which are often regarded as feminine (81). Language-dependent ethnic biases, involving the over-generalised association of an ethnic group with specific, mostly negative attributes, have been found in BERT, where non-toxic comments are incorrectly labelled as toxic when they include Middle Eastern country names (82).…”
Section: (Gen)AI Bias-Driven Criminalisation
confidence: 99%