Jaimeen Ahn scite author profile

Jaimeen Ahn

5 Publications

36 Citation Statements Received

125 Citation Statements Given

Affiliations

Korea Advanced Institute of Science and Technology

Publications

Mitigating Language-Dependent Ethnic Bias in BERT

Ahn¹,

Oh²

2021

BERT and other large-scale language models (LMs) contain gender and racial bias. They also exhibit other dimensions of social bias, most of which have not been studied in depth, and some of which vary depending on the language. In this paper, we study ethnic bias and how it varies across languages by analyzing and mitigating ethnic bias in monolingual BERT for English, German, Spanish, Korean, Turkish, and Chinese. To observe and quantify ethnic bias, we develop a novel metric called the Categorical Bias score. Then we propose two methods for mitigation: first, using a multilingual model, and second, using contextual word alignment of two monolingual models. We compare our proposed methods with monolingual BERT and show that these methods effectively alleviate the ethnic bias. Which of the two methods works better depends on the amount of NLP resources available for that language. We additionally experiment with Arabic and Greek to verify that our proposed methods work for a wider variety of languages.

KOLD: Korean Offensive Language Dataset

Jeong¹,

Oh²,

Ahn³

et al. 2022

Preprint

Warning: this paper contains content that may be offensive or upsetting. Although much attention has been paid to the detection of hate speech, most work has been done in English and is not directly applicable to other languages. To fill this gap, we present a Korean offensive language dataset (KOLD), 40k comments labeled with offensiveness, target, and targeted group information. We also collect two types of span, offensive and target spans, that justify the categorization decision within the text. Comparing the distribution of targeted groups with the existing English dataset, we point out the necessity of a hate speech dataset fitted to the language that best reflects the culture. Trained with our dataset, we report the baseline performance of the models built on top of large pretrained language models. We also show that title information serves as context and is helpful to discern the target of hatred, especially when it is omitted in the comment.

KOLD: Korean Offensive Language Dataset

Jeong¹,

Oh²,

Lee³

et al. 2022

Why Knowledge Distillation Amplifies Gender Bias and How to Mitigate from the Perspective of DistilBERT

Ahn¹,

Lee²,

Kim³

et al. 2022

Knowledge distillation is widely used to transfer the language understanding of a large model to a smaller model. However, after knowledge distillation, it was found that the smaller model is more biased by gender compared to the source large model. This paper studies what causes gender bias to increase after the knowledge distillation process. Moreover, we suggest applying a variant of the mixup on knowledge distillation, which is used to increase generalizability during the distillation process, not for augmentation. By doing so, we can significantly reduce the gender bias amplification after knowledge distillation. We also conduct an experiment on the GLUE benchmark to demonstrate that even if the mixup is applied, it does not have a significant adverse effect on the model's performance.

Mitigating Language-Dependent Ethnic Bias in BERT

Ahn¹,

Oh²

2021

Preprint
