Detecting Cross-Geographic Biases in Toxicity Modeling on Social Media

Ghosh, Sayan; Baker, Dylan; Jurgens, David; Prabhakaran, Vinodkumar

doi:10.18653/v1/2021.wnut-1.35

Cited by 19 publications

(21 citation statements)

References 37 publications


“…However, the (finite set of) identity-related and offensive tokens considered in this work are all in English and centered around Western cultural context. We leave the evaluation of our methodology to assess whether there are language-or more broadly culture-dependent changes for future work, following recent work on biases in geo-cultural contexts (Ghosh et al, 2021).…”

Section: Ethical Considerations (mentioning)

confidence: 99%

Features or Spurious Artifacts? Data-centric Baselines for Fair and Robust Hate Speech Detection

Ramponi, Tonelli

2022

Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Langua


Warning: this paper contains content that may be offensive or upsetting. Avoiding to rely on dataset artifacts to predict hate speech is at the cornerstone of robust and fair hate speech detection. In this paper we critically analyze lexical biases in hate speech detection via a cross-platform study, disentangling various types of spurious and authentic artifacts and analyzing their impact on out-of-distribution fairness and robustness. We experiment with existing approaches and propose simple yet surprisingly effective data-centric baselines. Our results on English data across four platforms show that distinct spurious artifacts require different treatments to ultimately attain both robustness and fairness in hate speech detection. To encourage research in this direction, we release all baseline models and the code to compute artifacts, pointing it out as a complementary and necessary addition to the data statements practice.


“…Ghosh et al [33] noted that a cross-geographical/cultural application of toxicity detectors can lead to lexical bias. They noted that majority of the literature focuses on the English language and the geo-cultural scenarios of a handful of countries.…”

Section: Cross-geographic Bias (mentioning)

confidence: 99%

“…As seen in Section 5, this false positive bias can be explained through the over-representation of specific terms in the toxic class of the training dataset. Based on the above observations, Ghosh et al [33] then proposed a two-step weakly-supervised method to detect lexical bias for cross-geocultural toxic content. They carried out this analysis using unlabeled tweets collected from across seven countries.…”

Section: Cross-geographic Bias (mentioning)

confidence: 99%

“…Especially since what can be considered toxic in English speaking geographies may not be considered toxic in other geographies. The initial study in cross-cultural bias is being led by the work of Ghosh et al [33]. However, the extensive study of toxicity bias in non-English and code-mixed settings remain non-existent.…”

Section: Case Study: Shift In Bias Due To Knowledge-based Generalizat... (mentioning)

confidence: 99%

“…Fig. 4. Ghosh et al [33] proposed two axes: (i) descriptive and (ii) prescriptive, to segment the associations of the over-represented terms for the model under bias evaluation.…”

mentioning

confidence: 99%


Handling Bias in Toxic Speech Detection: A Survey

Garg, Masud, Suresh et al.

2022

Preprint


The massive growth of social media usage has witnessed a tsunami of online toxicity in terms of hate speech, abusive posts, cyberbullying, etc. Detecting online toxicity is challenging due to its inherent subjectivity. Factors such as the context of the speech, geography, socio-political climate, and background of the producers and consumers of the posts play a crucial role in determining if the content can be flagged as toxic. Adoption of automated toxicity detection models in production can lead to a sidelining of the various demographic and psychographic groups they aim to help in the first place. It has piqued researchers' interest in examining unintended biases and their mitigation. Due to the nascent and multi-faceted nature of the work, complete literature is chaotic in its terminologies, techniques, and findings. In this paper, we put together a systematic study to discuss the limitations and challenges of existing methods. We start by developing a taxonomy for categorising various unintended biases and a suite of evaluation metrics proposed to quantify such biases. We take a closer look at each proposed method for evaluating and mitigating bias in toxic speech detection. To examine the limitations of existing methods, we also conduct a case study to introduce the concept of bias shift due to knowledge-based bias mitigation methods. The survey concludes with an overview of the critical challenges, research gaps and future directions. While reducing toxicity on online platforms continues to be an active area of research, a systematic study of various biases and their mitigation strategies will help the research community produce robust and fair models.


Regional bias in monolingual English language models

Lyu, Dost, Koh et al.

2024

Mach Learn


In Natural Language Processing (NLP), pre-trained language models (LLMs) are widely employed and refined for various tasks. These models have shown considerable social and geographic biases creating skewed or even unfair representations of certain groups. Research focuses on biases toward L2 (English as a second language) regions but neglects bias within L1 (first language) regions. In this work, we ask if there is regional bias within L1 regions already inherent in pre-trained LLMs and, if so, what the consequences are in terms of downstream model performance. We contribute an investigation framework specifically tailored for low-resource regions, offering a method to identify bias without imposing strict requirements for labeled datasets. Our research reveals subtle geographic variations in the word embeddings of BERT, even in cultures traditionally perceived as similar. These nuanced features, once captured, have the potential to significantly impact downstream tasks. Generally, models exhibit comparable performance on datasets that share similarities, and conversely, performance may diverge when datasets differ in their nuanced features embedded within the language. It is crucial to note that estimating model performance solely based on standard benchmark datasets may not necessarily apply to the datasets with distinct features from the benchmark datasets. Our proposed framework plays a pivotal role in identifying and addressing biases detected in word embeddings, particularly evident in low-resource regions such as New Zealand.
