Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track
DOI: 10.18653/v1/2023.emnlp-industry.26

Unveiling Identity Biases in Toxicity Detection: A Game-Focused Dataset and Reactivity Analysis Approach

Josiane Van Dorpe, Zachary Yang, Nicolas Grenon-Godbout, et al.

Abstract: Identity biases commonly arise from annotated datasets, can propagate into language models, and can cause further harm to marginalized groups. Existing bias-benchmarking datasets focus mainly on gender or racial biases and are designed to pinpoint which class a model is biased towards. They are also not designed for the gaming industry, a concern for models built to detect toxicity in video game chat. We propose a dataset and a method to highlight oversensitive terms using reactivity analysis and the mod…

Cited by 1 publication (3 citation statements) · References 21 publications
“…Post-training assessment necessitates the exploration of various methodologies to analyse potential biases within the model. In Van Dorpe et al. [204], a set of templates was devised to scrutinize the impact of protected group presence or absence in toxic detection. This analysis includes the calculation of a reactivity score, determined by assessing the average predictive difference across all sentence templates.…”
Section: Biases in Toxicity Detection (citation type: mentioning, confidence: 99%)
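
The reactivity analysis described in this citation statement reduces to a simple computation: fill a set of sentence templates with an identity term, fill the same templates with a neutral baseline, and average the classifier's predictive difference. Below is a minimal Python sketch of that reading; the classifier, templates, and terms are hypothetical placeholders, not the paper's actual artifacts, and the paper may compare presence versus absence directly rather than against a baseline term.

```python
# Minimal sketch of a template-based reactivity score (hypothetical API).
from statistics import mean
from typing import Callable, List

def reactivity_score(
    predict_toxicity: Callable[[str], float],  # returns P(toxic) in [0, 1]
    templates: List[str],                      # e.g. "I met a {} player today."
    identity_term: str,                        # protected-group term under test
    baseline_term: str = "person",             # neutral filler for comparison
) -> float:
    """Average predictive difference across all sentence templates when the
    identity term is present versus a neutral baseline."""
    diffs = [
        predict_toxicity(t.format(identity_term))
        - predict_toxicity(t.format(baseline_term))
        for t in templates
    ]
    return mean(diffs)

# Toy usage: a classifier that overreacts to one identity term.
toy_model = lambda s: 0.9 if "muslim" in s else 0.1
templates = ["I met a {} player today.", "That {} is on my team."]
print(reactivity_score(toy_model, templates, "muslim"))  # 0.8 -> oversensitive term
```

A score far from zero flags the term as oversensitive for the model, which matches the paper's stated goal of highlighting such terms.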
“…Despite the primary focus of this review not being centred on mitigating toxic detection bias, we have observed evaluation metrics [49,146,147], bias analysis methodologies [117,204,212], and bias mitigation techniques [45,88,96,112,134,139,209,221] that are cornerstones of the improvement of these models. Biases in toxic detection are not only embedded during the training phase but also inherent in the base models [136,147], resulting in the exacerbation of these biases and making it harder to assess and mitigate them after training.…”
Section: Bias (citation type: mentioning, confidence: 99%)