Handling Bias in Toxic Speech Detection: A Survey

Garg, Tanmay; Masud, Sarah; Suresh, Tharun; Chakraborty, Tanmoy

doi:10.48550/arxiv.2202.00126

Cited by 4 publications

(9 citation statements)

References 69 publications


“…Toxicity detection is inherently complex and subjective, with different definitions and interpretations among researchers (Garg et al, 2022;Kowert, 2020). Biases also vary across communities, influenced by culture, origin and socio-political context.…”

Section: Related Work (mentioning)

confidence: 99%

“…In this study, we define biases as "prejudice in favour of or against one thing, person, or group compared with another usually in a way that's considered to be unfair" (University of California). Natural language processing encompasses a wide range of types of biases, categorized by their sources or the type of harm they cause (Sap et al, 2019;Garg et al, 2022). Our focus lies specifically on lexical identity biases, which refer to biases conveyed by terms related to one's identity or characteristic (Zhou et al, 2021).…”

Section: Related Work (mentioning)

confidence: 99%

“…Biases can emerge during dataset creation when practitioners sample data, annotators label data based on personal understanding, culture, and experiences, and practitioners aggregate labels. In this study, we specifically focus on the issue of models overestimating the toxicity of terms associated with certain concepts, leading to problematic false positives and even false negatives (Dixon et al, 2018;Kiritchenko and Mohammad, 2018;Prabhakaran et al, 2019;Sap et al, 2019;Garg et al, 2022). Existing research has primarily focused on biases in toxicity detection without considering the specific use-case of in-game chat, despite the widespread presence of toxicity in that particular context.…”

Section: Introduction (mentioning)

confidence: 99%


Unveiling Identity Biases in Toxicity Detection : A Game-Focused Dataset and Reactivity Analysis Approach

Van Dorpe, Yang, Grenon-Godbout et al. 2023

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track


Identity biases arise commonly from annotated datasets, can be propagated in language models and can cause further harm to marginal groups. Existing bias benchmarking datasets are mainly focused on gender or racial biases and are made to pinpoint which class the model is biased towards. They also are not designed for the gaming industry, a concern for models built for toxicity detection in videogames' chat. We propose a dataset and a method to highlight oversensitive terms using reactivity analysis and the model's performance. We test our dataset against ToxBuster, a language model developed by Ubisoft fine-tuned for toxicity detection on multiplayer videogame's written chat, and Perspective API. We find that these toxicity models often automatically tag terms related to a community's identity as toxic, which prevents members of already marginalized groups from making their presence known or having a mature / normal conversation. Through this process, we have generated an interesting list of terms that trigger the models to varying degrees, along with insights on establishing a baseline through human annotations.


“…The COMPAS dataset contains outcomes within 2 years of the decision, for over 10,000 criminal defendants in Broward County, Florida. The "Stop, Question and Frisk" database contains data from NYPD officers' interactions with potential suspects of committing a crime. Features include locality-based information like time, street name, area code, etc.…”

Section: Racial and Religious Bias Datasets (mentioning)

confidence: 99%

“…Levitin [4] highlights that as data is collected by humans, they decide what to collect and what not. The objective for which the data is collected and its respective planning leads to wrong analysis/conclusions, e.g., which population/features to select and what to label, also called lexical bias [5]. At the learning stage, it is the bias that exists due to the transfer of bias in the model and how much it affects certain groups while proposing a generalised model that will work for all groups in the data [6].…”

Section: Introduction (mentioning)

confidence: 99%

Bias Detection for Customer Interaction Data: A Survey on Datasets, Methods, and Tools

Donald, Galanopoulos, Curry et al. 2023

IEEE Access


With the increase in usage of machine learning models within many different aspects of customer interactions, it has become very clear that bias detection within associated customer interaction datasets has led to a critical focus on issues such as the identification of bias prior to model building, lack of understanding and transparency within models, and ultimately the prevention of biased predictions or classifications. This has never been more important since the introduction of the EU General Data Protection Regulation (GDPR) and the associated rule of "right of explanation". In this paper, we survey the state of the art for bias detection, avoidance and mitigation within datasets, and the associated methods and tools available. Our purpose is to establish an understanding of how established customer interaction-based use cases can utilise these techniques. The focus is primarily on tackling the bias in unstructured text data as a pre-process prior to the machine learning model training phase. We hope that this research encourages the further establishment of responsible usage of customer interaction datasets to allow the prevention of bias being introduced into machine learning pipelines and to also allow greater awareness of the potential for further research in this area.


Bases sociocognitivas do discurso de ódio online no Brasil: uma revisão narrativa interdisciplinar

Freitas, Romero, Pantaleão et al. 2023

Texto livre


The growth of social networks has given unprecedented strength to hate speech, which has caused harm globally. This article aimed to discuss the sociocognitive substrates of hate speech and the role of social networks in aggravating the problem, integrating knowledge from the neurosciences, Social Psychology, and Critical Discourse Analysis, among other fields, and proposing a brief narrative review to support the understanding of and the fight against hate speech in the Brazilian context. By bringing these fields together, the article addresses themes central to hate speech: its nature as a social practice and the sociocognitive processes underlying it, such as social categorization and the formation of stereotypes, prejudice, and social identity, phenomena that can mediate interpersonal and intergroup conflicts. Starting from well-established concepts, up-to-date literature was sought to understand and illustrate the scale of the hate speech problem. This work points to strategic directions for combating hate speech and mitigating its negative effects, in order to promote fairer and more cooperative societies through the adoption of socio-educational measures on and off the Internet.
