2021
DOI: 10.1007/s43681-021-00081-0

Bias and comparison framework for abusive language datasets

Abstract: Recently, numerous datasets have been produced as research activities in the field of automatic detection of abusive language or hate speech have increased. A problem with this diversity is that they often differ, among other things, in context, platform, sampling process, collection strategy, and labeling schema. There have been surveys on these datasets, but they compare the datasets only superficially. Therefore, we developed a bias and comparison framework for abusive language datasets for their in-depth a…

Cited by 5 publications (2 citation statements; citing works published 2022–2024). References 37 publications (53 reference statements).

Citation statements:
“…pose a significant concern as they diminish the generalizability of models and may increase the risk of developing discriminatory models against certain social categories. Our awareness is limited to a single study (Wich et al., 2022) that reviewed biases in Arabic toxic language datasets. Specifically, the study covered six Arabic datasets and five in English.…”
Section: Findings and Discussion (mentioning; confidence: 99%)
“… Al Kuwatly, Wich, and Groh (2020) identified annotator bias based on several demographic characteristics, such as age, first language, and education level, that leads to biased abusive language and hate speech detectors. Lastly, Wich, Bauer, and Groh (2020) found a negative effect of political bias in hate speech detection models and later developed a framework to analyse and uncover inherent biases in abusive language datasets (Wich, Eder, Al Kuwatly, & Groh, 2022). In this paper, we address the ethical principles of fairness and prevention of harm (High-Level Expert Group on AI, 2019).…”
Section: Related Work (mentioning; confidence: 99%)