Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining 2023
DOI: 10.1145/3580305.3599896

Revisiting Hate Speech Benchmarks: From Data Curation to System Deployment

Abstract: Social media is awash with hateful content, much of which is often veiled with linguistic and topical diversity. The benchmark datasets used for hate speech detection do not account for such divagation as they are predominantly compiled using hate lexicons. However, capturing hate signals becomes challenging in neutrally-seeded malicious content. Thus, designing models and datasets that mimic the real-world variability of hate warrants further investigation. To this end, we present GOTHate, a large-scale code-m…

Citations: cited by 3 publications (1 citation statement)
References: 60 publications (96 reference statements)
“…Along similar lines, we have hate and offense datasets from diverse online forums like Wikipedia [64], Stormfront [16], Facebook [46], Reddit [37] etc. On the other hand, to overcome the use of hate lexicons in curating the datasets, large-scale neutrally seeded datasets have also been proposed [19,28,55]. The initial research in hate speech datasets focused on multi-class text classification assuming English posts.…”
Section: Related Work
Type: mentioning
confidence: 99%