2020
DOI: 10.48550/arxiv.2004.14454
Preprint

SOLID: A Large-Scale Semi-Supervised Dataset for Offensive Language Identification

Cited by 20 publications (27 citation statements)
References 0 publications
“…There have been different types of abusive content addressed in recent studies including hate speech [26], aggression [22,23], and cyberbullying [42]. A few annotation taxonomies, such as the one proposed by OLID [56] and replicated in other studies [43], try to take advantage of the similarities between these sub-tasks allowing us to consider multiple types of abusive language at once.…”
Section: Related Work
confidence: 99%
“…In terms of languages, the majority of studies on this topic deal with English (Malmasi and Zampieri, 2017; Yao et al., 2019; Ridenhour et al., 2020; Rosenthal et al., 2020) due to the wide availability of language resources such as corpora and pre-trained models. In recent years, several studies have been published on identifying offensive content in other languages such as Arabic (Mubarak et al., 2020), Dutch (Tulkens et al., 2016), French (Chiril et al., 2019), Greek (Pitenis et al., 2020), Italian (Poletto et al., 2017), Portuguese (Fortuna et al., 2019), and Turkish (Çöltekin, 2020).…”
Section: Related Work
confidence: 99%
“…Identifying Toxicity - Most work on identifying toxic language looked at individual social media posts or comments without taking context into account (Davidson et al., 2017; Xu et al., 2012; Zampieri et al., 2019; Rosenthal et al., 2020; Kumar et al., 2018; Garibo i Orts, 2019; Ousidhoum et al., 2019; Breitfeller et al., 2019; Hada et al., 2021; Barikeri et al., 2021)…”
Section: Related Work
confidence: 99%