2021
DOI: 10.1007/s00779-021-01609-1
Towards multidomain and multilingual abusive language detection: a survey

Abstract: Abusive language is an important issue in online communication across different platforms and languages. Having a robust model to detect abusive instances automatically is a prominent challenge. Several studies have addressed this vital issue by modeling the task in cross-domain and cross-lingual settings. This paper outlines and describes the current state of this research direction, providing an overview of previous studies, including the available datasets and approaches employed in bot…

Cited by 23 publications (31 citation statements)
References 117 publications (261 reference statements)
“…Applying our methodology to other languages is not trivial, as it depends on the availability of language resources and robust NLP tools for them (Pamungkas et al., 2021). Fortunately, full-fledged NLP pipelines do exist for many languages, thanks, for instance, to large-scale initiatives such as Universal Dependencies, which provides among its deliverables the UDPipe software library and a broad set of trained models in more than 70 languages (Nivre et al., 2016; Straka et al., 2016).…”
Section: Discussion
confidence: 99%
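The citing authors point to Universal Dependencies and the UDPipe library as a way to obtain tokenization, tagging, and parsing for many languages. As a minimal sketch of what such a pipeline looks like in practice (not taken from the survey; it assumes the spacy-udpipe wrapper package and an available pretrained UD model for the chosen language):

    import spacy_udpipe

    # Download and load a pretrained Universal Dependencies model; "it" (Italian)
    # is only an illustrative language code, any of the 70+ available models works.
    spacy_udpipe.download("it")
    nlp = spacy_udpipe.load("it")

    doc = nlp("Questo è un esempio di testo da analizzare.")
    for token in doc:
        # Lemma, coarse POS tag, and dependency relation produced by the UD model,
        # the kind of linguistic features a multilingual abuse detector could build on.
        print(token.text, token.lemma_, token.pos_, token.dep_)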
“…Hate speech datasets also differ in annotation schema, as shown in recent surveys (Vidgen and Derczynski, 2020; Poletto et al., 2021; Pamungkas et al., 2021a). This variety is due to the multifaceted nature of hate speech, as it can be directed against individuals or groups, be implicit or explicit, and have varying themes such as race, gender, or disability.…”
Section: Hate Speech Definitions
confidence: 99%
“…Considering the above-discussed variance in hate speech definitions and label sets, multilingual hate speech detection remains an important and relevant task, since social media platforms are multilingual spaces where people may easily communicate in their native tongue (Pamungkas et al., 2021a). Due to the cost of collecting and annotating new data, it is relevant to consider ways of exploiting resources that are already available.…”
Section: Hate Speech Data Scarcity and Cross-lingual Transfer
confidence: 99%
“…OffensEval 2020 (Zampieri et al., 2020) featured offensive language identification datasets in Arabic, Danish, Greek, Turkish, and English. We direct interested readers to relevant surveys for further information (Schmidt and Wiegand, 2017; Fortuna and Nunes, 2018; Poletto et al., 2020; Vidgen and Derczynski, 2020; Pamungkas, Basile, and Patti, 2021b). Only a handful of studies have investigated zero-shot cross-lingual transfer learning for hate speech detection.…”
Section: Related Work
confidence: 99%
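Since this last statement concerns zero-shot cross-lingual transfer, a minimal sketch of the general recipe may help: fine-tune a multilingual encoder on labelled data in one language and apply it unchanged to another. The sketch below uses Hugging Face transformers with XLM-RoBERTa; the model name, label meanings, and the omitted fine-tuning step are illustrative assumptions, not a description of any specific cited system.

    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    # Multilingual encoder with a 2-way classification head (offensive / not offensive).
    tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
    model = AutoModelForSequenceClassification.from_pretrained(
        "xlm-roberta-base", num_labels=2
    )
    # ... fine-tuning on English offensive-language data would happen here (omitted) ...

    # Zero-shot inference on text in a language never seen during fine-tuning.
    inputs = tokenizer("Et eksempel på en dansk sætning.", return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    prediction = logits.argmax(dim=-1).item()  # assumed mapping: 0 = not offensive, 1 = offensive
    print(prediction)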