Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conferen 2019
DOI: 10.18653/v1/d19-1474

Multilingual and Multi-Aspect Hate Speech Analysis

Abstract: Current research on hate speech analysis is typically oriented towards monolingual and single classification tasks. In this paper, we present a new multilingual multi-aspect hate speech analysis dataset and use it to test the current state-of-the-art multilingual multitask learning approaches. We evaluate our dataset in various classification settings, then we discuss how to leverage our annotations in order to improve hate speech detection and classification in general. 4 https://competitions.codalab.org/comp…

Citations: cited by 159 publications (142 citation statements)
References: 23 publications
“…Regarding the languages, as expected, most of the resources use English data, although in some cases they are collected along with texts in Hindi (Bohra et al 2018; Kumar et al 2018a; Mathur et al 2018) or they are part of even larger multilingual collections (Chung et al 2019; Ousidhoum et al 2019; Steinberger et al 2017). It is also worth pointing out that less-resourced languages such as Amharic, Bengali, Slovene and Swedish, are also represented in the corpora we found, thus enabling a greater linguistic diversity in this field.…”
Section: Table; type: supporting
confidence: 58%
“…For example, our survey captures a great availability of benchmark datasets for the evaluation of abusive language and hate speech detection systems, in several languages and with several topical focuses. This adds to the challenge of investigating architectures which are stable and well-performing across different languages and abusive domains, making it a more and more promising topic to research (Corazza et al 2020; Pamungkas and Patti 2019; Ousidhoum et al 2019).…”
Section: Lexical Analysis; type: mentioning
confidence: 99%
“…Ousidhoum et al [34], Davidson et al [13], Waseem and Hovy [47], and Elsherief et al [16] released their annotated hate speech datasets in public. We made use of the latter three datasets in our research.…”
Section: Related Work 21 Hate Speech Detection; type: mentioning
confidence: 99%
“…Further, the researchers have been able to show that many hate words refer to political topics. In addition to the mono-linguistic identification of hate speech, there have been attempts to identify hate speech in different languages using deep-learning techniques and compare them to traditional machine-learning methods [23]. Another aspect that relates to the political context of social media is that these platforms are more often portrayed as a threat to democracy, as they allow interactions between likeminded people.…”
Section: Related Work; type: mentioning
confidence: 99%