2019
DOI: 10.48550/arxiv.1908.11049
Preprint

Multilingual and Multi-Aspect Hate Speech Analysis

citations
Cited by 11 publications
(14 citation statements)
references
References 0 publications
“…Annotators are seen to be self-consistent in highlighting terms and providing final judgments 94.5% of the time. If we measure inter-annotator agreement across annotators, however, we observe a Fleiss kappa of only 0.15, which is comparable to prior work [9,21,31,35]. Given task subjectivity, this is why self-consistency checks are so important for ensuring data quality.…”
Section: Dataset Properties and Evaluation
Citation type: supporting
confidence: 70%
“…Nevertheless, with the advances in multilingual parsers and deep learning technology, together with increasing pressures from policy-makers to handle hate speech issues at local resources, non-English HS detection toolkits have seen a steady increase. The figure indicates that about 51% of all works in this field are performed on English dataset, with an increase of proportion of other languages as well where Arabic (13%) [93,59,12,143], Turkish (6%) [143,104], Greek (4%) [143,6,136], Danish (5%) [106,143], Hindi (4%) [121,22,88], German (4%) [72,120], Malayalam (3%) [130,109], Tamil (3%) [130,20], Chinese (1%) [138,139,155], Italian (2%) [116], Urdu (1%) [126,95,7], Russian (1%) [17], Bengali (1%) [62,127,69], Korean (1%) [91], French (1%) [16,102,50], Indonesian (1%) [14], Portuguese (1%) [14], Spanish (1%) [56] and Polish (1%) [118] seem to dominate the rest of the languages in this field.…”
Section: Statistical Trends Of Results
Citation type: mentioning
confidence: 99%
“…In traditional hate speech tasks, the most common algorithms for the task are SVM, Random Forests, and Decision Trees (Fortuna and Nunes, 2018). There are other studies on hate speech detections, based on Linguistic methods, n-grams, word2vec (Nobata et al, 2016), and logistic regression (Ousidhoum et al, 2019), Convolutional Neural Networks and Explainable Artificial Intelligence (Hardage and Najafirad, 2020), and Network Methods (Fan et al, 2020). With the advent of Transformers (Vaswani et al, 2017) structure and transfer learning in AI, BERT, which stands for Bidirectional Encoder Representations from Transformers (Devlin et al, 2018), becomes one of the most popular models for hate speech detection, and an increasing number of studies have shown the dominant performance of BERT in terms of detection tasks.…”
Section: Related Work
Citation type: mentioning
confidence: 99%