2023
DOI: 10.1017/s1351324923000402
Emojis as anchors to detect Arabic offensive language and hate speech

Hamdy Mubarak,
Sabit Hassan,
Shammur Absar Chowdhury

Abstract: We introduce a generic, language-independent method to collect a large percentage of offensive and hate tweets regardless of their topics or genres. We harness the extralinguistic information embedded in emojis to collect a large number of offensive tweets. We apply the proposed method to Arabic tweets and compare it with English tweets, analyzing key cultural differences. We observed consistent usage of these emojis to represent offensiveness throughout different timespans on Twitter. We manually annotate …

Cited by 14 publications (7 citation statements)
References 42 publications
“…The best performance for Arabic was achieved by STSL, with a macro F1 score of 0.84 and a micro F1 score of 0.72. In Mubarak et al. (2022), the authors introduced an Arabic multi-dialectal dataset which consists of 12,698 tweets classified into two main classes: clean and offensive. The offensive class was further classified into eight sub-classes: gender, race, ideology, social class, religion, disability, vulgar, and violence.…”
Section: Related Work
confidence: 99%
“…For example, Haddad et al. (2019) introduced a corpus for the Tunisian dialect, and Mulki et al. (2019) proposed a corpus specifically for the Levantine dialect. Other corpora include Arabic text from mixed dialects, such as Albadi et al. (2018), Omar et al. (2020), Duwairi et al. (2021), and Mubarak et al. (2022). However, they lack annotation specifying the dialect of each sentence, and they also lack balancing between the different dialects.…”
Section: Introduction
confidence: 99%
“…Althobaiti [15] proposed an automatic method for detecting offensive language and precise hate speech in Arabic tweets. They used a dataset [16] with 12,698 tweets classified into 8,235 clean and 4,463 offensive tweets. They also investigated the use of sentiment analysis and emoji descriptions as additional features alongside the textual content of the tweets.…”
Section: Related Work
confidence: 99%
“…To better understand the level of offensiveness in the content moderation task, we manually annotated 1,238 comments (around 200 removed and 100 unremoved examples for each of the four languages) using the fine-grained taxonomy of offensiveness presented in Mubarak et al. (2022), with the addition of the categories of sexuality and age. The distribution of comments for different categories is shown in Figure 3.…”
Section: Manual Analysis Of Offensiveness
confidence: 99%