2023
DOI: 10.1017/s1351324923000402
Emojis as anchors to detect Arabic offensive language and hate speech

Hamdy Mubarak,
Sabit Hassan,
Shammur Absar Chowdhury

Abstract: We introduce a generic, language-independent method to collect a large percentage of offensive and hate tweets regardless of their topics or genres. We harness the extralinguistic information embedded in emojis to collect a large number of offensive tweets. We apply the proposed method to Arabic tweets and compare it with English tweets, analyzing key cultural differences. We observed consistent usage of these emojis to represent offensiveness throughout different timespans on Twitter. We manually annotate …

Cited by 14 publications (7 citation statements)
References 42 publications
“…The best performance for Arabic was achieved by STSL, with a macro F1 score of 0.84 and a micro F1 score of 0.72. In Mubarak et al. (2022), the authors introduced an Arabic multi-dialectal dataset which consists of 12,698 tweets classified into two main classes: clean and offensive. The offensive class was further classified into eight sub-classes: gender, race, ideology, social class, religion, disability, vulgar, and violence.…”
Section: Related Work
confidence: 99%
“…For example, Haddad et al. (2019) introduced a corpus for the Tunisian dialect, and Mulki et al. (2019) proposed a corpus specifically for the Levantine dialect. Other corpora include Arabic text from mixed dialects, such as Albadi et al. (2018), Omar et al. (2020), Duwairi et al. (2021), and Mubarak et al. (2022). However, they lack annotation specifying the dialect of each sentence, and they also lack balancing between the different dialects.…”
Section: Introduction
confidence: 99%
“…Althobaiti [15] proposed an automatic method for detecting offensive language and precise hate speech in Arabic tweets. They used a dataset [16] with 12,698 tweets classified into 8,235 clean and 4,463 offensive tweets. They also investigated the use of sentiment analysis and emoji descriptions as additional features alongside the textual content of the tweets.…”
Section: Related Work
confidence: 99%
“…To better understand the level of offensiveness in the content moderation task, we manually annotated 1,238 comments (around 200 removed and 100 unremoved examples for each of the four languages) using the fine-grained taxonomy of offensiveness presented in Mubarak et al. (2022), with the addition of the categories of sexuality and age. The distribution of comments for different categories is shown in Figure 3.…”
Section: Manual Analysis Of Offensiveness
confidence: 99%