Arabic Offensive and Hate Speech Detection Using a Cross-Corpora Multi-Task Learning Model

Aldjanabi, Wassen; Dahou, Abdelghani; Al-qaness, Mohammed A. A.; Elaziz, Mohamed Abd; Helmi, Ahmed; Damaševičius, Robertas

doi:10.3390/informatics8040069

Cited by 50 publications

(27 citation statements)

References 60 publications


“…Hugely influenced by (Aldjanabi et al, 2021) work, we were able to explore many previous approaches to Arabic (HS) and (OFF) detection using (MTL). The first Arabic Religious (HS) Twitter dataset was collected by (Albadi et al, 2018).…”

Section: Related Work (mentioning)

confidence: 99%


AlexU-AIC at Arabic Hate Speech 2022: Contrast to Classify

Shapiro, Khalafallah, Torki. 2022

Preprint


Online presence on social media platforms such as Facebook and Twitter has become a daily habit for internet users. Despite the vast number of services these platforms offer, users suffer from cyber-bullying, which leads to mental abuse and may escalate to physical harm to individuals or targeted groups. In this paper, we present our submission to the Arabic Hate Speech 2022 Shared Task Workshop (OSACT5 2022) using the associated Arabic Twitter dataset. The shared task consists of three sub-tasks. Sub-task A focuses on detecting whether a tweet is offensive or not. For offensive tweets, sub-task B focuses on detecting whether the tweet is hate speech or not. Finally, for hate speech tweets, sub-task C focuses on detecting the fine-grained type of hate speech among six different classes. Transformer models have proved their efficiency in classification tasks, but they tend to over-fit when fine-tuned on a small or imbalanced dataset. We overcome this limitation by investigating multiple training paradigms, such as contrastive learning and multi-task learning, along with classification fine-tuning and an ensemble of our top 5 performers. Our proposed solution achieved macro-averaged F1 scores of 0.841, 0.817, and 0.476 in sub-tasks A, B, and C respectively.
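The abstract above reports macro-averaged F1, which gives each class equal weight regardless of how often it occurs and is therefore a common choice for imbalanced label sets. As a minimal sketch of how that metric is computed (the function name `macro_f1` and the toy labels are illustrative assumptions, not taken from the cited paper or its evaluation code):

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy, hypothetical labels: the minority class "OFF" counts as much as "NOT".
gold = ["OFF", "OFF", "NOT", "NOT", "NOT", "NOT"]
pred = ["OFF", "NOT", "NOT", "NOT", "NOT", "OFF"]
print(macro_f1(gold, pred))
```

In this toy case the per-class F1 scores are 0.5 for "OFF" and 0.75 for "NOT", so the macro average sits below the plain accuracy of 4/6, which is the effect that makes the metric informative on skewed data.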


Section: Related Work (mentioning)

confidence: 99%

“…Their proposed model achieved 0.904, 0.737 F1-score in the (OFF) and (HS) sub-tasks respectively. Moving from OSACT2020 submissions, (Aldjanabi et al, 2021) explores (MTL) more widely. They use dataset from OSACT2020 (HS) and (OFF), T-HSAB (Haddad et al, 2019), and (L-HSAB) (Mulki et al, 2019).…”

Section: Related Work (mentioning)

confidence: 99%

AlexU-AIC at Arabic Hate Speech 2022: Contrast to Classify

Shapiro, Khalafallah, Torki. 2022

Preprint


“…TC is a machine learning challenge that tries to classify new written content into a conceptual group from a predetermined classification collection [1]. It is crucial in a variety of applications, including sentiment analysis [2,3], spam email filtering [4,5], hate speech detection [6], text summarization [7], website classification [8], authorship attribution [9], information retrieval [10], medical diagnostics [11], emotion detection on smart phones [12], online recommendations [13], fake news detection [14,15], crypto-ransomware early detection [16], semantic similarity detection [17], part-of-speech tagging [18], news classification [19], and tweet classification [20].…”

Section: Introduction (mentioning)

confidence: 99%

A Novel Text Classification Technique Using Improved Particle Swarm Optimization: A Case Study of Arabic Language

et al. 2022

Self Cite


We propose a novel text classification model that aims to improve the performance of Arabic text classification using machine learning techniques. One effective approach to Arabic text classification is to find a suitable feature selection method, with an optimal number of features, to pair with the classifier. Although several text classification methods have been proposed for the Arabic language using different techniques, such as feature selection methods, ensembles of classifiers, and discriminative features, choosing the optimal method becomes an NP-hard problem given the huge search space. Therefore, we propose a method called Optimal Configuration Determination for Arabic text Classification (OCATC), which uses the Particle Swarm Optimization (PSO) algorithm to find the optimal solution (configuration) in this space. The proposed OCATC method extracts features from the textual documents and converts them into numerical vectors using the Term Frequency-Inverse Document Frequency (TF–IDF) approach. Finally, PSO selects the best configuration: a classifier, a feature selection method, and an optimal number of features. Extensive experiments were carried out to evaluate the performance of the OCATC method using six datasets, including five publicly available datasets and our proposed dataset. The results obtained demonstrate the superiority of OCATC over individual classifiers and other state-of-the-art methods.
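The abstract above names TF–IDF as the step that turns documents into numerical vectors but does not state which weighting variant is used. The following is a minimal sketch of one common variant (relative term frequency times the natural log of the inverse document frequency); the function name `tfidf` and the placeholder tokens are illustrative assumptions, not the OCATC implementation:

```python
import math
from collections import Counter


def tfidf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs at least once.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            # tf = relative frequency in this document; idf = log(N / df).
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Placeholder tokens standing in for preprocessed words.
corpus = [["w1", "w2"], ["w1", "w3"]]
for vector in tfidf(corpus):
    print(vector)
```

A term that occurs in every document ("w1" here) gets weight 0 under this variant, while terms confined to one document receive positive weight. Library implementations often add smoothing and normalization, so their values will differ from this sketch.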


“…Detecting hate speech in Arabic, used by Islamicists, is even more challenging due to the lack of high-quality labeled datasets that can be used to train models to detect hatred content automatically [8]. This is because of the Arabic language's complex grammatical structure and extensive morphology.…”

Section: Introduction (mentioning)

confidence: 99%

Detecting Islamic Radicalism Arabic Tweets Using Natural Language Processing

et al. 2022


The image of the tolerant religion of Islam has been distorted by extremists in the last two decades in many ways, such as luring teenagers into terrorist acts. Nowadays, millions of users socialize and share ideas using social media platforms such as Twitter. Typically, the ideas shared on Twitter (tweets) reach and influence many people, who can simply retweet them and make them spread even faster. Unfortunately, some of these ideas are posted by extremists who share hateful Arabic content. Thus, it is very important to automate the process of controlling and monitoring hateful Arabic tweets, given that Arabic is the most widely used language in the Islamic world. In this paper, we provide a manually labeled and curated dataset of 3,000 Arabic tweets, both hateful and non-hateful. To automate the process of detecting hateful tweets, we utilize advanced Machine Learning (ML) techniques and perform sentiment analysis to capture the meaning of the Arabic words in a proper word embedding. We also used the proposed model to classify and analyze 100,000 tweets from the last decade. The outcome of this work promotes future research on analyzing Arabic hateful speech by providing a manually labeled Arabic dataset and a trained model (92% accuracy) that can be used as an underlying tool by governments, Internet service providers, and social media applications to detect inflammatory tweets before they spread to a wider audience.

