SalamNET at SemEval-2020 Task 12: Deep Learning Approach for Arabic Offensive Language Detection

Husain, Fatemah; Lee, Joo-Yeon; Henry, Sam; Uzuner, Özlem

doi:10.18653/v1/2020.semeval-1.283

Cited by 11 publications

(7 citation statements)

References 18 publications


“…On the other hand, deep learning techniques eliminate the need for handcrafted features. Deep learning has gained significant popularity for HS identification in Arabic Twitter data since 2017 (Badjatiya et al, 2017), primarily due to its capacity to research classification appropriate to data representations (Husain, 2020; Mansur, Omar & Tiun, 2023). Well-known deep learning techniques include CNNs and LSTM networks (Duwairi, Hayajneh & Quwaider, 2021).…”

Section: Review Findings and Discussion (mentioning)

confidence: 99%

A systematic literature review of hate speech identification on Arabic Twitter data: research challenges and future directions

Alhazmi, Mahmud, Idris et al. 2024

PeerJ Computer Science


Automatic hate speech identification in Arabic tweets has attracted substantial attention among academics in the fields of text mining and natural language processing (NLP). The number of studies on this subject has grown significantly. This study aims to provide an overview of this field by conducting a systematic review of literature that focuses on automatic hate speech identification, particularly in the Arabic language. The goal is to examine the research trends in Arabic hate speech identification and offer guidance to researchers by highlighting the most significant studies published between 2018 and 2023. This systematic study addresses five specific research questions concerning the types of the Arabic language used, hate speech categories, classification techniques, feature engineering techniques, performance metrics, validation methods, existing challenges faced by researchers, and potential future research directions. Through a comprehensive search across nine academic databases, 24 studies that met the predefined inclusion criteria and quality assessment were identified. The review findings revealed the existence of many Arabic linguistic varieties used in hate speech on Twitter, with modern standard Arabic (MSA) being the most prominent. Among identification techniques, machine learning is the most used category for Arabic hate speech identification. The results also show the different feature engineering techniques used and indicate that N-gram and CBOW are the most used. F1-score, precision, recall, and accuracy were identified as the most used performance metrics. The review also shows that the most used validation method is the train/test split method. The findings of this study can therefore serve as valuable guidance for researchers in enhancing the efficacy of their models in future investigations. In addition, algorithm development, policy rule regulation, community management, and legal and ethical considerations are other real-world applications that can benefit from this research.


“…A dataset consisting of 5% hate speech was presented at OSACT 2020 shared task. The best system performed extensive preprocessing including normalizing emojis (translate their English description to Arabic) and dialectal Arabic (DA) to modern standard Arabic (MSA) conversion among others (Husain 2020). ASAD (Hassan et al 2021a) is an online tool that utilizes the shared task datasets for offensiveness and hate speech detection in tweets along with other social media analysis components such as emotion (Hassan et al 2021b) and spam detection (Mubarak et al 2020a).…”

Section: Related Work (mentioning)

confidence: 99%

Emojis as anchors to detect Arabic offensive language and hate speech

Mubarak, Hassan, Chowdhury 2023

Nat. Lang. Eng.


We introduce a generic, language-independent method to collect a large percentage of offensive and hate tweets regardless of their topics or genres. We harness the extralinguistic information embedded in the emojis to collect a large number of offensive tweets. We apply the proposed method on Arabic tweets and compare it with English tweets—analyzing key cultural differences. We observed a constant usage of these emojis to represent offensiveness throughout different timespans on Twitter. We manually annotate and publicly release the largest Arabic dataset for offensive, fine-grained hate speech, vulgar, and violence content. Furthermore, we benchmark the dataset for detecting offensiveness and hate speech using different transformer architectures and perform in-depth linguistic analysis. We evaluate our models on external datasets—a Twitter dataset collected using a completely different method, and a multi-platform dataset containing comments from Twitter, YouTube, and Facebook, for assessing generalization capability. Competitive results on these datasets suggest that the data collected using our method capture universal characteristics of offensive language. Our findings also highlight the common words used in offensive communications, common targets for hate speech, specific patterns in violence tweets, and pinpoint common classification errors that can be attributed to limitations of NLP models. We observe that even state-of-the-art transformer models may fail to take into account culture, background, and context or understand nuances present in real-world data such as sarcasm.


“…Using (2), IDF determines if a term (t) is frequent or rare in all documents (n) in order to know its importance. The document frequency (d) is the number of documents (d) that include the term (t) [26], [27].…”

Section: Feature Extraction (mentioning)

confidence: 99%
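The citation statement above refers to an equation (2) that is not reproduced on this page. As a rough illustration only, the sketch below assumes the common textbook form IDF(t) = log(n / d), where n is the total number of documents and d is the number of documents containing the term t. The function name, the toy documents, and the exact formula are assumptions for illustration, not the cited authors' implementation.

```python
import math


def inverse_document_frequency(term, documents):
    """Return log(n / d) for a term (assumed textbook form of IDF).

    n is the total number of documents; d is the number of documents
    that contain the term at least once.
    """
    n = len(documents)
    d = sum(1 for doc in documents if term in doc)
    if d == 0:
        # The term appears nowhere, so log(n / d) is undefined.
        raise ValueError(f"term {term!r} does not occur in any document")
    return math.log(n / d)


# Hypothetical toy corpus: each document is a list of tokens.
docs = [
    ["good", "morning", "everyone"],
    ["good", "game", "tonight"],
    ["good", "luck", "everyone"],
    ["rare", "word", "here"],
]

idf_common = inverse_document_frequency("good", docs)  # in 3 of 4 documents
idf_rare = inverse_document_frequency("rare", docs)    # in 1 of 4 documents
print(f"good: {idf_common:.3f}, rare: {idf_rare:.3f}")
```

Under this form, a term found in few documents receives a higher IDF than one found in most of them, which is the sense in which IDF separates rare terms from frequent ones.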

“…There are 6,839 tweets in the training dataset, including 1,371 offensive tweets and 350 hate speech tweets. There are 1,000 tweets in the development dataset, including 179 offensive tweets and 44 hate speech tweets [16], [27].…”

Section: Arabic Hate Speech Dataset (mentioning)

confidence: 99%
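The counts quoted above imply a strong class imbalance, which a few lines of arithmetic make concrete. The sketch below uses only the figures from the citation statement. The dictionary layout and function name are illustrative assumptions, and it makes no assumption about whether the hate speech tweets overlap with the offensive ones.

```python
# Counts copied from the citation statement above.
splits = {
    "train": {"total": 6839, "offensive": 1371, "hate": 350},
    "dev": {"total": 1000, "offensive": 179, "hate": 44},
}


def class_shares(counts):
    """Return each labelled class as a fraction of the split's total."""
    return {
        label: counts[label] / counts["total"]
        for label in ("offensive", "hate")
    }


shares = {name: class_shares(counts) for name, counts in splits.items()}

for name, split_shares in shares.items():
    for label, share in split_shares.items():
        print(f"{name:5s} {label:9s} {share:.1%}")
```

In both splits, hate speech makes up roughly 5% of the tweets and offensive tweets roughly a fifth.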

A hybrid approach based on personality traits for hate speech detection in Arabic social media

Elzayady, Mohamed, Badran et al. 2023

IJECE


In recent years, as social media has grown in popularity, people have gained the ability to freely share their views. However, this may lead to users' conflict and hostility, resulting in unattractive online environments. Hate speech relates to using expressions or phrases that are violent, offensive, or insulting to a minority of people. The number of Arab social media users is quickly rising, and this is being followed by an increase in the frequency of cyber hate speech in the area. Therefore, the automated detection of Arabic hate speech has become a major concern for many stakeholders. The intersection of personality learning and hate speech detection is a relatively less studied niche. We suggest a novel approach that is focused on extracting personality trait features and using these features to detect Arabic hate speech. The experimental results show that the proposed approach is superior in terms of the macro-F1 score by achieving 82.3% compared to previous work reported in the literature.
