2022
DOI: 10.1145/3501398

Investigating the Effect of Preprocessing Arabic Text on Offensive Language and Hate Speech Detection

Abstract: Preprocessing of input text can play a key role in text classification by reducing dimensionality and removing unnecessary content. This study aims to investigate the impact of preprocessing on Arabic offensive language classification. We explore six preprocessing techniques: conversion of emojis to Arabic textual labels, normalization of different forms of Arabic letters, normalization of selected nouns from dialectal Arabic to Modern Standard Arabic, conversion of selected hyponyms to hypernyms, hashtag segm…
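
As an illustration of the letter-normalization step named in the abstract, here is a minimal Python sketch. The specific rules (unifying alef variants, mapping alef maqsura to ya and ta marbuta to ha, stripping diacritics and tatweel) are common conventions in Arabic NLP and are assumptions here; the abstract does not say which rules the authors applied. The function name `normalize_arabic` is hypothetical.

```python
import re

# Assumed rule set: common Arabic normalization conventions,
# not taken from the paper itself.
ALEF_VARIANTS = re.compile("[\u0622\u0623\u0625]")  # alef with madda / hamza above / hamza below
DIACRITICS = re.compile("[\u064B-\u0652]")          # fathatan through sukun
TATWEEL = "\u0640"                                  # elongation character


def normalize_arabic(text: str) -> str:
    """Collapse common orthographic variants of Arabic letters."""
    text = DIACRITICS.sub("", text)           # drop short-vowel marks
    text = text.replace(TATWEEL, "")          # drop elongation
    text = ALEF_VARIANTS.sub("\u0627", text)  # unify alef forms to bare alef
    text = text.replace("\u0649", "\u064A")   # alef maqsura -> ya
    text = text.replace("\u0629", "\u0647")   # ta marbuta -> ha
    return text
```

Whether each of these mappings helps or hurts classification is the kind of question the study investigates, so the rule set should be treated as configurable rather than fixed.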

Cited by 17 publications (8 citation statements)
References 12 publications
“…This geographic variation, in addition to some other demographic and cultural factors (e.g., profession, economic stability), leads to differences within the Kuwaiti dialects, creating multiple Kuwaiti sub-dialects. In some cases, distinguishing one Kuwaiti sub-dialect from others can be easily captured through minor variations in its phonetics sounds, such as the word sugar in MSA " / Sukar", which is pronounced as " / Shikar" by Kuwaitis from Sharq and " / Shukar" by Kuwaitis from Qibla 2 . In more extreme situations, different words are used to refer to the same meaning based on the geographic origin, for example, the word " / sareer" is used by Kuwaitis from Qibla while the word " / kirfaia" is used by Kuwaitis from Sharq to refer to a bed in MSA " /sareer".…”
Section: A Kuwaiti Dialect
Citation type: mentioning
confidence: 99%
“…Based on the findings from previous studies, when applying an advanced deep learning language model similar to BERT models in the classification system, there will be limited effects of preprocessing Arabic text on improving the performance of the system [2], [42]. Accordingly, we used the raw text directly and no extensive preprocessing techniques were applied, only the basic data cleaning and filtering techniques that were the corpus development phase were applied before using the posts to train and evaluate the classification models.…”
Section: Deep Learning Language Models
Citation type: mentioning
confidence: 99%
“…Text preprocessing is the process carried out to obtain high-quality information from a collection of texts [6] [7]. The steps carried out in this process are described in Figure 2 [8].…”
Section: B Text Preprocessing
Citation type: unclassified
“…Therefore, it would make sense to translate these emoticons into text so that we can continue working with them. In a study by F. Husain et al [24] they worked with a similar problem, they converted emoticons to text. Beautifulsoup4 version 4.8.22 was used to extract the emoticon description in English from Unicode.org.…”
Section: Next Direction Of the Work
Citation type: mentioning
confidence: 99%
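
The emoticon-to-text conversion described in the statement above can be sketched in Python. This is not the cited workflow: that study scraped English descriptions from Unicode.org with Beautifulsoup4, whereas this sketch swaps in the character names shipped with Python's standard `unicodedata` module. The function name `emoji_to_text` is hypothetical, and the output is English Unicode names, not the Arabic labels used in the original paper.

```python
import unicodedata


def emoji_to_text(text: str) -> str:
    """Replace symbol characters with their lowercase Unicode names."""
    out = []
    for ch in text:
        # Category "So" (Symbol, other) covers most single-code-point emoji.
        if unicodedata.category(ch) == "So":
            name = unicodedata.name(ch, "")
            # Keep the character unchanged if this Python build has no name for it.
            out.append(f" {name.lower()} " if name else ch)
        else:
            out.append(ch)
    # Collapse the extra spaces introduced around the inserted names.
    return " ".join("".join(out).split())


print(emoji_to_text("good \U0001F600"))  # good grinning face
```

The sketch handles one code point at a time, so multi-code-point emoji (skin-tone modifiers, zero-width-joiner sequences, flags) are not merged into a single label; a lookup table keyed on full sequences would be needed for that.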