2021
DOI: 10.1016/j.heliyon.2021.e06191

Preprocessing Arabic text on social media

Abstract: Currently, social media plays an important role in daily life and routine. Millions of people use social media for different purposes. Large amounts of data flow through online networks every second, and these data contain valuable information that can be extracted if the data are properly processed and analyzed. However, most of the processing results are affected by preprocessing difficulties. This paper presents an approach to extract information from social media Arabic text. It provides an integrated solu…

Cited by 52 publications (28 citation statements)
References 26 publications
“…Arabic text has several characteristics that make it more challenging for NLP systems such as diacritics and mixture of dialect, modern, and classical texts. These challenges can be observed obviously in social media texts [46]. Preprocessing the text before feeding it to machine learning algorithms is important and can improve the accuracy of these models sharply [47].…”
Section: Preprocessing (mentioning)
confidence: 99%
“…Arabic is a Semitic language that is strongly tied to Islam and Muslim culture, and it is the language of the Quran, used by all Muslims (over 1.62 billion people) [23]. It is also the mother tongue of over 422 million people [24]. There are 28 alphabets in this language, and lines are expressed from right to left [25].…”
Section: Arabic Language (mentioning)
confidence: 99%
“…The data obtained via the internet is unstructured and must be preprocessed before being used in later stages [24], [27]. A great deal of work is required before preprocessing Arabic content on social media, since most of it will be informal (not standard) and may include dialects, misspellings, characters with diacritical marks, and elongations [12].…”
Section: Arabic Text Preprocessing (mentioning)
confidence: 99%
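
The statement above names diacritical marks and elongations as typical noise in informal Arabic text. As a rough illustration only, and not the method of the cited paper, the following Python sketch strips the common Arabic diacritics (U+064B to U+0652, plus the superscript alef U+0670), removes the tatweel character (U+0640), and collapses runs of three or more identical characters to one. The function name `normalize_arabic` and the threshold of three repeats are assumptions made for this example; dialect handling and spelling correction are out of scope.

```python
import re

# Common Arabic diacritics (tashkeel) plus the superscript alef.
# This range is an assumption for the sketch, not an exhaustive list.
DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")

# Tatweel (kashida), the elongation character used to stretch words.
TATWEEL = "\u0640"

# Any character repeated three or more times in a row.
REPEATS = re.compile(r"(.)\1{2,}")


def normalize_arabic(text: str) -> str:
    """Remove diacritics and tatweel, then collapse long letter runs."""
    text = DIACRITICS.sub("", text)
    text = text.replace(TATWEEL, "")
    # Runs of 3+ identical characters become a single character;
    # legitimate double letters (exactly two in a row) are left alone.
    text = REPEATS.sub(r"\1", text)
    return text
```

Collapsing only runs of three or more is a conservative choice: it leaves genuine double letters intact, at the cost of missing elongations that repeat a letter just twice.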
“…Finally, the task of Arabic preprocessing has received attention in previous work [37][38] [18] including an approach presented by the authors [25]. Here we create two Sudanese Arabic sentiment datasets, one 2-Class and one 3-Class.…”
Section: Related Work (mentioning)
confidence: 99%