2021
DOI: 10.1007/s00521-021-06390-z

Arabic text classification: the need for multi-labeling systems

Abstract: The process of tagging a given text or document with suitable labels is known as text categorization or classification. The aim of this work is to automatically tag a news article based on its vocabulary features. To accomplish this objective, two large datasets have been constructed from various Arabic news portals. The first dataset consists of 90k single-labeled articles from four domains (Business, Middle East, Technology and Sports). The second dataset has over 290k multi-tagged articles. To examine the singl…

Cited by 39 publications (23 citation statements)
References: 33 publications
“…During training, nonlinear mapping is learned by utilising nonlinear activation functions and numerous layers. Nonlinear activation is used to generate the label in MLP [ 22 ]. Support vector machine.…”
Section: Proposed Methodology | mentioning
confidence: 99%
“…One of the most common classification and regression algorithms is the Support Vector Machine (SVM), a supervised technique for classification and regression issues. To locate the decision border between two classes, it uses a vector space model that is as far away from the data points as possible, and the support vectors are data points near the hyperplane that divides classes [ 22 , 23 ]. XGBoost classifier.…”
Section: Proposed Methodology | mentioning
confidence: 99%
“…GRU uses two gates: an update gate and a reset gate. The reset gate determines the amount of past information to be forgotten, while the update gate determines which information to keep and not to keep [ 17 ].…”
Section: Methods | mentioning
confidence: 99%
“…According to the above references, most of the work focused on the task of binary classification, i.e., they are labeled as (bullying/non-bullying) or (offensive/non-offensive) [18,22,23,25]. Nonetheless, multi-class integration is becoming an increasingly important [28]. That is because it is not used for a specific classification.…”
Section: Related Work | mentioning
confidence: 99%
“…In addition, we applied word tokenization and stemming. We utilized Tf-Idf to extract the text data's features, and then we implemented some of the most common classical classifiers [28, 36, 37, 38]. For the classical classifiers approach, we split the dataset into 80% training and 20% testing.…”
Section: Benchmark Evaluation | mentioning
confidence: 99%
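
The last statement above describes a pipeline of tokenization, Tf-Idf feature extraction, and an 80%/20% train/test split. A minimal sketch of that kind of pipeline follows, using only the Python standard library. It is not the citing authors' code: the toy English documents, the labels, and the helper names (`train_test_split`, `fit_idf`, `tfidf_vector`) are hypothetical, whitespace splitting stands in for real tokenization and stemming, and the smoothed IDF formula is one common variant chosen for illustration.

```python
import math
import random
from collections import Counter


def train_test_split(items, test_ratio=0.2, seed=0):
    """Shuffle a copy of items and cut it into (train, test) parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * (1 - test_ratio)))
    return shuffled[:cut], shuffled[cut:]


def fit_idf(tokenized_docs):
    """Compute a smoothed IDF weight per term: ln((1 + n) / (1 + df)) + 1."""
    n = len(tokenized_docs)
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))  # count each term once per document
    return {t: math.log((1 + n) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def tfidf_vector(tokens, idf):
    """Return an L2-normalized sparse Tf-Idf vector; unseen terms are dropped."""
    counts = Counter(t for t in tokens if t in idf)
    weights = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else {}


# Hypothetical toy corpus of (text, label) pairs, for illustration only.
docs = [
    ("stocks rise as markets rally", "Business"),
    ("bank reports strong quarterly profit", "Business"),
    ("oil prices fall on supply news", "Business"),
    ("team wins the league final", "Sports"),
    ("striker scores twice in derby", "Sports"),
    ("coach praises young players", "Sports"),
    ("new phone launches with faster chip", "Technology"),
    ("startup releases open source tool", "Technology"),
    ("chip maker unveils new processor", "Technology"),
    ("markets await bank decision", "Business"),
]

train, test = train_test_split(docs, test_ratio=0.2, seed=0)

# Fit IDF on the training texts only, then vectorize both parts with it.
idf = fit_idf([text.split() for text, _ in train])
train_vectors = [tfidf_vector(text.split(), idf) for text, _ in train]
test_vectors = [tfidf_vector(text.split(), idf) for text, _ in test]
```

Fitting the IDF weights on the training part alone keeps the held-out 20% from influencing the features, which is the usual reason for splitting before vectorizing. The resulting sparse vectors could then be passed to any classical classifier.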