Arabic text classification: the need for multi-labeling systems

Rifai, Hozayfa El; Qadi, Leen Al; Elnagar, Ashraf

doi:10.1007/s00521-021-06390-z

Cited by 39 publications

(23 citation statements)

References 33 publications


“…During training, nonlinear mapping is learned by utilising nonlinear activation functions and numerous layers. Nonlinear activation is used to generate the label in MLP [22]. Support vector machine.…”

Section: Proposed Methodology (mentioning)

confidence: 99%
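The MLP excerpt above says the label is produced by stacked layers joined by nonlinear activations. A minimal sketch of that forward pass follows, assuming one ReLU hidden layer and a softmax output. The weights, layer sizes, and function names (`dense`, `mlp_predict`) are hypothetical and untrained; they only show where the nonlinearity sits.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def dense(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Affine layer: returns W x + b, with one row of `weights` per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v: Vector) -> Vector:
    """Element-wise ReLU, the nonlinearity between the two affine layers."""
    return [max(0.0, a) for a in v]


def softmax(v: Vector) -> Vector:
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def mlp_predict(x: Vector, w1: Matrix, b1: Vector,
                w2: Matrix, b2: Vector) -> Tuple[int, Vector]:
    """ReLU hidden layer, then softmax output; returns (label, class probabilities)."""
    hidden = relu(dense(x, w1, b1))
    probs = softmax(dense(hidden, w2, b2))
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs


# Hypothetical, untrained weights: 3 input features, 4 hidden units, 2 classes.
W1 = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5], [-0.6, 0.1, 0.9], [0.2, 0.2, 0.2]]
B1 = [0.0, 0.1, -0.1, 0.0]
W2 = [[1.0, -0.5, 0.3, 0.2], [-0.4, 0.9, -0.2, 0.1]]
B2 = [0.0, 0.0]

label, probs = mlp_predict([1.0, 0.0, 2.0], W1, B1, W2, B2)
print(label, [round(p, 3) for p in probs])
```

Without the ReLU, the two affine layers would collapse into a single linear map. The activation is what allows the network to represent a nonlinear decision function.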

“…One of the most common classification and regression algorithms is the Support Vector Machine (SVM), a supervised technique for classification and regression issues. To locate the decision border between two classes, it uses a vector space model that is as far away from the data points as possible, and the support vectors are data points near the hyperplane that divides classes [22,23]. XGBoost classifier.…”

Section: Proposed Methodology (mentioning)

confidence: 99%
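The SVM excerpt describes a separating hyperplane placed as far as possible from the nearest points of each class. A minimal sketch of a linear soft-margin SVM follows. It assumes a linear kernel, labels in {-1, +1}, and training by full-batch subgradient descent on the hinge-loss objective. That training method is a swapped-in choice for illustration, not necessarily what the citing paper used. The function names and hyperparameters (`lam`, `lr`, `epochs`) are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def train_linear_svm(X: List[Vector], y: List[int], lam: float = 0.01,
                     lr: float = 0.1, epochs: int = 200) -> Tuple[Vector, float]:
    """Subgradient descent on lam/2 * ||w||^2 + mean_i max(0, 1 - y_i (w . x_i + b))."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # regularizer gradient (b is not regularized)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Only points inside the margin or misclassified contribute, which is
            # why the solution is governed by the points nearest the hyperplane.
            if yi * (dot(w, xi) + b) < 1:
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: Vector, b: float, x: Vector) -> int:
    """Returns the side of the hyperplane w . x + b = 0 on which x falls."""
    return 1 if dot(w, x) + b >= 0 else -1


# Hypothetical toy data: two linearly separable classes.
X = [[2.0, 2.0], [3.0, 3.0], [2.0, 3.0], [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
print([predict(w, b, x) for x in X])
```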


Analysing Hate Speech against Migrants and Women through Tweets Using Ensembled Deep Learning Model

Hasan, Sharma, Khan et al. 2022

Computational Intelligence and Neuroscience


Twitter’s popularity has exploded in the past few years, making it one of the most widely used social media sites. This growth has made the strategies described in this study more valuable. At the same time, more people now express their views in ways that demean others. As a result, hate speech has attracted interest within sentiment analysis, a field that has developed various algorithms for detecting emotions in social networks. This paper proposes a deep learning model that classifies sentiment in two separate analyses. In the first analysis, tweets are classified according to hate speech against migrants and women. In the second analysis, a deep learning model determines whether the hate speech is posted by a single user or by a group of users. For the text analysis, deep learning models such as BiLSTM, CNN, and MLP are combined with text representation methods such as GloVe (Global Vectors), term frequency-inverse document frequency (TF-IDF), and transformer-based embeddings.


“…GRU uses two gates: an update gate and a reset gate. The reset gate determines the amount of past information to be forgotten, while the update gate determines which information to keep and not to keep [17].…”

Section: Methods (mentioning)

confidence: 99%
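The gate roles in the GRU excerpt can be made concrete with one step of a GRU cell. This is a minimal sketch assuming a common textbook formulation: z = σ(W_z x + U_z h + b_z), r = σ(W_r x + U_r h + b_r), h̃ = tanh(W_h x + U_h (r ⊙ h) + b_h), and h' = (1 - z) ⊙ h + z ⊙ h̃. Papers and libraries differ in details; PyTorch, for instance, places z on the previous state rather than on the candidate. The parameter-dictionary keys and helper names are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(a: float) -> float:
    return 1.0 / (1.0 + math.exp(-a))


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def gru_step(x: Vector, h: Vector, p: Dict[str, list]) -> Vector:
    """One GRU update; `p` holds hypothetical keys W_*, U_*, b_* for gates z, r and candidate h."""
    # Update gate: how much of the new candidate replaces the old state.
    z = [sigmoid(a + b + c) for a, b, c in zip(matvec(p["W_z"], x), matvec(p["U_z"], h), p["b_z"])]
    # Reset gate: how much of the previous state is used when forming the candidate.
    r = [sigmoid(a + b + c) for a, b, c in zip(matvec(p["W_r"], x), matvec(p["U_r"], h), p["b_r"])]
    r_h = [ri * hi for ri, hi in zip(r, h)]
    h_tilde = [math.tanh(a + b + c)
               for a, b, c in zip(matvec(p["W_h"], x), matvec(p["U_h"], r_h), p["b_h"])]
    # Interpolate between the old state and the candidate.
    return [(1 - zi) * hi + zi * hti for zi, hi, hti in zip(z, h, h_tilde)]


def zeros(rows: int, cols: int) -> Matrix:
    return [[0.0] * cols for _ in range(rows)]


def zero_params(n_in: int, n_hidden: int) -> Dict[str, list]:
    """All-zero parameters (hypothetical helper) so gate behaviour can be set through the biases."""
    p: Dict[str, list] = {}
    for g in ("z", "r", "h"):
        p[f"W_{g}"] = zeros(n_hidden, n_in)
        p[f"U_{g}"] = zeros(n_hidden, n_hidden)
        p[f"b_{g}"] = [0.0] * n_hidden
    return p


h_prev = [0.4, -0.6]
keep = zero_params(2, 2)
keep["b_z"] = [-50.0, -50.0]  # update gate near 0: the previous state is carried over
print(gru_step([1.0, -1.0], h_prev, keep))
```

Driving the update-gate bias strongly negative makes the cell copy its previous state, which is the "what to keep" behaviour the excerpt attributes to the update gate.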

A Novel Preoperative Prediction Model Based on Deep Learning to Predict Neoplasm T Staging and Grading in Patients with Upper Tract Urothelial Carcinoma

Gao, Ying et al. 2022

JCM


Objectives: To create a novel preoperative prediction model based on a deep learning algorithm to predict neoplasm T staging and grading in patients with upper tract urothelial carcinoma (UTUC). Methods: We performed a retrospective cohort study of patients diagnosed with UTUC between 2001 and 2012 at our institution. Five deep learning algorithms (CGRU, BiGRU, CNN-BiGRU, CBiLSTM, and CNN-BiLSTM) were used to develop a preoperative prediction model for neoplasm T staging and grading. The Matthews correlation coefficient (MCC) and the area under the receiver operating characteristic curve (AUC) were used to evaluate the performance of each prediction model. Results: The clinical data of a total of 884 patients with pathologically confirmed UTUC were collected. The T-staging prediction model based on CNN-BiGRU achieved the best performance, with an MCC of 0.598 (0.592–0.604) and an AUC of 0.760 (0.755–0.765). For the 1973 World Health Organization (WHO) grading system, the grading prediction model based on CNN-BiGRU achieved the best performance, with an MCC of 0.612 (0.609–0.615) and an AUC of 0.804 (0.801–0.807). For the 2004 WHO grading system, the grading prediction model based on BiGRU achieved the best performance, with an MCC of 0.621 (0.616–0.626) and an AUC of 0.824 (0.819–0.829). Conclusions: We developed an accurate UTUC preoperative prediction model to predict neoplasm T staging and grading based on deep learning algorithms, which will help urologists make appropriate treatment decisions at an early stage.
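The abstract above evaluates its models with MCC and AUC. Below is a minimal sketch of both metrics for the binary case. It assumes 0/1 labels and real-valued scores. The abstract does not say how multi-class T stages were reduced to these metrics, and the confidence intervals it reports are not computed here. Function names are hypothetical.

```python
import math
from typing import Sequence


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary Matthews correlation coefficient from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0 when a confusion-matrix margin is empty (MCC is undefined there).
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical predictions for four cases.
print(mcc([1, 1, 0, 0], [1, 0, 0, 0]))
print(roc_auc([1, 1, 0, 0], [0.8, 0.3, 0.5, 0.1]))
```

MCC uses all four confusion-matrix cells, so it stays informative when classes are imbalanced. AUC is threshold-free and measures only how well the scores rank positives above negatives.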


“…According to the above references, most of the work focused on the task of binary classification, i.e., texts are labeled as (bullying/non-bullying) or (offensive/non-offensive) [18,22,23,25]. Nonetheless, multi-class integration is becoming increasingly important [28]. That is because it is not used for a specific classification.…”

Section: Related Work (mentioning)

confidence: 99%

“…In addition, we applied word tokenization and stemming. We utilized TF-IDF to extract the text data's features, and then we implemented some of the most common classical classifiers [28,36-38]. For the classical classifiers approach, we split the dataset into 80% training and 20% testing.…”

Section: Benchmark Evaluation (mentioning)

confidence: 99%
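The benchmark excerpt above extracts TF-IDF features and uses an 80/20 train/test split. Below is a minimal sketch of both steps. It assumes a whitespace tokenizer, skips the stemming step the excerpt mentions, and uses the unsmoothed idf = log(N / df). Libraries such as scikit-learn default to a smoothed variant. The helper names and toy documents are hypothetical.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def tokenize(text: str) -> List[str]:
    # Hypothetical whitespace tokenizer; stemming from the quoted pipeline is omitted.
    return text.split()


def fit_idf(docs: Sequence[List[str]]) -> Dict[str, float]:
    """idf(t) = log(N / df(t)), where df counts documents containing t."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / count) for term, count in df.items()}


def tfidf_vector(doc: List[str], idf: Dict[str, float]) -> Dict[str, float]:
    """Sparse TF-IDF vector; terms unseen when fitting the IDF table are dropped."""
    counts = Counter(doc)
    total = len(doc)
    return {t: (c / total) * idf[t] for t, c in counts.items() if t in idf}


def train_test_split(items: Sequence, test_fraction: float = 0.2,
                     seed: int = 0) -> Tuple[list, list]:
    """Shuffles with a fixed seed, then holds out `test_fraction` of the items."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


docs = ["good post", "bad post", "very bad", "good good", "nice post",
        "bad words", "kind words", "rude post", "rude words", "nice words"]
train, test = train_test_split(docs, test_fraction=0.2, seed=42)
# Fit IDF on the training split only so test documents do not leak into the features.
idf = fit_idf([tokenize(d) for d in train])
test_vectors = [tfidf_vector(tokenize(d), idf) for d in test]
print(len(train), len(test), test_vectors)
```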

Instagram-Based Benchmark Dataset for Cyberbullying Detection in Arabic Text

Bayari, Abdallah 2022

Data


(1) Background: the ability to use social media to communicate without revealing one’s real identity has created an attractive setting for cyberbullying. Several studies have collected datasets from social media with the aim of automatically detecting offensive language. However, the majority of these datasets are in English, not Arabic, and none of the few Arabic datasets focused on Instagram, despite it being a major social media platform in the Arab world. (2) Methods: we use the official Instagram APIs to collect our dataset. To establish the dataset as a benchmark, we use SPSS (Kappa statistic) to evaluate the inter-annotator agreement (IAA), and we examine and evaluate the performance of various learning models (LR, SVM, RFC, and MNB). (3) Results: in this research, we present the first Instagram Arabic corpus focusing on cyberbullying, with sub-class (multi-class) categorization. The dataset is primarily designed for detecting offensive language in texts. We end up with 200,000 comments, of which 46,898 were annotated by three human annotators. The results show that the SVM classifier outperforms the other classifiers, with an F1 score of 69% for bullying comments and 85% for positive comments.
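The abstract above measures inter-annotator agreement with a kappa statistic but does not say which variant was used. Below is a minimal sketch of Cohen's kappa for two annotators, κ = (p_o - p_e) / (1 - p_e). It assumes that with three annotators one would average the pairwise values or switch to Fleiss' kappa. The label strings and function name are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators who labelled the same items in the same order."""
    if len(a) != len(b) or not a:
        raise ValueError("both annotators must label the same, non-empty set of items")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: each annotator labels independently at their own label rates.
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when both annotators use one identical label")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels from two annotators on four comments.
ann_1 = ["bullying", "bullying", "positive", "positive"]
ann_2 = ["bullying", "positive", "positive", "positive"]
print(cohens_kappa(ann_1, ann_2))
```

Kappa discounts the agreement two annotators would reach by chance. Raw agreement of 75% in this toy case therefore corresponds to a kappa of only 0.5.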
