With the growing volume of online social posts, review comments, and digital documents, Arabic text classification (ATC) is in high demand for many natural language processing (NLP) applications, especially during the coronavirus pandemic. Variations in the meaning of the same Arabic word can directly affect the performance of any AI-based framework. This work aims to assess how preprocessing and representation techniques affect the effectiveness of machine learning (ML) algorithms, with effectiveness measured across different AI-based classification techniques. The ATC process is influenced by several factors, such as stemming during preprocessing, the method of feature extraction and selection, the nature of the datasets, and the classification algorithm. To improve overall classification performance, preprocessing techniques are mainly used to reduce each Arabic word to its root and to decrease the dimensionality of the representation. Feature extraction and selection play crucial roles in representing Arabic text in a meaningful way and in improving classification accuracy. The selected classifiers in this study are run with various feature selection algorithms. The classification results are compared across six classifiers: multinomial Naive Bayes (MNB), Bernoulli Naive Bayes (BNB), Stochastic Gradient Descent (SGD), Support Vector Classifier (SVC), Logistic Regression (LR), and Linear SVC. All of these classifiers are evaluated using five balanced and unbalanced benchmark datasets: BBC Arabic corpus, CNN Arabic corpus, Open-Source Arabic corpus (OSAc), ArCovidVac, and AlKhaleej. The evaluation results show that classification performance strongly depends on the preprocessing technique, the representation method, the classification technique, and the nature of the datasets used. For the considered benchmark datasets, the linear SVC outperformed the other classifiers overall when prominent features were selected.
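The preprocess, represent, classify chain described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: diacritic stripping stands in for stemming, raw token counts stand in for the feature extraction and selection steps, and the class name `TinyMultinomialNB`, the toy documents, and the labels are all hypothetical. Only the Python standard library is used.

```python
import math
import unicodedata
from collections import Counter, defaultdict


def normalize(text):
    """Strip combining marks (Arabic diacritics) and split on whitespace.

    This is a light normalization step used here as a stand-in for stemming.
    """
    stripped = "".join(ch for ch in text if unicodedata.category(ch) != "Mn")
    return stripped.split()


class TinyMultinomialNB:
    """Hypothetical multinomial Naive Bayes over token counts (Laplace smoothing)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.token_counts[label].update(normalize(doc))
        self.vocab = {tok for counts in self.token_counts.values() for tok in counts}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for token in normalize(doc):
                if token in self.vocab:  # tokens never seen in training are skipped
                    score += math.log((counts[token] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data (invented for illustration, not from any benchmark corpus).
train_docs = [
    "فريق يسجل هدف في مباراة",
    "مباراة فريق كرة",
    "لقاح ضد فيروس",
    "مستشفى يعالج فيروس",
]
train_labels = ["sports", "sports", "health", "health"]

model = TinyMultinomialNB().fit(train_docs, train_labels)
sports_guess = model.predict("هدف في مباراة")
health_guess = model.predict("لقاح فيروس")
```

Swapping `normalize` for a root-extracting stemmer, or filtering `vocab` with a feature selection score, are the points where the factors named in the abstract would plug in.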
Social media networking is a prominent part of daily life, particularly at the current moment. The impact of comments has been investigated in several studies. Twitter, Facebook, and Instagram are just a few of the social media networks used to broadcast news worldwide. In this paper, a comprehensive AI-based study is presented to automatically detect misogyny and sarcasm in Arabic text in binary and multiclass scenarios. The key aim of the proposed AI approach is to distinguish various topics of misogyny and sarcasm in Arabic tweets on social media networks. Both misogyny and sarcasm detection are studied by adopting seven state-of-the-art NLP classifiers: AraBERT, PAC, LRC, RFC, LSVC, DTC, and KNNC. To fine-tune, validate, and evaluate all of these techniques, two Arabic tweet datasets (i.e., misogyny and Abu Farah datasets) are used. For the experimental study, two scenarios are proposed for each case study (misogyny or sarcasm): binary and multiclass problems. For misogyny detection, the best accuracy is achieved using the AraBERT classifier, with 91.0% for the binary classification scenario and 89.0% for the multiclass scenario. For sarcasm detection, the best accuracy is also achieved using AraBERT, with 88% for the binary classification scenario and 77.0% for the multiclass scenario. The proposed method appears to be effective in detecting misogyny and sarcasm on social media platforms, and the results suggest AraBERT as a superior state-of-the-art deep learning classifier.
Drosophila melanogaster is an important genetic model organism used extensively in medical and biological studies. About 61% of known human genes have a recognizable match with the genetic code of Drosophila flies, and 50% of fly protein sequences have mammalian analogues. Recently, several investigations have been conducted in Drosophila to study the functions of specific genes that exist in the central nervous system, heart, liver, and kidney. The outcomes of research in Drosophila are also used as a unique tool to study human-related diseases. This article presents a novel automated system to classify the gender of Drosophila flies from microscopic images (ventral view). The proposed system takes an image as input and converts it to grayscale in order to extract texture features from the image. Then, machine learning (ML) classifiers such as support vector machines (SVM), Naive Bayes (NB), and K-nearest neighbour (KNN) are used to classify the Drosophila as male or female. The proposed model is evaluated using a real microscopic image dataset, and the results show that the accuracy of the KNN is 90%, which is higher than the accuracy of the SVM classifier.
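The grayscale, texture feature, KNN sequence in this abstract can be sketched as follows. The abstract does not say which texture features are used, so the mean and standard deviation of intensity are an assumption chosen for brevity; the helper names, the tiny synthetic "images", and the assignment of labels to patterns are all hypothetical. The grayscale weights are the standard ITU-R BT.601 luma coefficients.

```python
import math
from collections import Counter


def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) pixels to luminance using BT.601 weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_image]


def texture_features(gray_image):
    """Return (mean, standard deviation) of intensity.

    Assumption: a simple first-order stand-in for real texture descriptors.
    """
    pixels = [p for row in gray_image for p in row]
    mean = sum(pixels) / len(pixels)
    variance = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    return (mean, math.sqrt(variance))


def knn_predict(train_features, train_labels, query, k=3):
    """Majority vote among the k training points nearest to query (Euclidean)."""
    ranked = sorted(zip(train_features, train_labels),
                    key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def flat(v):
    """Synthetic 2x2 image with uniform intensity v."""
    return [[(v, v, v), (v, v, v)], [(v, v, v), (v, v, v)]]


def checker(lo, hi):
    """Synthetic 2x2 checkerboard image alternating between lo and hi."""
    return [[(lo, lo, lo), (hi, hi, hi)], [(hi, hi, hi), (lo, lo, lo)]]


# Toy training set; which pattern maps to which label is arbitrary here.
train_images = [flat(100), flat(120), flat(110),
                checker(0, 200), checker(20, 220), checker(10, 210)]
train_labels = ["female", "female", "female", "male", "male", "male"]
train_features = [texture_features(to_grayscale(img)) for img in train_images]

query_features = texture_features(to_grayscale(checker(5, 205)))
predicted = knn_predict(train_features, train_labels, query_features, k=3)
```

With real microscopy images, the feature function would be the part to replace, while the distance-and-vote logic of KNN stays the same.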
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
The COVID-19 pandemic has been a global health problem since December 2019. To date, the total numbers of confirmed cases, recoveries, and deaths have increased exponentially on a daily basis worldwide. In this paper, a hybrid deep learning approach is proposed to directly classify COVID-19 from both chest X-ray (CXR) and CT images. Two AI-based deep learning models, namely ResNet50 and EfficientNetB0, are adopted and trained using both chest X-ray and CT images. Public datasets consisting of 7863 chest X-ray images and 2613 CT images, respectively, are used to deploy, train, and evaluate the proposed deep learning models. The EfficientNetB0 model consistently produced better classification results, achieving overall diagnosis accuracies of 99.36% and 99.23% using CXR and CT images, respectively. For the hybrid AI-based model, an overall classification accuracy of 99.58% is achieved. The proposed hybrid deep learning system seems to be trustworthy and reliable for assisting health care systems, patients, and physicians.
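The abstract does not state how the two networks are combined into the hybrid model. One common option, shown here purely as an assumption, is soft voting: average the class probabilities of the two models and pick the highest. The logits, the class names, and the equal weighting below are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_vote(prob_a, prob_b, weight_a=0.5):
    """Weighted average of two probability vectors over the same classes."""
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(prob_a, prob_b)]


CLASSES = ["COVID-19", "normal"]  # hypothetical label set

# Hypothetical outputs of two models for one image.
logits_model_a = [2.0, 0.5]  # confident in the first class
logits_model_b = [0.2, 0.4]  # weakly prefers the second class

combined = soft_vote(softmax(logits_model_a), softmax(logits_model_b))
prediction = CLASSES[combined.index(max(combined))]
```

In this toy case the confident model outweighs the uncertain one, which is the behaviour that makes soft voting attractive over a simple majority vote of hard labels.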
AI-based Arabic text classification is the process of assigning Arabic content to categories. With the increasing volume of Arabic text in social life, traditional machine learning approaches face challenges due to the complex morphology and subtle variation of the Arabic language. This work proposes a model, called ArCAR, to represent and recognize Arabic text at the character level based on the capability of a deep convolutional neural network (CNN). The system was validated using five-fold cross-validation tests for Arabic text document classification. We have used our proposed system to evaluate Arabic text. The ArCAR system shows its capability to classify Arabic text at the character level. For document classification, the ArCAR system achieves its best performance on the AlKhaleej-balance dataset, with an accuracy of 97.76%. The proposed ArCAR seems to provide a practical solution for accurate Arabic text representation, both for understanding and as a classification system.
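Character-level models of this kind typically take a fixed-length sequence of character indices (or one-hot vectors) as input to the CNN. The sketch below shows such an encoding under stated assumptions: the alphabet (Unicode code points U+0621 to U+064A plus the space character), the reserved index 0 for padding and unknown characters, and the function names are hypothetical, and the actual ArCAR alphabet and sequence length are not given in this abstract.

```python
# Hypothetical alphabet: Arabic block code points U+0621..U+064A, plus space.
ALPHABET = [chr(cp) for cp in range(0x0621, 0x064B)] + [" "]

# Index 0 is reserved for padding and for characters outside the alphabet.
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}
VOCAB_SIZE = len(ALPHABET) + 1


def encode(text, max_len):
    """Map text to exactly max_len indices, truncating or zero-padding."""
    indices = [CHAR_TO_INDEX.get(ch, 0) for ch in text[:max_len]]
    return indices + [0] * (max_len - len(indices))


def one_hot(indices):
    """Expand indices to one-hot rows; index 0 becomes an all-zero row."""
    rows = []
    for idx in indices:
        row = [0] * VOCAB_SIZE
        if idx != 0:
            row[idx] = 1
        rows.append(row)
    return rows


encoded = encode("سلام", max_len=6)  # four letters followed by two padding slots
matrix = one_hot(encoded)            # shape: 6 rows x VOCAB_SIZE columns
```

Because no tokenization or stemming is involved, this representation sidesteps the morphological preprocessing that word-level approaches depend on.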