In recent years, convolutional neural networks (CNNs) have attracted considerable attention owing to their impressive performance in various applications, such as Arabic sentence classification. However, building a powerful CNN for Arabic sentiment classification can be highly complicated and time consuming. In this paper, we address this problem by combining the differential evolution (DE) algorithm with a CNN, where DE automatically searches for the optimal configuration, including the CNN architecture and network parameters. To this end, the DE algorithm searches five CNN parameters: the convolution filter sizes that control the CNN architecture, the number of filters per convolution filter size (NFCS), the number of neurons in the fully connected (FC) layer, the initialization mode, and the dropout rate. In addition, the effect of the mutation and crossover operators in the DE algorithm was investigated. The performance of the proposed framework, DE-CNN, is evaluated on five Arabic sentiment datasets. Experimental results show that DE-CNN achieves higher accuracy and is less time consuming than state-of-the-art algorithms.
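The search described in this abstract can be illustrated with a minimal sketch of DE/rand/1/bin over the five named parameters. Everything concrete here is an assumption made for illustration: the candidate values in `SPACE`, the `[0, 1]` encoding, and especially `toy_fitness`, which is a cheap stand-in for the validation accuracy that a real run would obtain by training a CNN. It is not the authors' implementation.

```python
import random

# Hypothetical search space: the option lists are illustrative, not from the paper.
SPACE = {
    "filter_sizes": [(2, 3), (3, 4, 5), (2, 3, 4, 5)],
    "nfcs": [50, 100, 150, 200],
    "fc_neurons": [50, 100, 200, 300],
    "init_mode": ["uniform", "normal", "he_uniform"],
    "dropout": [0.2, 0.3, 0.5, 0.7],
}
KEYS = list(SPACE)


def decode(vec):
    """Map a real vector in [0, 1]^5 to one discrete configuration."""
    cfg = {}
    for x, key in zip(vec, KEYS):
        options = SPACE[key]
        cfg[key] = options[min(int(x * len(options)), len(options) - 1)]
    return cfg


def toy_fitness(cfg):
    """Placeholder score (assumption); a real run would train and validate a CNN."""
    return -abs(cfg["dropout"] - 0.5) - abs(cfg["nfcs"] - 150) / 1000


def differential_evolution(fitness, pop_size=10, generations=20, f=0.5, cr=0.9, seed=0):
    rng = random.Random(seed)
    dim = len(KEYS)
    pop = [[rng.random() for _ in range(dim)] for _ in range(pop_size)]
    scores = [fitness(decode(ind)) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: combine three distinct individuals other than i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for j in range(dim):
                if rng.random() < cr or j == j_rand:
                    v = pop[a][j] + f * (pop[b][j] - pop[c][j])
                    trial.append(min(1.0, max(0.0, v)))  # clip to [0, 1]
                else:
                    trial.append(pop[i][j])  # binomial crossover keeps the parent gene
            trial_score = fitness(decode(trial))
            # Greedy selection: the trial replaces the parent only if it is no worse.
            if trial_score >= scores[i]:
                pop[i], scores[i] = trial, trial_score
    best = max(range(pop_size), key=scores.__getitem__)
    return decode(pop[best]), scores[best]


best_cfg, best_score = differential_evolution(toy_fitness)
print(best_cfg, best_score)
```

The scale factor `f` and crossover rate `cr` are the knobs of the mutation and crossover operators; swapping the mutation line for another strategy is how such operator variants would be compared in this sketch.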
Developing cybersecurity is essential and has attracted considerable attention from academia and industry worldwide. It is also necessary for providing sustainable computing for the Internet of Things (IoT). Machine learning techniques play a vital role in IoT cybersecurity for intrusion detection and the identification of malicious activity. Thus, in this study, we develop new feature extraction and selection methods for intrusion detection systems (IDS) that exploit the advantages of swarm intelligence (SI) algorithms. We design a feature extraction mechanism based on convolutional neural networks (CNNs). We then present an alternative feature selection (FS) approach using a recently developed SI algorithm, the Aquila optimizer (AQU). To assess the quality of the developed IDS approach, four well-known public datasets, CIC2017, NSL-KDD, BoT-IoT, and KDD99, were used. We also conducted extensive comparisons with other optimization methods to verify the competitive performance of the developed method. The results show the high performance of the developed approach across different evaluation indicators.
Stemming is an effective technique that has been adopted in many applications, such as machine learning, machine translation, document classification (DC), information retrieval, and natural language processing. Stemming is applied during document classification to reduce the high dimensionality of the feature space, which in turn improves the performance of the classification system, particularly for highly inflected languages such as Arabic. This paper studies the impact of three stemming techniques, namely Information Science Research Institute (ISRI), Tashaphyne, and ARLStem, on Arabic DC. Three classification algorithms are used: naïve Bayes (NB), support vector machine (SVM), and K-nearest neighbors (KNN). In addition, chi-square feature selection is used to select the most relevant features. Experiments are conducted on the CNN Arabic corpus, which was collected from Arabic websites, to assess the performance of the classification system. The classifiers are evaluated using K-fold cross-validation and Micro-F1. The findings indicate that ARLStem outperforms the ISRI and Tashaphyne stemmers. The results also show that the SVM is more effective than the KNN and NB classifiers, achieving a Micro-F1 of 94.64% when using the ARLStem stemmer. Index Terms: Arabic text classification, text preprocessing, stemming techniques, feature extraction, feature selection.
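The Micro-F1 measure used in this abstract pools true positives, false positives, and false negatives over all classes before computing F1. A minimal sketch follows, assuming single-label predictions; the function name and the labels in the usage example are hypothetical and not taken from the paper's corpus.

```python
from collections import Counter


def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP, FP and FN over all classes, then compute F1."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    total_tp = sum(tp.values())
    denom = 2 * total_tp + sum(fp.values()) + sum(fn.values())
    return 2 * total_tp / denom if denom else 0.0


# Hypothetical labels for illustration only.
y_true = ["sport", "politics", "sport", "health"]
y_pred = ["sport", "sport", "sport", "health"]
print(micro_f1(y_true, y_pred))  # 0.75
```

In the single-label setting every misclassification counts once as a false positive and once as a false negative, so Micro-F1 coincides with plain accuracy (3 of 4 correct in the example).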
This article studies convolutional neural networks for Tigrinya (also referred to as Tigrigna), a Semitic language spoken in Eritrea and northern Ethiopia. Tigrinya is a “low-resource” language, notable for the absence of comprehensive and free data. Furthermore, like other Semitic languages, it is characterized as one of the most semantically and syntactically complex languages in the world. To the best of our knowledge, no previous research has been conducted on the state-of-the-art embedding technique presented here. We investigate which word representation methods perform better for single-label text classification problems, which are common when dealing with morphologically rich and complex languages. Two datasets are used: a manually annotated dataset containing 30,000 Tigrinya news texts from various sources in six categories (“sport”, “agriculture”, “politics”, “religion”, “education”, and “health”), and an unannotated corpus containing more than six million words. In this paper, we explore pretrained word embedding architectures using various convolutional neural networks (CNNs) to predict class labels. We construct a CNN with a continuous bag-of-words (CBOW) method, a CNN with a skip-gram method, and CNNs with and without word2vec and FastText to evaluate Tigrinya news articles. We also compare the CNN results with traditional machine learning models and evaluate the results in terms of accuracy, precision, recall, and F1 score. The CBOW CNN with word2vec achieves the best accuracy, 93.41%, significantly improving the accuracy of Tigrinya news classification.
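The difference between the CBOW and skip-gram methods named in this abstract lies in how training examples are drawn from a context window: CBOW predicts a word from its surrounding context, whereas skip-gram predicts each context word from the center word. The sketch below only generates those training pairs; the function names, the window size, and the placeholder tokens are assumptions for illustration, and no embedding or CNN is trained.

```python
def cbow_pairs(tokens, window=2):
    """CBOW examples: (list of context words, target word)."""
    pairs = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if context:
            pairs.append((context, target))
    return pairs


def skipgram_pairs(tokens, window=2):
    """Skip-gram examples: (center word, one context word)."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


# Placeholder tokens; a real run would use tokenized Tigrinya text.
tokens = ["w1", "w2", "w3"]
print(cbow_pairs(tokens, window=1))
print(skipgram_pairs(tokens, window=1))
```

Skip-gram produces one example per (center, context) pair, so it yields more training examples than CBOW from the same text, which groups the whole window into a single example.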