Arabic natural language processing (ANLP) consists of developing techniques and tools that process and analyze the Arabic language in both written and spoken form. ANLP contributes to many existing systems and provides Arabic and non-Arabic speakers with convenient tools that can be used in different domains. Modern ANLP tools are developed using machine learning (ML) techniques. ML algorithms are widely used in NLP because they achieve high accuracy across data of varying quality and because they are easy to implement. However, the methodology of ML-based ANLP applications involves several distinct phases, so it is crucial to understand these phases in detail, as well as the most widely used ML algorithms. This survey discusses these phases in detail, shows how ML techniques are involved in developing ANLP tools, and identifies well-known techniques used in ANLP. It also discusses the characteristics and complexity of the Arabic language, as well as the importance of and need for ANLP.
INDEX TERMS Arabic natural language processing, classification, feature selection, machine learning.
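To make the idea of a phased ML pipeline concrete, here is a minimal toy sketch. The abstract does not list the phases, so the three shown (preprocessing, bag-of-words feature extraction, classification) are an assumption based on a common layout. The normalization rules, the `NaiveBayes` class, and the toy sports/economy sentences are all hypothetical illustrations, not the survey's method. Real ANLP systems usually add stemming or morphological analysis.

```python
import math
import re
from collections import Counter, defaultdict

# Assumed preprocessing: Arabic diacritics (U+064B-U+0652) and tatweel (U+0640).
DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")


def normalize(text: str) -> str:
    """Preprocessing phase (assumed rules): strip diacritics, unify letter variants."""
    text = DIACRITICS.sub("", text)
    text = re.sub("[\u0622\u0623\u0625]", "\u0627", text)  # alef variants -> bare alef
    text = text.replace("\u0649", "\u064A")  # alef maqsura -> yeh
    text = text.replace("\u0629", "\u0647")  # teh marbuta -> heh
    return text


def tokenize(text: str) -> list[str]:
    """Split normalized text into runs of Arabic letters (U+0621-U+064A)."""
    return re.findall(r"[\u0621-\u064A]+", normalize(text))


class NaiveBayes:
    """Hypothetical multinomial naive Bayes over bag-of-words counts, Laplace-smoothed."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, y in zip(docs, labels):
            # Feature-extraction phase: word counts per class.
            self.word_counts[y].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        # Classification phase: pick the class with the highest log-posterior.
        tokens = tokenize(doc)
        n_docs = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for y, n_y in self.class_counts.items():
            total = sum(self.word_counts[y].values())
            score = math.log(n_y / n_docs)
            for t in tokens:
                score += math.log((self.word_counts[y][t] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best, best_score = y, score
        return best


# Toy, made-up training data (two sports sentences, two economy sentences).
docs = [
    "فاز الفريق في المباراة",
    "سجل اللاعب هدفا",
    "ارتفع سعر النفط",
    "انخفضت الأسهم في السوق",
]
labels = ["sports", "sports", "economy", "economy"]
model = NaiveBayes().fit(docs, labels)
print(model.predict("اللاعب سجل في المباراة"))  # prints "sports"
```

Keeping each phase in its own function makes it easy to swap one step (for example, replacing the bag-of-words counts with TF-IDF weights) without touching the others.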
Diabetes is one of the most common diseases worldwide. Many Machine Learning (ML) techniques have been used to predict diabetes in the last couple of years. The increasing complexity of this problem has inspired researchers to explore the robust set of Deep Learning (DL) algorithms. The highest accuracy achieved so far is 95.1%, obtained by a combined CNN-LSTM model. Although numerous ML algorithms have been applied to this problem, a set of classifiers has rarely or never been used, so it is of interest to determine how well these classifiers predict diabetes. Moreover, no recent survey has reviewed and compared the performance of all the proposed ML and DL techniques and combined models. This article surveys all the ML- and DL-based diabetes prediction techniques published in the last six years. In addition, a study was conducted that applied these rarely used and unused ML classifiers to the Pima Indian Dataset to analyze their performance. The classifiers achieved accuracies of 68%–74%. The recommendation is to use these classifiers in diabetes prediction and to improve them by developing combined models.
Exploratory Projection Pursuit (EPP) methods were developed thirty years ago for the exploratory analysis of large data sets. These methods search for low-dimensional projections that reveal interesting structure present in the data set but not visible in high dimension. Each projection is associated with a real-valued index whose optima correspond to informative projections. Several EPP indices have been proposed in the statistics literature, but the main difficulty lies in optimizing them. In this paper, we propose applying Genetic Algorithms (GA) and the more recent Particle Swarm Optimization (PSO) algorithm to the optimization of several projection pursuit indices. We explain how EPP methods can be implemented so that they become an efficient and powerful tool for the statistician, and we illustrate our proposal on several simulated and real data sets.
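The core idea, maximizing a real-valued projection index with PSO, can be sketched for a one-dimensional projection. The abstract does not specify which indices or swarm settings the paper uses. So the squared excess kurtosis index, the function names `kurtosis_index` and `pso_projection`, and the parameters (inertia `w=0.7298`, `c1=c2=1.49618`, commonly cited constriction-derived values) are all assumptions for illustration. The toy data hide a bimodal direction along the first coordinate.

```python
import numpy as np


def kurtosis_index(X, a):
    """Assumed index: squared excess kurtosis of the standardized projection X @ a."""
    norm = np.linalg.norm(a)
    if norm == 0:
        return -np.inf  # degenerate direction, never preferred
    z = X @ (a / norm)
    z = (z - z.mean()) / z.std()
    return (np.mean(z**4) - 3.0) ** 2


def pso_projection(X, index, n_particles=30, n_iter=200,
                   w=0.7298, c1=1.49618, c2=1.49618, seed=0):
    """Global-best PSO maximizing index(X, a) over directions a (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    pos = rng.normal(size=(n_particles, d))  # each particle is a candidate direction
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_val = np.array([index(X, p) for p in pos])
    g = pbest[np.argmax(pbest_val)].copy()
    g_val = pbest_val.max()
    for _ in range(n_iter):
        r1 = rng.random((n_particles, d))
        r2 = rng.random((n_particles, d))
        # Velocity update: inertia + pull toward personal best + pull toward global best.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (g - pos)
        pos = pos + vel
        vals = np.array([index(X, p) for p in pos])
        improved = vals > pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        if pbest_val.max() > g_val:
            g_val = pbest_val.max()
            g = pbest[np.argmax(pbest_val)].copy()
    return g / np.linalg.norm(g), g_val


# Toy data: 5-D Gaussian noise, except a bimodal first coordinate.
rng = np.random.default_rng(42)
n = 500
X = rng.normal(size=(n, 5))
X[:, 0] = rng.choice([-2.0, 2.0], size=n) + 0.5 * rng.normal(size=n)

direction, value = pso_projection(X, kurtosis_index)
print(np.round(direction, 2), round(float(value), 2))
```

Because the index standardizes the projection, it is scale-invariant, so particles need not stay on the unit sphere. Only the returned best direction is normalized. A GA variant would replace the velocity update with selection, crossover, and mutation over the same candidate directions.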