There is a vast amount of unstructured Arabic information on the Web. This data is always organized in semi-structured text and cannot be used directly. This research proposes a semi-supervised technique that extracts binary relations between two Arabic named entities from the Web. Several works have addressed relation extraction from Latin texts but, as far as we know, none has applied a semi-supervised technique to Arabic text. The goal of this research is to extract a large list or table of named entities and their relations in a specific domain. A handful of relation instances is required as input from the user. The system uses result summaries from the Google search engine as source text, and the input instances are used to extract patterns from it. The output is a set of new entities and their relations. The results from four experiments show that precision and recall vary
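The seed-and-pattern loop described above can be illustrated with a minimal sketch. It is not the authors' system: the function names, the English toy sentences, and the simplification that an entity is a single word are all assumptions made for illustration, and the real system works on Arabic search-result summaries.

```python
import re

def learn_patterns(seeds, snippets, max_gap=30):
    """Collect the text found between the two entities of each seed pair."""
    patterns = set()
    for first, second in seeds:
        gap = re.compile(
            re.escape(first) + r"(.{1,%d}?)" % max_gap + re.escape(second)
        )
        for snippet in snippets:
            match = gap.search(snippet)
            if match:
                patterns.add(match.group(1))
    return patterns

def apply_patterns(patterns, snippets):
    """Find entity pairs: one word on each side of a learned pattern.

    Assumption: entities are single words. Real named entities are often
    multiword and would need a named-entity recognizer instead of \\w+.
    """
    pairs = set()
    for pattern in patterns:
        regex = re.compile(r"(\w+)" + re.escape(pattern) + r"(\w+)")
        for snippet in snippets:
            pairs.update(regex.findall(snippet))
    return pairs

# Hypothetical toy data standing in for search-result summaries.
seeds = [("Cairo", "Egypt")]
snippets = [
    "Cairo is the capital of Egypt.",
    "Amman is the capital of Jordan.",
    "Beirut lies on the coast.",
]

patterns = learn_patterns(seeds, snippets)
new_pairs = apply_patterns(patterns, snippets) - set(seeds)
print(patterns, new_pairs)
```

In a full bootstrapping loop the new pairs would be fed back in as seeds for another round, usually with some scoring step to discard unreliable patterns.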
In recent years, the increase in scientific knowledge and massive data production have caused exponential growth in databases and repositories. The biomedical domain is one of the richest data domains. An extensive amount of biomedical data is currently available, ranging from details of clinical symptoms to various types of biochemical data and outputs of imaging devices. Manually extracting biomedical patterns from data and transforming them into machine-understandable knowledge is a difficult task because the biomedical domain comprises huge, dynamic, and complicated knowledge. Data mining can improve the quality of biomedical pattern extraction. In this research, an overview of the applications of data mining to the management of diseases is presented. The main focus is to investigate machine learning techniques (MLT) that are widely used for the prediction, prognosis, and treatment of important, frequent diseases such as cancers, hepatitis, and heart diseases. Four techniques, namely Artificial Neural Network, K-Nearest Neighbour, Decision Tree, and Associative Classification, are illustrated and analyzed. This survey provides a general analysis of the current status of disease management using MLT. The accuracy achieved by the various applications ranged from 70% to 100%, depending on the disease, the problem solved, and the data and technique used.
Early detection of cancer can increase patients' survivability and treatment options. Medical images such as mammograms, ultrasound, magnetic resonance imaging, and microscopic images are common means of cancer diagnosis. Recently, computer-aided diagnosis (CAD) systems have been used to help physicians diagnose cancer so that diagnostic accuracy can be improved. CAD can help decrease the number of cancer lesions missed because of physician fatigue, reduce the burden of workload and data overload, and decrease inter- and intra-reader variability in image interpretation. In this research, a framework for CAD systems for cancer diagnosis based on medical images is proposed. The proposed work helps physicians detect suspicious regions using different medical imaging modalities and classify the detected regions as normal or abnormal with the highest possible accuracy. The proposed CAD framework consists of four stages: preprocessing, segmentation of regions of interest, feature extraction and selection, and classification. The framework has been applied to blood smear images to diagnose cases as normal or abnormal for Acute Lymphoblastic Leukemia (ALL). Ant Colony Optimization (ACO) has been used to select, from the features extracted from the segmented cell parts, the feature subsets that maximize classification performance. Four classifiers have been applied: Decision Tree (DT), K-nearest neighbor (K-NN), Naïve Bayes (NB), and Support Vector Machine (SVM). The framework yielded promising results, reaching 96.25% accuracy, 97.3% sensitivity, and 95.35% specificity with the decision tree classifier.
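The three reported measures follow from the counts of a binary confusion matrix, which the sketch below makes explicit. The abstract does not give the confusion matrix; the counts used here are hypothetical, chosen only because they are arithmetically consistent with the reported percentages, and the function name is likewise an assumption.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts.

    tp/fn: abnormal cases classified correctly / missed.
    tn/fp: normal cases classified correctly / flagged as abnormal.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }

# Hypothetical counts (not taken from the paper): 37 abnormal and
# 43 normal cases, with one missed abnormal case and two false alarms.
metrics = diagnostic_metrics(tp=36, fp=2, tn=41, fn=1)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Reporting sensitivity and specificity alongside accuracy matters in diagnosis because a missed abnormal case and a false alarm carry different costs.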
Expressing feelings through social media has become an essential part of our lives and deserves careful consideration; besides sharing ideas and thoughts, we share moments and good memories. Social media such as Facebook, Twitter, Weibo, and LinkedIn are considered rich sources of opinionated text data. Both organizations and individuals are interested in using social media to analyze people's opinions and extract sentiments and emotions. Most studies on social media analysis mainly classify sentiment as positive, negative, or neutral. The challenge in emotion analysis arises because humans can express one or several emotions within one expression. Human beings recognize these different emotions well; however, this is still not easy for an emotion analysis system. In most cases, the Arabic used on social media is slangy or colloquial, which makes it more challenging to preprocess and to filter noise, since most lemmatization and stemming tools are built for Modern Standard Arabic (MSA). An emotion analysis model has been implemented to categorize emotions, which is a multiclass, multilabel classification problem. Few studies have addressed this emotion classification problem for Arabic social media; nearly the only work is that of SemEval-2018 Task 1, subtask E-c. Several machine learning approaches have been implemented for this task, and a few studies were based on deep learning. Our model implements a novel multilayer bidirectional long short-term memory (BiLSTM) network trained on top of pre-trained word embedding vectors. It has been compared with other models developed for the same task using Support Vector Machines (SVM), random forest (RF), and fully connected neural networks, and it achieves state-of-the-art performance, improving on the best results previously obtained for this task.
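What "multilabel" means in practice can be shown with a small decoding sketch: each emotion gets its own score, and every emotion whose score passes a threshold is predicted, so one tweet can carry several labels or none. The abstract does not describe the output layer, so the per-label scores, the 0.5 threshold, and the function name are assumptions; the label list is the eleven-emotion set of the SemEval-2018 E-c subtask as best recalled here.

```python
# Emotion labels of the SemEval-2018 Task 1 E-c subtask (assumed order).
EMOTIONS = [
    "anger", "anticipation", "disgust", "fear", "joy", "love",
    "optimism", "pessimism", "sadness", "surprise", "trust",
]

def decode_labels(scores, threshold=0.5):
    """Return every emotion whose score reaches the threshold.

    Unlike single-label classification, no argmax is taken: the result
    may contain zero, one, or several emotions.
    """
    if len(scores) != len(EMOTIONS):
        raise ValueError("expected one score per emotion")
    return [e for e, s in zip(EMOTIONS, scores) if s >= threshold]

# Hypothetical per-label scores for one tweet (for example, sigmoid outputs).
example_scores = [0.10, 0.05, 0.20, 0.02, 0.91, 0.74, 0.66, 0.03, 0.01, 0.12, 0.30]
predicted = decode_labels(example_scores)
print(predicted)
```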
The dimensionality problem is a well-known challenge for most classifiers when datasets have an unbalanced number of samples and features. Features may contain unreliable data, which may lead the classification process to produce undesirable results. Feature selection is considered a solution to this kind of problem. In this paper, an enhanced firefly algorithm is proposed as a feature selection solution that reduces dimensionality and picks the most informative features for classification. The main purpose of the proposed model is to improve classification accuracy by using the features it selects, thereby decreasing classification errors. The firefly is modeled by simulating its position with a cell chi-square value, which changes after every move, and its intensity with a set of different fitness functions calculated as a weight for each feature. K-nearest neighbor and discriminant analysis are used as classifiers to test the feature selection of the proposed firefly algorithm. Experimental results showed that the proposed enhanced algorithm, based on the firefly algorithm with chi-square and different fitness functions, can provide better results than others. The results also showed that reducing the dataset is useful for gaining higher classification accuracy.
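The chi-square value mentioned above measures how strongly a feature is associated with the class label. The sketch below computes only that statistic for discrete features and is not the proposed firefly algorithm; the function names and the toy data are assumptions for illustration.

```python
def contingency(feature_values, labels):
    """Count co-occurrences of each feature value with each class label."""
    values = sorted(set(feature_values))
    classes = sorted(set(labels))
    table = [[0] * len(classes) for _ in values]
    for value, label in zip(feature_values, labels):
        table[values.index(value)][classes.index(label)] += 1
    return table

def chi_square(table):
    """Pearson chi-square statistic: sum of (observed - expected)^2 / expected."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:
                statistic += (observed - expected) ** 2 / expected
    return statistic

# Hypothetical toy dataset: six samples, two binary features.
labels = [0, 0, 0, 1, 1, 1]
informative = [0, 0, 0, 1, 1, 1]  # tracks the class exactly
noisy = [0, 1, 0, 1, 0, 1]        # weakly related to the class

score_informative = chi_square(contingency(informative, labels))
score_noisy = chi_square(contingency(noisy, labels))
print(score_informative, score_noisy)
```

A higher score marks a feature as more informative, so ranking features by this value gives a simple filter-style baseline against which a search-based selector can be compared.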
Expressing emotions through text and emojis has become widespread on social media such as Facebook, Instagram, Twitter, Weibo, and LinkedIn. Both organizations and individuals are now interested in using social media to analyze people's opinions and extract sentiments and emotions. We propose a model for multilabel emotion classification using a bidirectional Long Short-Term Memory (BiLSTM) deep network. It is evaluated on the Arabic tweets dataset provided by SemEval 2018 for the E-c task. Several preprocessing steps are applied, including ARLSTEM with some modifications, replacement of emojis with their corresponding text meaning from a manually built lexicon, and feature vector representation using AraVec word embeddings. The novelty of our research is that it examines the effect of hyperparameter tuning on model performance and uses BiLSTM in all of its deep neural network layers. The proposed model achieves performance comparable with state-of-the-art models that use different machine learning and deep learning techniques. The system achieves about a 9% improvement in validation accuracy over the previous best model for the same task, which uses a Support Vector Classifier (SVC), and it outperforms the other deep neural network (UNCCTeam), which is based on fully connected layers, by about 4.4% in the micro F1 metric.
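For readers unfamiliar with the micro F1 metric used in the comparison, the following sketch shows how it is computed for multilabel predictions: true positives, false positives, and false negatives are pooled over all tweets and labels before the score is taken. The function name and the toy label sets are assumptions, not data from the paper.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1 over multilabel predictions.

    y_true and y_pred are sequences of label collections, one per sample.
    Counts are pooled across samples and labels: F1 = 2TP / (2TP + FP + FN).
    """
    tp = fp = fn = 0
    for true_labels, pred_labels in zip(y_true, y_pred):
        true_set, pred_set = set(true_labels), set(pred_labels)
        tp += len(true_set & pred_set)   # labels predicted and correct
        fp += len(pred_set - true_set)   # labels predicted but wrong
        fn += len(true_set - pred_set)   # labels missed
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0

# Hypothetical gold and predicted emotion sets for two tweets.
gold = [{"joy", "love"}, {"anger"}]
pred = [{"joy"}, {"anger", "fear"}]
score = micro_f1(gold, pred)
print(f"micro F1 = {score:.4f}")
```

Because the counts are pooled, frequent emotions weigh more heavily in micro F1 than rare ones, which is why it is often reported together with macro F1.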