Gene expression levels are among the most widely used features in cancer classification. The available gene expression data has a small number of samples and a relatively large number of dimensions, which makes it poorly suited to deep Convolutional Neural Network (CNN) architectures, even though these exhibit state-of-the-art performance in many fields. In this paper, we propose a lightweight CNN architecture for breast cancer classification using gene expression data downloaded from the Pan-Cancer Atlas using the "Illumina HiSeq" platform. The downloaded gene expression data is preprocessed and then transformed into 2D images. We started the preprocessing by removing the outlier samples, which are determined based on the Array-Array Intensity Correlation (AAIC), which defines a symmetric square matrix of Spearman correlations. Then we applied a normalization process to the gene expression data to ensure that we can infer the expression level from it correctly and avoid biases in the expression measures. Finally, filtering is applied to the data. A model selection (parameter search) strategy is used to choose the values of the CNN hyper-parameters that give optimal performance. Our experiments show that our proposed method achieves an accuracy of 98.76%, which is the highest among the competing methods.

INDEX TERMS Tumor type classification, RNA-Seq, gene expression, convolutional neural network, edge detection.

MURTADA K. ELBASHIR received the B.Sc. degree (Hons.) in computer/statistics from the University of Gezira, Wad Madani, Sudan, in 2000, the M.Sc. degree (Hons.) in computer information systems from the University of the Free State, Bloemfontein, South Africa, in 2003, and the Ph.D. degree in computer science and technology
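The outlier-removal step mentioned in the abstract above (an AAIC matrix of Spearman correlations between samples) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the rule that a sample is an outlier when its mean correlation with the other samples falls below a cutoff, the `threshold` parameter, and all function names are hypothetical choices made for this example.

```python
from statistics import mean


def rank(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


def aaic_matrix(samples):
    """Symmetric square matrix of Spearman correlations between samples.

    `samples` is a list of expression profiles, one list of values per sample.
    """
    n = len(samples)
    return [[spearman(samples[i], samples[j]) for j in range(n)] for i in range(n)]


def find_outliers(samples, threshold):
    """Indices of samples whose mean correlation with the others is below
    `threshold` (a hypothetical cutoff rule assumed for this sketch)."""
    matrix = aaic_matrix(samples)
    n = len(samples)
    outliers = []
    for i in range(n):
        others = [matrix[i][j] for j in range(n) if j != i]
        if mean(others) < threshold:
            outliers.append(i)
    return outliers


if __name__ == "__main__":
    # Toy data: three similar profiles and one reversed profile.
    toy = [
        [1, 2, 3, 4, 5, 6],
        [2, 3, 4, 5, 6, 7],
        [1, 3, 2, 4, 6, 5],
        [6, 5, 4, 3, 2, 1],
    ]
    print(find_outliers(toy, threshold=0.0))
```

Computing Spearman correlation as the Pearson correlation of ranks keeps the sketch dependency-free. On real expression data with thousands of genes, a vectorized library routine would be the more practical choice.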
Handwritten text recognition is useful for interpreting records in fields such as healthcare, surgery, and policing, in which professionals may avoid technical equipment and prefer writing notes on paper. In order to perform data fusion from different data sources, automatic handwriting recognition must overcome barriers such as the different ways of writing letters and deformations that arise for many reasons. This work presents a novel handwriting recognition approach based on the application of coordinate vectors to find similarities across different kinds of deformation. In particular, it has been implemented using 16 segments in order to distinguish all the particularities when matching new text against a dataset with a machine-learning approach. The implementation of this approach in MATLAB shows promising results, with an accuracy of 92.8% with ensemble and bagged trees, after analyzing 22 possible combinations of machine-learning and processing techniques.
Arabic writing involves many different symbols, whose composition is much more complex for recognition algorithms than that of European languages, which are based mostly on English letters, with a few exceptions for letters carrying written accents. Some works in the literature have addressed Arabic writing recognition in a similar way as for other languages, but some symbols and Arabic writing rules are still not properly addressed. This can be observed either in the limitations of the datasets used for the experiments or in the low accuracy on certain sentences containing these particularities. Automatically recognizing Arabic writing is still challenging. This work has explored different preprocessing approaches that consider Arabic particularities, combined with several machine-learning techniques. Our approach selects 128 features and normalizes them, and we evaluate its effectiveness in Arabic writing recognition. This work proposes a support vector machine (SVM) approach, using the KHATT dataset as a benchmark for determining the capacity of algorithms to identify Arabic writing. This proposal is based on the presented experimentation with 22 different combinations of machine-learning techniques and preprocessing approaches. The proposed combination obtained an accuracy of 87.1%, which was the highest among all the analyzed combinations. This work presents the separate accuracies for detecting 34 symbol sequences that are usually the most difficult to identify in handwriting. This article also shows the confusion matrix among all the pairs of these symbol sequences and discusses the most relevant results.
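The per-sequence accuracies and the pairwise confusion matrix mentioned in the abstract above can be illustrated with a short sketch. It is only an illustration under assumptions: the labels are hypothetical placeholders rather than the 34 symbol sequences of the paper, "separate accuracy" is assumed to mean the fraction of true instances of each class that were predicted correctly, and the function names are invented for this example.

```python
from collections import Counter


def confusion_matrix(true_labels, predicted_labels):
    """Return (labels, matrix) where matrix[i][j] counts items whose true
    label is labels[i] and whose predicted label is labels[j]."""
    counts = Counter(zip(true_labels, predicted_labels))
    labels = sorted(set(true_labels) | set(predicted_labels))
    matrix = [[counts[(t, p)] for p in labels] for t in labels]
    return labels, matrix


def per_class_accuracy(labels, matrix):
    """Fraction of each class's true instances that were predicted correctly.

    Classes with no true instances map to None.
    """
    result = {}
    for i, label in enumerate(labels):
        total = sum(matrix[i])
        result[label] = matrix[i][i] / total if total else None
    return result


def most_confused_pairs(labels, matrix, top=3):
    """Off-diagonal (true, predicted, count) triples, largest count first."""
    pairs = [
        (labels[i], labels[j], matrix[i][j])
        for i in range(len(labels))
        for j in range(len(labels))
        if i != j and matrix[i][j] > 0
    ]
    # sorted() is stable, so ties keep their row-major order.
    return sorted(pairs, key=lambda pair: -pair[2])[:top]


if __name__ == "__main__":
    # Hypothetical placeholder labels standing in for symbol sequences.
    truth = ["seq_a", "seq_a", "seq_a", "seq_b", "seq_b", "seq_c"]
    guess = ["seq_a", "seq_a", "seq_b", "seq_b", "seq_b", "seq_a"]
    labels, matrix = confusion_matrix(truth, guess)
    print(per_class_accuracy(labels, matrix))
    print(most_confused_pairs(labels, matrix))
```

Listing the largest off-diagonal cells is one simple way to surface which pairs of sequences a classifier mixes up most often.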