Despite its economic, cultural, and biological importance, no large-scale sequencing project has been undertaken to date for Camelus dromedarius. With the goal of sequencing the complete genome of the organism, we first established and sequenced camel EST libraries, generating 70,272 reads. Following trimming, chimera checking, repeat masking, clustering, and assembly, we obtained 23,602 putative gene sequences, of which over 4,500 potentially novel or fast-evolving gene sequences show no homology to other available genomes. Functional annotation of sequences with similarities in nucleotide and protein databases was obtained using Gene Ontology classification. Comparison to available full-length cDNA sequences and open reading frame (ORF) analysis of camel sequences homologous to known genes show that more than 80% of the contigs contain an ORF longer than 300 bp and that ∼40% of hits extend to the start codons of full-length cDNAs, suggesting successful characterization of camel genes. Similarity analyses were performed separately for different organisms, including human, mouse, bovine, and rat. The accompanying web portal, CAGBASE (http://camel.kacst.edu.sa/), hosts a relational database containing the annotated EST sequences and analysis tools, with the option to add sequences from the public domain. We anticipate that our results will provide a home base for genomic studies of the camel and for other comparative studies, offering a starting point for whole-genome sequencing of the organism.
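The ORF criterion used above (a contig counts as well characterized if it carries an ORF longer than 300 bp) can be illustrated with a minimal sketch. This is not the authors' pipeline; it is a simplified forward-strand scanner (reverse-complement frames and genetic-code nuances are omitted), written in Python for illustration:

```python
# Minimal sketch: find the longest open reading frame (ORF) in a contig,
# scanning the three forward reading frames for an ATG...stop span.
STOPS = {"TAA", "TAG", "TGA"}

def longest_orf(seq):
    best = ""
    for frame in range(3):
        # split this frame into codons
        codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
        start = None
        for i, codon in enumerate(codons):
            if codon == "ATG" and start is None:
                start = i                      # open an ORF at the first ATG
            elif codon in STOPS and start is not None:
                orf = "".join(codons[start:i + 1])
                if len(orf) > len(best):
                    best = orf
                start = None                   # close the ORF at the stop codon
    return best

orf = longest_orf("CCATGAAATTTGGGTAACC")
print(orf, len(orf))  # ATGAAATTTGGGTAA 15
```

A contig would pass the abstract's threshold when `len(longest_orf(contig)) > 300`.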
For the last two decades, oversampling has been employed to overcome the challenge of learning from imbalanced datasets, and many approaches have been offered in the literature. Oversampling itself, however, raises a concern: models trained on fictitious data may fail spectacularly when applied to real-world problems. The fundamental difficulty with oversampling approaches is that, given a real-life population, the synthesized samples may not truly belong to the minority class. As a result, training a classifier on these samples while treating them as minority examples may produce incorrect predictions when the model is deployed in the real world. In this paper, we analyzed a large number of oversampling methods and devised a new oversampling evaluation system based on hiding a number of majority examples and comparing them to those generated by the oversampling process. Based on this evaluation system, we ranked all the methods by the number of incorrectly generated examples. Our experiments, covering more than 70 oversampling methods and nine imbalanced real-world datasets, reveal that every oversampling method studied generates minority samples that most likely belong to the majority class. Given the data and methods at hand, we argue that oversampling in its current forms and methodologies is unreliable for learning from class-imbalanced data and should be avoided in real-world applications.
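The hide-and-compare evaluation idea can be sketched as follows. This is an assumption-laden toy version, not the paper's actual protocol: a naive SMOTE-style interpolator stands in for the 70+ methods studied, the data are synthetic 2-D Gaussians, and "looks like majority" is decided by a nearest-real-neighbor check against the hidden majority points:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-D imbalanced data: overlapping majority and minority clusters.
majority = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
minority = rng.normal(loc=1.5, scale=1.0, size=(20, 2))

# Hide half of the majority class before oversampling is run.
hidden = majority[:100]

def smote_like(X, n_new, k=5, rng=rng):
    """Naive SMOTE-style interpolation between minority neighbours."""
    idx = rng.integers(0, len(X), size=n_new)
    # brute-force nearest neighbours within the minority class
    d = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    nn = np.argsort(d, axis=1)[:, 1:k + 1]
    partners = nn[idx, rng.integers(0, k, size=n_new)]
    lam = rng.random((n_new, 1))
    return X[idx] + lam * (X[partners] - X[idx])

synthetic = smote_like(minority, n_new=80)

# Audit: a synthetic point "looks like majority" if its nearest real
# example (hidden majority vs. real minority) is a hidden majority point.
real = np.vstack([hidden, minority])
labels = np.array([1] * len(hidden) + [0] * len(minority))  # 1 = majority
d = np.linalg.norm(synthetic[:, None] - real[None, :], axis=-1)
looks_majority = labels[d.argmin(axis=1)].mean()
print(f"fraction of synthetic minority points nearest to hidden majority: {looks_majority:.2f}")
```

Methods would then be ranked by this fraction, lower being better.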
Heart disease is one of the key contributors to human death; according to the WHO, 17.9 million people die from it each year. Alongside the various technologies and techniques developed for heart-disease detection, the use of image classification can further improve the results. Image classification is one of the most basic tasks in pattern recognition and computer vision, and refers to assigning one or more labels to images. Pattern recognition in images has become easier with machine learning, and deep learning has made it more precise than traditional image-classification methods. This study applies a deep-learning classification approach to heart-disease detection. A deep convolutional neural network (DCNN) is currently the most popular classification technique for image recognition. The proposed model is evaluated on the public UCI heart-disease dataset comprising 1050 patients and 14 attributes. Gathering the directly obtainable features from the heart-disease dataset, we fed this feature vector to a DCNN to discriminate whether an instance belongs to the healthy or the cardiac-disease class. To assess the performance of the proposed method, several metrics, namely accuracy, precision, recall, and the F1 measure, were employed, and our model achieved a validation accuracy of 91.7%. The experimental results indicate the effectiveness of the proposed approach in a real-world environment.
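Feeding a 14-attribute record to a convolutional layer, as described above, can be sketched with a plain 1-D convolution. This is not the paper's architecture; the record values, the single fixed filter, and the mean-pooled "score" head are all hypothetical stand-ins:

```python
import numpy as np

def conv1d(x, kernel):
    """Valid-mode 1-D convolution (cross-correlation) over a feature vector."""
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel) for i in range(len(x) - k + 1)])

def relu(z):
    return np.maximum(z, 0.0)

# Hypothetical 14-attribute patient record (values illustrative only).
x = np.array([63, 1, 3, 145, 233, 1, 0, 150, 0, 2.3, 0, 0, 1, 2], dtype=float)
x = (x - x.mean()) / x.std()           # standardise before the conv layer

kernel = np.array([0.5, -0.25, 0.1])   # one filter; learned in a real DCNN
hidden = relu(conv1d(x, kernel))       # feature map of length 14 - 3 + 1 = 12
score = hidden.mean()                  # stand-in for the dense + softmax head
print(hidden.shape, float(score))
```

A real DCNN would stack several such filter banks and learn the kernels by backpropagation.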
Customer churn has become one of the main concerns in the telecom sector, as it directly affects revenue, and telecom companies are looking to design novel methods to identify customers likely to churn. Suitable systems are therefore required to overcome the growing churn challenge. Recently, integrating different clustering and classification models into hybrid learners (ensembles) has gained wide acceptance; ensembles are gaining approval in the big-data domain since they reportedly achieve excellent predictions compared to single classifiers. In this study, we therefore propose a customer churn prediction (CCP) system based on an ensemble that fully incorporates clustering and classification learning techniques. The proposed churn-prediction model uses an ensemble of clustering and classification algorithms to improve CCP performance. Initially, clustering algorithms such as k-means, k-medoids, and random clustering are applied to the churn-prediction datasets. Next, to enhance the results, a hybridization technique is applied using different ensemble algorithms to evaluate the performance of the proposed system. The above-mentioned clustering algorithms, integrated with different classifiers including Gradient Boosted Trees (GBT), Decision Tree (DT), Random Forest (RF), Deep Learning (DL), and Naive Bayes (NB), are evaluated on two standard telecom datasets acquired from Orange and Cell2Cell. The experimental results reveal that, compared to the bagging ensemble technique, the stacking-based hybrid model (k-medoids-GBT-DT-DL) achieves the top accuracies of 96% and 93.6% on the Orange and Cell2Cell datasets, respectively, outperforming conventional state-of-the-art churn-prediction algorithms.
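The cluster-then-stack pattern described above can be sketched with scikit-learn. This is a simplified stand-in, not the paper's system: a synthetic imbalanced dataset replaces Orange/Cell2Cell, k-means replaces k-medoids (scikit-learn has no k-medoids), and a GBT/DT/RF stack with a logistic-regression meta-learner replaces the k-medoids-GBT-DT-DL hybrid:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.cluster import KMeans
from sklearn.ensemble import (StackingClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Imbalanced toy stand-in for a churn dataset (~80% non-churners).
X, y = make_classification(n_samples=600, n_features=10,
                           weights=[0.8], random_state=0)

# Step 1: append a k-means cluster label as an extra feature.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
X_aug = np.column_stack([X, km.labels_])

X_tr, X_te, y_tr, y_te = train_test_split(X_aug, y, random_state=0)

# Step 2: stack heterogeneous base learners behind a meta-learner.
stack = StackingClassifier(
    estimators=[("gbt", GradientBoostingClassifier(random_state=0)),
                ("dt", DecisionTreeClassifier(random_state=0)),
                ("rf", RandomForestClassifier(random_state=0))],
    final_estimator=LogisticRegression(max_iter=1000),
)
stack.fit(X_tr, y_tr)
acc = stack.score(X_te, y_te)
print(f"held-out accuracy: {acc:.3f}")
```

Stacking lets the meta-learner weight each base learner's out-of-fold predictions, which is why heterogeneous bases (trees plus boosting here) tend to help.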
Leukemia is a form of blood cancer that develops when the bone marrow produces too many white blood cells. The condition affects adults and is a prevalent form of cancer in children. Treatment is determined by the type of leukemia and the extent to which the cancer has spread across the body, so early diagnosis is crucial for providing adequate care and curing patients. Researchers have been working on advanced diagnostic systems based on machine learning (ML) to diagnose leukemia early. In this research, we employ a deep learning (DL) convolutional neural network (CNN) approach that hybridizes two individual CNN blocks, named CNN-1 and CNN-2, to detect acute lymphoblastic leukemia (ALL), acute myeloid leukemia (AML), and multiple myeloma (MM). The proposed model detects malignant leukemia cells in microscopic blood-smear images. We constructed a dataset of about 4150 images from a public directory. The main challenges were background removal, stripping out non-essential blood components, reducing noise and blurriness, and finding a minimal method for image segmentation. For pre-processing and segmentation, we transform the RGB color space into 8-bit greyscale and enhance image contrast using the image intensity adjustment method and the adaptive histogram equalization (AHE) method. We then increase the structure and sharpness of the images by multiplying a binary mask with the enhanced images. In the next step, the image is complemented so that the background appears black and the blood-cell nucleus white. Thereafter, we apply an area operation and a closing operation to remove background noise. Finally, we multiply the result by the source image to regenerate the dataset in RGB color space and resize the images to 400 × 400 pixels.
After applying all these methods and techniques, we obtain noiseless, non-blurred, sharpened, and segmented images of the lesion. In the next step, the enhanced segmented images are given as input to the CNNs: two parallel CNN models are trained, each extracting deep features. The extracted features are then combined using the canonical correlation analysis (CCA) fusion method to obtain more prominent features. We used five classification algorithms, namely SVM, bagging ensemble, TotalBoost, RUSBoost, and fine KNN, to evaluate the performance of the feature-extraction pipeline. Among these, the bagging ensemble outperformed the other algorithms, achieving the highest accuracy of 97.04%.
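The greyscale → enhance → binarize → complement → close → remask pipeline described above can be sketched in NumPy. This is a deliberately simplified stand-in, not the paper's implementation: a global contrast stretch replaces the intensity-adjustment and AHE steps, a global mean threshold replaces the segmentation step, and a 3×3 dilation-then-erosion replaces the area and closing operations:

```python
import numpy as np

def preprocess(rgb):
    # 1. RGB -> 8-bit greyscale (standard luma weights)
    grey = (0.299 * rgb[..., 0] + 0.587 * rgb[..., 1]
            + 0.114 * rgb[..., 2]).astype(np.uint8)
    # 2. global contrast stretch (stand-in for intensity adjustment / AHE)
    lo, hi = grey.min(), grey.max()
    stretched = ((grey - lo) / max(hi - lo, 1) * 255).astype(np.uint8)
    # 3. binarise, then complement so the nucleus (darker than the
    #    background in stained smears) ends up white on black
    mask = ~(stretched > stretched.mean())
    # 4. crude morphological closing: 3x3 dilation followed by erosion
    def dilate(m):
        p = np.pad(m, 1)
        return np.max(np.stack([p[i:i + m.shape[0], j:j + m.shape[1]]
                                for i in range(3) for j in range(3)]), axis=0)
    def erode(m):
        p = np.pad(m, 1, constant_values=True)
        return np.min(np.stack([p[i:i + m.shape[0], j:j + m.shape[1]]
                                for i in range(3) for j in range(3)]), axis=0)
    closed = erode(dilate(mask))
    # 5. multiply the mask back onto the source to keep only the nucleus
    return rgb * closed[..., None]

# Synthetic example: a dark "nucleus" on a light background.
img = np.full((20, 20, 3), 200, dtype=np.uint8)
img[8:12, 8:12] = 30
out = preprocess(img)
```

After this step the background pixels are zeroed while the nucleus region keeps its original RGB values, matching the "black background, white nucleus" masking described in the abstract.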
Cancer is a deadly disease that arises from the growth of uncontrollable body cells. Every year a large number of people succumb to it, making it one of the most serious public health challenges. Cancer can develop in any part of the human anatomy, which consists of trillions of cells. One of the most frequent types is skin cancer, which develops in the upper layer of the skin. Previously, machine learning techniques have been used for skin-cancer detection based on protein sequences and various imaging modalities. The drawback of these approaches is that they require human-engineered features, a laborious and time-consuming activity. Deep learning addresses this issue to some extent by providing automatic feature extraction. In this study, convolution-based deep neural networks are used for skin-cancer detection on the public ISIC dataset. Cancer detection is a sensitive task that is prone to errors if not performed in a timely and accurate manner, and the performance of individual machine learning models in detecting cancer is limited. The combined decision of individual learners is expected to be more accurate than any single learner, as ensemble learning exploits the diversity of learners to yield a better decision; prediction accuracy for sensitive tasks such as cancer detection can thus be enhanced by combining the decisions of individual learners. In this paper, an ensemble of deep learners is developed from VGG, CapsNet, and ResNet models for skin-cancer detection. The results show that the combined decision of the deep learners is superior to the findings of the individual learners in terms of sensitivity, accuracy, specificity, F-score, and precision, providing a compelling reason to apply the approach to the detection of other diseases.
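One common way to combine the decisions of several deep learners is soft voting, i.e., averaging their class-probability outputs. The abstract does not specify the fusion rule, so the sketch below is an assumption; the probability values and the benign/malignant labels are purely illustrative stand-ins for VGG, CapsNet, and ResNet outputs:

```python
import numpy as np

def soft_vote(prob_list):
    """Average class probabilities from several models, then take argmax."""
    return np.mean(prob_list, axis=0).argmax(axis=1)

# Hypothetical per-image class probabilities from three backbones
# (rows = images, columns = [benign, malignant]).
vgg     = np.array([[0.60, 0.40], [0.30, 0.70], [0.55, 0.45]])
capsnet = np.array([[0.70, 0.30], [0.40, 0.60], [0.35, 0.65]])
resnet  = np.array([[0.80, 0.20], [0.45, 0.55], [0.40, 0.60]])

preds = soft_vote([vgg, capsnet, resnet])
print(preds)  # [0 1 1] -> 0 = benign, 1 = malignant (illustrative labels)
```

Note the third image: two of the three models individually lean only weakly toward "malignant", but the averaged probability settles the call, which is the diversity benefit the abstract appeals to.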
In December 2019, the novel coronavirus disease 2019 (COVID-19) appeared. Being highly contagious and lacking an effective treatment, the only solution was to detect and isolate infected patients to break the chain of infection. The shortage of test kits and other drawbacks of lab tests motivated researchers to build automated diagnosis systems using chest X-rays and CT scans. The works reviewed in this study couple AI with radiological image processing of raw chest X-ray and CT images to train various CNN models, using transfer learning and numerous types of binary and multi-class classification. The models are trained and validated on several datasets, whose attributes are also discussed. The results of the various algorithms are then compared using performance metrics such as accuracy, F1 score, and AUC. A major challenge in this research domain is the limited availability of COVID-19 image data; even so, deep-learning predictions of patient severity achieve high accuracy compared with well-known COVID-19 detection methods such as PCR tests. These automated detection systems based on CXR imaging are reliable enough to help radiologists in initial screening and in the immediate diagnosis of infected individuals, and they are preferred for their low cost, availability, and fast results.
Brain tumors affect the normal functioning of the brain, and if not treated in time the cancerous cells may invade the surrounding tissues, blood vessels, and nerves. Today a large population worldwide is affected by this precarious disease, in which tumors damage healthy brain tissue, making it a significant cause of death. Early detection is therefore necessary to prevent loss of life. Manual detection of brain tumors is challenging because of discrepancies in appearance in terms of shape, size, nucleus, etc., so an automatic system is required for early detection. In this paper, the detection of tumors in brain cells is carried out using a deep convolutional neural network trained with the stochastic gradient descent (SGD) optimization algorithm. Multi-class classification of brain tumors is performed using the ResNet-50 model, evaluated on the public Kaggle brain-tumor dataset. The method achieved 99.82% training and 99.5% testing accuracy. The experimental results indicate that the proposed model outperforms baseline methods and could plausibly be applied to other diseases.
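The SGD optimizer named above applies the same parameter-update rule whether the model is ResNet-50 or a toy function. The sketch below shows that update (with momentum, a common variant; the abstract does not state the momentum setting) on a simple quadratic loss standing in for a network loss, so the learning rate, momentum value, and objective are all assumptions:

```python
import numpy as np

def sgd_step(w, grad, velocity, lr=0.01, momentum=0.9):
    """One SGD-with-momentum update: v <- mu*v - lr*g, w <- w + v."""
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

# Minimise f(w) = ||w||^2 / 2 (its gradient is simply w) as a stand-in
# for a network loss; real training would backpropagate through ResNet-50.
w = np.array([4.0, -2.0])
v = np.zeros_like(w)
for _ in range(200):
    w, v = sgd_step(w, grad=w, velocity=v)
print(w)  # close to the minimiser [0, 0]
```

In framework terms this corresponds to configuring the optimizer once (learning rate, momentum) and letting it apply this rule to every weight tensor each mini-batch.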