Motif is an over-represented pattern in biological sequence. Motif discovery is a major challenge in bioinformatics. Pattern mismatches phenomena makes motif mining very difficult. Brute Force approaches take exponential time with motif length to solve this problem. In this paper, the authors discuss a Recursive-Brute Force algorithm. Its average case time complexity is exponential with the allowed mutations instead of the motif length. Modern Multi-Core architecture revolution encourages us to parallelize our algorithm. We implement the algorithm using two different approaches. A multi-threaded version (OMP-RBF) is implemented using OpenMP. OMP-RBF suffers from a serious performance degradation due to the heap contention problem. The authors have investigated different solutions to solve the heap contention problem. The second implementation is based on MPI that is called MPI-RBF. The efficient handling of the data locality boost the scalability of the MPI-RBF. The authors prove that MPI approach outperforms OpenMP in such computationally-intensive, memory-intensive, and communication-less problem.
Background In the current genomic era, gene expression datasets have become one of the main tools utilized in cancer classification. Both curse of dimensionality and class imbalance problems are inherent characteristics of these datasets. These characteristics have a negative impact on the performance of most classifiers when used to classify cancer using genomic datasets. Results This paper introduces Reduced Noise-Autoencoder (RN-Autoencoder) for pre-processing imbalanced genomic datasets for precise cancer classification. Firstly, RN-Autoencoder solves the curse of dimensionality problem by utilizing the autoencoder for feature reduction and hence generating new extracted data with lower dimensionality. In the next stage, RN-Autoencoder introduces the extracted data to the well-known Reduced Noise-Synthesis Minority Over Sampling Technique (RN- SMOTE) that efficiently solve the problem of class imbalance in the extracted data. RN-Autoencoder has been evaluated using different classifiers and various imbalanced datasets with different imbalance ratios. The results proved that the performance of the classifiers has been improved with RN-Autoencoder and outperformed the performance with original data and extracted data with percentages based on the classifier, dataset and evaluation metric. Also, the performance of RN-Autoencoder has been compared to the performance of the current state of the art and resulted in an increase up to 18.017, 19.183, 18.58 and 8.87% in terms of test accuracy using colon, leukemia, Diffuse Large B-Cell Lymphoma (DLBCL) and Wisconsin Diagnostic Breast Cancer (WDBC) datasets respectively. Conclusion RN-Autoencoder is a model for cancer classification using imbalanced gene expression datasets. It utilizes the autoencoder to reduce the high dimensionality of the gene expression datasets and then handles the class imbalance using RN-SMOTE. RN-Autoencoder has been evaluated using many different classifiers and many different imbalanced datasets. The performance of many classifiers has improved and some have succeeded in classifying cancer with 100% performance in terms of all used metrics. In addition, RN-Autoencoder outperformed many recent works using the same datasets.
Background Early diagnosis of Pancreatic Ductal Adenocarcinoma (PDAC) is the main key to surviving cancer patients. Urine proteomic biomarkers which are creatinine, LYVE1, REG1B, and TFF1 present a promising non-invasive and inexpensive diagnostic method of the PDAC. Recent utilization of both microfluidics technology and artificial intelligence techniques enables accurate detection and analysis of these biomarkers. This paper proposes a new deep-learning model to identify urine biomarkers for the automated diagnosis of pancreatic cancers. The proposed model is composed of one-dimensional convolutional neural networks (1D-CNNs) and long short-term memory (LSTM). It can categorize patients into healthy pancreas, benign hepatobiliary disease, and PDAC cases automatically. Results Experiments and evaluations have been successfully done on a public dataset of 590 urine samples of three classes, which are 183 healthy pancreas samples, 208 benign hepatobiliary disease samples, and 199 PDAC samples. The results demonstrated that our proposed 1-D CNN + LSTM model achieved the best accuracy score of 97% and the area under curve (AUC) of 98% versus the state-of-the-art models to diagnose pancreatic cancers using urine biomarkers. Conclusion A new efficient 1D CNN-LSTM model has been successfully developed for early PDAC diagnosis using four proteomic urine biomarkers of creatinine, LYVE1, REG1B, and TFF1. This developed model showed superior performance on other machine learning classifiers in previous studies. The main prospect of this study is the laboratory realization of our proposed deep classifier on urinary biomarker panels for assisting diagnostic procedures of pancreatic cancer patients.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.