Background: MicroRNAs (miRNAs) are a class of endogenous non-coding RNAs of about 22 nucleotides that play a significant role in a variety of complex biological processes. Many studies have shown that miRNAs are closely related to human diseases. Although biological experiments are reliable in identifying miRNA-disease associations, they are time-consuming and costly. Objective: Thus, computational methods are urgently needed to effectively predict miRNA-disease associations. Method: In this paper, we propose a novel method, BIRWMDA, based on a bi-random walk model, to predict miRNA-disease associations. Specifically, in BIRWMDA, the similarity network fusion algorithm is used to combine multiple similarity matrices into a miRNA-miRNA similarity matrix and a disease-disease similarity matrix; the miRNA-disease associations are then predicted by the bi-random walk model. Results: To evaluate the performance of BIRWMDA, we ran leave-one-out cross validation and 5-fold cross validation, and the corresponding AUCs were 0.9303 and 0.9223 ± 0.00067, respectively. To further demonstrate the effectiveness of BIRWMDA from the perspective of exploring disease-related miRNAs, we conducted three case studies of breast neoplasms, prostate neoplasms and gastric neoplasms, where 48, 50 and 50 out of the top 50 predicted miRNAs, respectively, were confirmed by the literature. From the perspective of exploring miRNA-related diseases, we conducted two case studies of hsa-mir-21 and hsa-mir-155, where 7 and 5 out of the top 10 predicted diseases, respectively, were confirmed by the literature. Conclusion: Fusion of multiple biological networks can effectively predict miRNA-disease associations. We expect BIRWMDA to serve as a biological tool for mining potential miRNA-disease associations.
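As a rough illustration of the bi-random walk idea, the sketch below alternates a walk on the miRNA similarity network (left multiplication) with a walk on the disease similarity network (right multiplication) and blends each step with the known associations. It follows a generic bi-random walk formulation and is not necessarily the exact update used in BIRWMDA. The function names, the row normalization, the restart weight `alpha` and the step counts are all assumptions made for illustration.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def row_normalize(m):
    """Scale each row to sum to 1 (assumed normalization; all-zero rows stay zero)."""
    out = []
    for row in m:
        s = sum(row)
        out.append([v / s if s else 0.0 for v in row])
    return out


def blend(walked, prior, alpha):
    """alpha * walked + (1 - alpha) * prior, element-wise."""
    return [[alpha * w + (1 - alpha) * p for w, p in zip(wr, pr)]
            for wr, pr in zip(walked, prior)]


def bi_random_walk(mirna_sim, disease_sim, assoc, alpha=0.8,
                   left_steps=2, right_steps=2):
    """Score miRNA-disease pairs (rows: miRNAs, columns: diseases)."""
    m = row_normalize(mirna_sim)
    d = row_normalize(disease_sim)
    total = sum(map(sum, assoc))
    prior = [[v / total for v in row] for row in assoc]  # normalized known associations
    r = prior
    for t in range(1, max(left_steps, right_steps) + 1):
        walks = []
        if t <= left_steps:   # walk on the miRNA similarity network
            walks.append(blend(matmul(m, r), prior, alpha))
        if t <= right_steps:  # walk on the disease similarity network
            walks.append(blend(matmul(r, d), prior, alpha))
        # average whichever walks were taken at this step
        r = [[sum(vals) / len(walks) for vals in zip(*rows)] for rows in zip(*walks)]
    return r


# Toy data (made up): two similar miRNAs, two dissimilar diseases, one known link.
scores = bi_random_walk([[1.0, 0.9], [0.9, 1.0]],
                        [[1.0, 0.1], [0.1, 1.0]],
                        [[1, 0], [0, 0]])
```

In the toy example, the second miRNA has no known association, but because it is similar to the first it receives a higher score for disease 0 than for disease 1.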
Determining the subcellular localization of long non-coding RNAs (lncRNAs) provides valuable clues to their function. As an alternative to time-consuming and expensive biochemical experiments, we develop a machine learning predictor based on logistic regression, lncLocPred, to predict the subcellular localization of lncRNAs. We adopt sequence features including k-mer, triplet, and PseDNC, and systematically perform feature selection through VarianceThreshold, binomial distribution, and F-score to obtain representative features. We observe that the top-ranked k-mers have a higher G and C content, in the form of short repeats. With improved prediction accuracy on several subcellular localizations, our model achieves the highest overall accuracy of 92.37% on the benchmark dataset in the jackknife test, higher than the existing state-of-the-art predictors. Additionally, lncLocPred also performs better on an independent dataset that we collected. Related experimental data and source code are available at https://github.com/jademyC1221/lncLocPred.
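For readers unfamiliar with k-mer features, the following minimal sketch computes normalized k-mer frequencies for a single sequence. It is a generic illustration, not the lncLocPred implementation; the function name `kmer_frequencies`, the handling of `U` as `T`, and the skipping of windows with ambiguous bases are assumptions.

```python
from itertools import product


def kmer_frequencies(seq, k=3, alphabet="ACGT"):
    """Return a dict mapping every possible k-mer to its relative frequency."""
    seq = seq.upper().replace("U", "T")  # treat RNA and DNA alphabets alike
    counts = {"".join(p): 0 for p in product(alphabet, repeat=k)}
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing ambiguous bases such as N
            counts[window] += 1
            total += 1
    return {kmer: (c / total if total else 0.0) for kmer, c in counts.items()}


# A short GC repeat (made-up sequence) concentrates its mass on two 2-mers.
freqs = kmer_frequencies("GCGCGC", k=2)
```

The resulting fixed-length vector (4^k entries) is the kind of input that a classifier such as logistic regression can consume after feature selection.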
LncRNA and miRNA are two non-coding RNA types that are popular in current research.
Health condition monitoring of rotating machinery helps avoid catastrophic failure and ensure safe operation. Vibration-based analysis is among the most attractive approaches to fault diagnosis of rotating machinery (FDRM). Recently, Lempel-Ziv complexity (LZC) has been investigated as an effective tool for FDRM. However, LZC performs only single-scale analysis, which is not suitable for extracting fault features embedded in the vibration signal over multiple scales. In this paper, a novel complexity analysis algorithm, called hierarchical Lempel-Ziv complexity (HLZC), was developed to extract the fault characteristics of rotating machinery. The proposed HLZC method considers the fault information hidden in both low-frequency and high-frequency components, resulting in more accurate fault feature extraction. The superiority of the proposed HLZC method in detecting periodic impulses was validated using simulated signals. In addition, two experimental signals were used to demonstrate the effectiveness of the proposed HLZC method in extracting fault information. Results show that the proposed HLZC method had the best diagnostic performance compared with the LZC and multi-scale Lempel-Ziv complexity methods.
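The single-scale LZC that HLZC builds on can be sketched as follows. The signal is first turned into a binary string, then the Lempel-Ziv (1976) complexity counts how many new phrases are needed to parse it. This sketch covers only the single-scale measure, not the hierarchical decomposition; binarizing around the median is a common choice assumed here, and the function names are hypothetical.

```python
from statistics import median


def binarize(signal):
    """Map samples above the median to '1' and the rest to '0' (assumed threshold)."""
    m = median(signal)
    return "".join("1" if x > m else "0" for x in signal)


def lz_complexity(s):
    """Count the phrases in the Lempel-Ziv (1976) parsing of string s."""
    i, phrases, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend the phrase while it can still be copied from earlier text
        # (the copy source may overlap the phrase itself).
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases


# Classic textbook string; it parses as 0 | 001 | 10 | 100 | 1000 | 101.
c = lz_complexity("0001101001000101")
```

A regular signal yields few phrases and an irregular one yields many, which is what makes the measure usable as a fault feature.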
The many microbes in the human body play a vital role in cell physiology. In recent years, accumulating evidence has indicated that microbes are closely related to many complex human diseases. In-depth investigation of disease-associated microbes can contribute to understanding the pathogenesis of diseases and thus provide novel strategies for their treatment, diagnosis, and prevention. To date, many computational models have been proposed for predicting microbe-disease associations using available similarity networks. However, these similarity networks are not effectively fused. In this study, we propose a novel computational model based on multi-data integration and network consistency projection for Human Microbe-Disease Associations Prediction (HMDA-Pred), which fuses multiple similarity networks by a linear network fusion method. HMDA-Pred yielded AUC values of 0.9589 and 0.9361 ± 0.0037 in leave-one-out cross validation (LOOCV) and 5-fold cross validation (5-fold CV), respectively. Furthermore, in case studies, 10, 8, and 10 out of the top 10 predicted microbes for asthma, colon cancer, and inflammatory bowel disease, respectively, were confirmed by the literature.
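The AUC values quoted in these abstracts can be read as the probability that a randomly chosen true association is scored above a randomly chosen non-association. The sketch below computes AUC directly from that definition, counting ties as half. It is a generic illustration with made-up scores, not the evaluation code of any of the cited papers.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Made-up scores: three of the four positive/negative pairs are ordered correctly.
value = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

The pairwise form costs O(P × N) comparisons, which is fine for a sketch; rank-based formulas give the same value more efficiently on large candidate sets.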
A terminator is a DNA sequence that gives the RNA polymerase the transcriptional termination signal. Identifying terminators correctly can optimize genome annotation and, more importantly, has considerable application value in disease diagnosis and therapy. However, accurate prediction methods are lacking and urgently needed. Therefore, we proposed a prediction method for terminators, "iterb-PPse", which incorporates 47 nucleotide properties into PseKNC-I and PseKNC-II and uses Extreme Gradient Boosting to predict terminators in Escherichia coli and Bacillus subtilis. In combination with earlier methods, we employed three new feature extraction methods, K-pwm, Base-content, and Nucleotidepro, to formulate the raw samples. A two-step method was applied to select features. When identifying terminators based on the optimized features, we compared five single models as well as 16 ensemble models. As a result, the accuracy of our method on the benchmark dataset reached 99.88%, higher than that of the existing state-of-the-art predictor iTerm-PseKNC, in a fivefold cross-validation test repeated 100 times. Its prediction accuracy on two independent datasets reached 94.24% and 99.45%, respectively. For the convenience of users, we developed software of the same name on the basis of "iterb-PPse". The open software and source code of "iterb-PPse" are available at https://github.com/Sarahyouzi/iterb-PPse.
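The evaluation protocol mentioned above, fivefold cross-validation repeated 100 times, can be sketched as an index generator: each repeat reshuffles the samples, splits them into five folds, and holds each fold out once. This is a generic sketch and not the iterb-PPse code; the function name, the seed handling, and the unstratified split are assumptions.

```python
import random


def repeated_kfold(n_samples, k=5, repeats=100, seed=0):
    """Yield (train_indices, test_indices) for `repeats` rounds of k-fold CV."""
    rng = random.Random(seed)  # fixed seed so the splits are reproducible
    for _ in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]  # k nearly equal-sized folds
        for held_out in range(k):
            test = folds[held_out]
            train = [j for f, fold in enumerate(folds) if f != held_out for j in fold]
            yield train, test


# Small demonstration: 10 samples, 5 folds, 3 repeats gives 15 splits.
splits = list(repeated_kfold(10, k=5, repeats=3))
```

Averaging a model's accuracy over all splits reduces the dependence of the reported figure on any single random partition.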
1 Introduction
DNA transcription is an important step in the inheritance of genetic information, and terminators, which lie within sequences that have been transcribed, control the termination of transcription. During transcription, the terminator gives the RNA polymerase the transcriptional termination signal. Identifying terminators accurately can optimize genome annotation and, more importantly, has great application value in disease diagnosis and therapy, so it is crucial to identify terminators. However, using traditional biological experiments to identify terminators is extremely time-consuming and labor-intensive. Therefore, a more effective and convenient approach has come into use in research: adopting machine learning to identify gene sequences.
Previous research found that there are two types of terminators in prokaryotes, namely Rho-dependent and Rho-independent [1], as shown in Fig 1. Although there have been many studies on the prediction of terminators, most of them focused on only one type. In 2004, Wan XF, Xu D et al. proposed a prediction method for Rho-independent terminators with an accuracy of 92.25%. In 2005, Michiel J. L. de Hoon et al. studied the sequences of Rho-independent terminators in B. subtilis [2], and the final prediction accuracy was 94%. In 2011, Magali Naville et al. conducted research on Rho-dependent transcriptional terminators [3]. They used two published algorithms, Erpin and RNA motif, to predict terminators. The specificity and sensitivity of the final results were 95.3% and 87.8...