MicroRNA (miRNA) plays an important role as a regulator in biological processes. Identification of (pre-)miRNAs helps in understanding regulatory processes. Machine learning methods have been designed for pre-miRNA identification. However, most of them cannot provide reliable predictive performance on independent testing data sets. We assumed this is because the training sets, especially the negative training sets, are not sufficiently representative. To generate a representative negative set, we proposed a novel negative sample selection technique and successfully collected negative samples of improved quality. Two recent classifiers rebuilt with the proposed negative set achieved an improvement of ~6 percent in their predictive performance, which confirmed this assumption. Based on the proposed negative set, we constructed a training set and developed an online system called miRNApre specifically for human pre-miRNA identification. We showed that miRNApre achieved accuracies on updated human and non-human data sets that were 34.3 and 7.6 percent higher, respectively, than those achieved by current methods. The results suggest that miRNApre is an effective tool for pre-miRNA identification. Additionally, by integrating miRNApre, we developed a miRNA mining tool, mirnaDetect, which can be applied to find potential miRNAs in genome-scale data. On human chromosome 19 data, mirnaDetect achieved mining performance comparable to that of other existing methods.
Information on protein 3-dimensional (3D) structures plays an essential role in molecular biology, cell biology, biomedicine, and drug design. Protein fold prediction is considered an immediate step toward deciphering protein 3D structures, and is therefore one of the fundamental problems in structural bioinformatics. Recently, numerous taxonomic methods have been developed for protein fold prediction. Although much progress has been made, the overall prediction accuracies achieved by existing taxonomic methods remain unsatisfactory. To address this problem, we propose a novel taxonomic method, called PFPA, which combines a novel feature set with an ensemble classifier. In particular, the sequential evolution information from PSI-BLAST profiles and the local and global secondary structure information from PSI-PRED profiles are combined to construct a comprehensive feature set. Experimental results demonstrate that PFPA outperforms the state-of-the-art predictors. Specifically, when tested on the independent testing set of a benchmark dataset, PFPA achieves an overall accuracy of 73.6%, which is the leading accuracy ever reported. Moreover, PFPA performs well without significant performance degradation on three updated large-scale datasets, indicating its robustness and generalization ability. Currently, a webserver that implements PFPA is freely available at http://121.192.180.204:8080/PFPA/Index.html.
Protein structural class information is beneficial for secondary and tertiary structure prediction, protein fold prediction, and protein function analysis. Thus, predicting protein structural classes is of vital importance. In recent years, several computational methods have been developed for structural class prediction of low-sequence-similarity (25%-40%) proteins. However, the reported prediction accuracies are not satisfactory. To further improve the prediction accuracies, we propose three different feature extraction methods and construct a comprehensive feature set that captures both sequence and structure information. By applying a random forest (RF) classifier to the feature set, we develop a novel method for structural class prediction. We test the proposed method on three benchmark datasets (25PDB, 640, and 1189) with low sequence similarity, and obtain overall prediction accuracies of 93.5%, 92.6%, and 93.4%, respectively. Compared with six competing methods, the accuracies we achieved are 3.4%, 6.2%, and 8.7% higher than those achieved by the best-performing methods, showing the superiority of our method. Moreover, because the three benchmark datasets are limited in size, we further test the proposed method on three updated large-scale datasets with different sequence similarities (40%, 30%, and 25%). The proposed method achieves accuracies above 90% on all three datasets, consistent with the accuracies on the three benchmark datasets. Experimental results suggest that our method is an effective and promising tool for structural class prediction. Currently, a webserver that implements the proposed method is available at http://121.192.180.204:8080/RF_PSCP/Index.html.
Most essential functions are associated with various protein-protein interactions, particularly the cytokine-receptor interaction. Knowledge of the heterogeneous network of cytokine-receptor interactions provides insights into various human physiological functions. However, only a few studies have focused on the computational prediction of these interactions. In this study, we propose a novel machine-learning-based method for predicting cytokine-receptor interactions. A protein sequence is first transformed by incorporating sequence evolutionary information and then represented by three kinds of features: (1) the k-skip-n-gram model, (2) physicochemical properties, and (3) the local pseudo position-specific score matrix (local PsePSSM). A random forest classifier is subsequently employed to predict potential cytokine-receptor interactions. Experimental results on a Homo sapiens dataset show that the proposed method exhibits improved performance, with 3.4% higher overall prediction accuracy than existing methods.
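As an illustration of the k-skip-n-gram idea mentioned above, the following is a minimal sketch for the n = 2 case only. It assumes the common definition in which a k-skip-bigram is a pair of residues separated by 0 to k intervening residues, and it works directly on the raw amino-acid sequence rather than on the evolution-transformed sequence described in the abstract. The function name `k_skip_bigram_features` and the normalisation by total pair count are assumptions made for this example, not details taken from the paper.

```python
from itertools import product

# The 20 standard amino acids, in a fixed order so feature indices are stable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def k_skip_bigram_features(sequence, k=1):
    """Return a 400-dimensional vector of normalised residue-pair frequencies.

    A pair (a, b) is counted whenever b follows a with 0..k residues
    skipped in between. Pairs containing non-standard letters are ignored.
    """
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    total = 0
    for skip in range(k + 1):
        gap = skip + 1  # distance between the two residues of the pair
        for i in range(len(sequence) - gap):
            pair = sequence[i] + sequence[i + gap]
            if pair in counts:
                counts[pair] += 1
                total += 1
    if total == 0:
        return [0.0] * len(pairs)
    return [counts[p] / total for p in pairs]


if __name__ == "__main__":
    features = k_skip_bigram_features("ACDA", k=1)
    print(len(features))
```

A fixed-length vector of this kind could be concatenated with other descriptors and passed to a random forest classifier, although the exact feature layout used by the authors is not given in the abstract.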
In today's open society, and with the growth of human rights, people are increasingly concerned about the privacy of their information and other important data. This study makes use of electrocardiography (ECG) data to protect individual information. An ECG signal can be used not only to analyze disease, but also to provide crucial biometric information for identification and authentication. In this study, we propose integrating ECG watermarking with compression, a combination that has not been studied before. ECG watermarking can ensure the confidentiality and reliability of a user's data while reducing the amount of data. In the evaluation, we assess the proposed algorithm using embedding capacity, bit error rate (BER), signal-to-noise ratio (SNR), compression ratio (CR), and compressed-signal-to-noise ratio (CNR). After comprehensive evaluation, the final results show that our algorithm is robust and feasible.
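For readers unfamiliar with the evaluation metrics named above, the following sketch shows how BER, SNR, and CR are commonly computed. It assumes the textbook definitions (fraction of mismatched watermark bits, ratio of signal energy to error energy in decibels, and original size divided by compressed size); the abstract does not give the authors' exact formulas, and CNR is omitted because its definition is not stated. All function names are hypothetical.

```python
import math


def bit_error_rate(embedded_bits, extracted_bits):
    """Fraction of watermark bits that differ after extraction."""
    if len(embedded_bits) != len(extracted_bits):
        raise ValueError("bit sequences must have the same length")
    if not embedded_bits:
        raise ValueError("bit sequences must not be empty")
    errors = sum(a != b for a, b in zip(embedded_bits, extracted_bits))
    return errors / len(embedded_bits)


def snr_db(original, processed):
    """Signal-to-noise ratio in dB between an original and a processed signal."""
    if len(original) != len(processed):
        raise ValueError("signals must have the same length")
    signal_energy = sum(x * x for x in original)
    noise_energy = sum((x - y) ** 2 for x, y in zip(original, processed))
    if noise_energy == 0:
        return math.inf  # identical signals: no distortion
    return 10 * math.log10(signal_energy / noise_energy)


def compression_ratio(original_size, compressed_size):
    """Original size divided by compressed size (same unit for both)."""
    if compressed_size <= 0:
        raise ValueError("compressed size must be positive")
    return original_size / compressed_size


if __name__ == "__main__":
    print(bit_error_rate([1, 0, 1, 1], [1, 1, 1, 0]))
    print(snr_db([10.0, 0.0], [9.0, 0.0]))
    print(compression_ratio(1000, 250))
```

Under these definitions, a lower BER and a higher SNR indicate that the watermark and the signal survive processing better, while a higher CR indicates stronger compression.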